What are applications, jobs, stages and tasks in Spark? - Big Data In Real World


We get a lot of questions about the differences between Spark applications, jobs, stages and tasks. We also see a lot of misunderstanding about these concepts among new learners and experienced Spark developers alike. Our goal with this post is to give you a crisp pointer on each of these concepts in Spark.

Task

A task is the smallest unit of execution in Spark. A task executes a series of instructions against a single partition of the data. For example, reading data, filtering it and applying map() to it can be combined into a single task. Tasks are executed inside an executor.

Stage

A stage comprises several tasks, and every task in the stage executes the same set of instructions, each on a different partition of the data.
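To make the relationship between tasks and a stage concrete, here is a minimal sketch in plain Python. It is a toy model and not Spark's API: the helper `make_task`, the list `stage_ops` and the sample partitions are all hypothetical names and values chosen for illustration. It shows a filter and a map fused into one task, and the same task being run once per partition.

```python
from typing import Callable, Iterable, List

# Toy model only: none of these names come from Spark.
Op = Callable[[Iterable[int]], Iterable[int]]

def make_task(ops: List[Op]) -> Callable[[List[int]], List[int]]:
    """Fuse a list of per-partition operations into a single task function."""
    def task(partition: List[int]) -> List[int]:
        data: Iterable[int] = partition
        for op in ops:
            data = op(data)  # each operation feeds the next, with no shuffle in between
        return list(data)
    return task

# The set of instructions that every task in this (hypothetical) stage runs.
stage_ops: List[Op] = [
    lambda rows: (r for r in rows if r % 2 == 0),  # filter: keep even numbers
    lambda rows: (r * 10 for r in rows),           # map: multiply by 10
]

# Assumed input: a dataset split into three partitions.
partitions = [[1, 2, 3], [4, 5, 6], [7, 8]]

task = make_task(stage_ops)
results = [task(p) for p in partitions]  # one task per partition, same instructions

print(len(results), "tasks ran")
print(results)
```

In this sketch the stage has three tasks because the data has three partitions. In a real cluster those tasks would run in parallel inside executors rather than in a simple loop.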

Job

A job comprises several stages. When Spark encounters an operation that requires a shuffle, it creates a new stage. Transformations like reduceByKey() and join() will typically trigger a shuffle and result in a new stage. Spark will also create a stage when you are reading a dataset.
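The idea of cutting a job into stages at shuffle boundaries can be sketched in plain Python. This is a simplified toy model, not how Spark's scheduler is implemented: the function `split_into_stages`, the set `SHUFFLE_OPS` and the sample list of operations are assumptions made for illustration, and the sketch simply starts a new stage at every shuffle-inducing operation.

```python
from typing import List

# Assumption: a hand-picked set of operation names that require a shuffle.
SHUFFLE_OPS = {"reduceByKey", "join", "groupByKey", "repartition"}

def split_into_stages(ops: List[str]) -> List[List[str]]:
    """Group a linear list of operations into stages, cutting at each shuffle."""
    stages: List[List[str]] = [[]]
    for op in ops:
        # A shuffle-inducing operation closes the current stage and opens a new one.
        if op in SHUFFLE_OPS and stages[-1]:
            stages.append([])
        stages[-1].append(op)
    return stages

# Hypothetical job: the operations that run when one action is executed.
job_ops = ["read", "filter", "map", "reduceByKey", "map", "join", "write"]

stages = split_into_stages(job_ops)
for number, stage in enumerate(stages):
    print(f"stage {number}: {stage}")
```

Here the two shuffle-inducing operations produce three stages. Operations that do not need a shuffle, such as filter and map, stay in the same stage as the operation before them.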

Application

An application comprises several jobs. A job is created whenever you execute an action such as write().

Summary

A Spark application can have many jobs. A job can have many stages. A stage can have many tasks. A task executes a series of instructions.
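The hierarchy above can be modelled with a few nested data classes. This is only an illustrative sketch in plain Python: the class names, the `total_tasks` method and all the numbers are made up for the example and do not correspond to Spark's own classes or to a real workload.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Stage:
    num_tasks: int  # assumed: one task per partition processed by the stage

@dataclass
class Job:
    action: str  # the action that triggered the job, e.g. "write"
    stages: List[Stage] = field(default_factory=list)

@dataclass
class Application:
    jobs: List[Job] = field(default_factory=list)

    def total_tasks(self) -> int:
        """Count every task across all stages of all jobs."""
        return sum(stage.num_tasks for job in self.jobs for stage in job.stages)

# Hypothetical application: two actions, so two jobs.
app = Application(jobs=[
    Job(action="count", stages=[Stage(num_tasks=4)]),
    Job(action="write", stages=[Stage(num_tasks=4), Stage(num_tasks=2)]),
])

print("jobs:", len(app.jobs))
print("stages:", sum(len(job.stages) for job in app.jobs))
print("tasks:", app.total_tasks())
```

With these made-up numbers the application has 2 jobs, 3 stages and 10 tasks.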

 

Big Data In Real World
We are a group of Big Data engineers who are passionate about Big Data and related technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real-time streaming platforms. We have seen a wide range of real-world big data problems and implemented some innovative and complex (or simple, depending on how you look at it) solutions.
