What does “Stage Skipped” mean in Apache Spark web UI? - Big Data In Real World


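Sometimes you might see a stage marked as skipped in the DAG visualization of the Spark web UI. A skipped stage is part of the job's DAG, but Spark did not need to run it because the data it would produce was already available. In this post we discuss a couple of reasons why a stage might be skipped during the execution of a job in Spark.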

[Figure: Spark skipped stages]


Cached data

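If a DataFrame or RDD has been explicitly cached or persisted with cache() or persist(), a stage may be shown as skipped when its result is already in the cache: Spark reads the cached data instead of recomputing the stage. Cached data is not guaranteed to stay, though. Blocks held in memory can be evicted in Least Recently Used (LRU) order when memory is needed for newer data, in which case Spark recomputes the missing partitions from their lineage.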

Shuffle data

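Spark automatically keeps the output of a shuffle. The tasks of the stage before a shuffle write their output as shuffle files on the executors' local disks, and because a shuffle is an expensive operation, Spark reuses these files instead of recomputing them: if a later job depends on a shuffle whose output is still available, the stage that produced it is shown as skipped. But note, this data is not available forever. The shuffle files are removed once the shuffle is no longer referenced in the driver program and has been garbage collected, and they can also be lost if the executor that wrote them is lost, in which case the stage has to run again.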

Big Data In Real World
We are a group of Big Data engineers who are passionate about Big Data and related Big Data technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real-time streaming big data platforms. We have seen a wide range of real-world big data problems and implemented some innovative and complex (or simple, depending on how you look at it) solutions.
