
How to properly add jars to a Spark application?


There are several properties in Spark that affect how jars are added to a Spark application. We understand this can be confusing, and this post aims to give you clarity on the different options and when to use each one.

Adding jars to your application

Use --jars or SparkContext.addJar to add a jar to your application. Note that either option makes the jars available on the nodes in the cluster, but the jars will not be added to the classpath. You have to add them to the classpath explicitly (see below).
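
For example, the following spark-submit invocation ships myjar.jar (the jar name reused from the conclusion below) to the nodes running the application; note that it does not yet put the jar on any classpath:

spark-submit --jars myjar.jar \
  --class SampleApplication my-application.jar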

--jars vs SparkContext.addJar

--jars is used with spark-submit and SparkContext.addJar is used in code. You can use either one as both get you the same result. Note that properties set in code overwrite the corresponding property values set via spark-submit.
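
Here is a minimal Scala sketch of the in-code equivalent; the SparkSession setup and the jar name are illustrative:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("SampleApplication")
  .getOrCreate()

// Ships myjar.jar to the nodes running this application.
// Like --jars, this does not add the jar to any classpath.
spark.sparkContext.addJar("myjar.jar")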

Adding jars to the classpath of your application

To make the jars available to your application, you first need to push the jars to the nodes running the application. This is achieved with --jars or SparkContext.addJar.

The next step is to add the jars to the classpath of the driver, the worker nodes, or both, based on the need. The conclusion below combines these steps into a single spark-submit command.

Adding jars to the driver’s classpath 

If you need a jar only on the node assigned as the driver for your application, then you need to use --conf spark.driver.extraClassPath or --driver-class-path. Both properties yield the same result.
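
For instance, either of these invocations puts myjar.jar (shipped to the nodes with --jars first) on the driver's classpath only:

spark-submit --jars myjar.jar \
  --driver-class-path myjar.jar \
  --class SampleApplication my-application.jar

spark-submit --jars myjar.jar \
  --conf spark.driver.extraClassPath=myjar.jar \
  --class SampleApplication my-application.jar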

Adding jars to the executor’s classpath

If you want the jars to be added to the classpath of all the worker nodes or executors running your application, then you need to use --conf spark.executor.extraClassPath.
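
For example, to put myjar.jar on the classpath of every executor:

spark-submit --jars myjar.jar \
  --conf spark.executor.extraClassPath=myjar.jar \
  --class SampleApplication my-application.jar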

Conclusion

In conclusion, if you want to make myjar.jar available to your application on both the driver and the executor nodes, you need to first ship the jar to the nodes and then add it to both the driver's and the executor's classpath:

spark-submit --jars myjar.jar \
  --driver-class-path myjar.jar \
  --conf spark.executor.extraClassPath=myjar.jar \
  --class SampleApplication my-application.jar
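
Note that --jars takes a comma-separated list of jars, while the classpath properties are regular JVM classpaths separated by a colon on Linux. With a second jar (other.jar here is a hypothetical name), the command becomes:

spark-submit --jars myjar.jar,other.jar \
  --driver-class-path myjar.jar:other.jar \
  --conf spark.executor.extraClassPath=myjar.jar:other.jar \
  --class SampleApplication my-application.jar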

 
