How to convert List to a JavaRDD in Spark? - Big Data In Real World

This is a common need when you write Spark code in Java: you have a java.util.List and want to turn it into a JavaRDD.

Solution

// Requires java.util.ArrayList and java.util.List
List<String> l = new ArrayList<>();

l.add("Red");
l.add("Green");
l.add("Blue");

Pass the list to the parallelize method of JavaSparkContext, which returns a JavaRDD.

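// The no-arg constructor reads its settings from system properties (for example, when launched with spark-submit)
JavaSparkContext jsc = new JavaSparkContext();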

JavaRDD<String> rdd = jsc.parallelize(l);
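The snippets above can be put together into a small runnable program. This is only a sketch under a few assumptions: the class name ListToJavaRDDExample, the helper method roundTrip, the app name and the local[*] master are all illustrative choices that are not part of the original post, and the Spark core library is assumed to be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical class name, used only for this illustration.
class ListToJavaRDDExample {

    // Hypothetical helper: turns the list into a JavaRDD and collects it back to the driver.
    static List<String> roundTrip(List<String> input) {
        SparkConf conf = new SparkConf()
                .setAppName("ListToJavaRDDExample") // assumed app name
                .setMaster("local[*]");             // assumption: run locally for the demo

        // JavaSparkContext is Closeable, so try-with-resources stops it when done.
        try (JavaSparkContext jsc = new JavaSparkContext(conf)) {
            JavaRDD<String> rdd = jsc.parallelize(input);
            return rdd.collect();
        }
    }

    public static void main(String[] args) {
        List<String> l = new ArrayList<>();
        l.add("Red");
        l.add("Green");
        l.add("Blue");

        System.out.println(roundTrip(l));
    }
}
```

Setting the master in code is convenient for a local test. When the job is submitted with spark-submit, you would normally leave it out and let the launcher supply it.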

JavaPairRDD

Sometimes it is more convenient to work with key-value pairs. Here is how to create a JavaPairRDD.

First, create a list of tuples. In the example below, each tuple holds an Integer and a String.

// Tuple2 is scala.Tuple2
List<Tuple2<Integer, String>> pair = new ArrayList<>();

pair.add(new Tuple2<>(0, "Red"));
pair.add(new Tuple2<>(1, "Blue"));

Once you have the list, call parallelizePairs on the JavaSparkContext to create the JavaPairRDD.

JavaSparkContext jsc = new JavaSparkContext(); 

JavaPairRDD<Integer, String> rdd = jsc.parallelizePairs(pair);
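The pair version can be sketched as a complete program in the same way. As before, the class name ListToJavaPairRDDExample, the helper method toMap, the app name and the local[*] master are assumptions made for illustration, and the Spark core library is assumed to be on the classpath.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

// Hypothetical class name, used only for this illustration.
class ListToJavaPairRDDExample {

    // Hypothetical helper: builds a JavaPairRDD from the tuples and returns its contents as a map.
    static Map<Integer, String> toMap(List<Tuple2<Integer, String>> pairs) {
        SparkConf conf = new SparkConf()
                .setAppName("ListToJavaPairRDDExample") // assumed app name
                .setMaster("local[*]");                 // assumption: run locally for the demo

        try (JavaSparkContext jsc = new JavaSparkContext(conf)) {
            JavaPairRDD<Integer, String> rdd = jsc.parallelizePairs(pairs);
            // Copy into a plain HashMap so the result does not depend on Spark's wrapper type.
            return new HashMap<>(rdd.collectAsMap());
        }
    }

    public static void main(String[] args) {
        List<Tuple2<Integer, String>> pair = new ArrayList<>();
        pair.add(new Tuple2<>(0, "Red"));
        pair.add(new Tuple2<>(1, "Blue"));

        System.out.println(toMap(pair));
    }
}
```

Note that collectAsMap keeps only one value per key, so it is a reasonable way to inspect the result only when the keys are unique, as they are here.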

 

Big Data In Real World
We are a group of Big Data engineers who are passionate about Big Data and related technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real-time streaming platforms. We have seen a wide range of real-world big data problems and implemented some innovative and complex (or simple, depending on how you look at it) solutions.