Big Data Interview Questions and Answers (Part 2)

Published by Big Data In Real World on March 11, 2019

Spark stole the webinar in 2018

We ran this webinar in Nov 2018; our first webinar on interview questions was in Nov 2017. Back in 2017, our community sent us a lot of Hadoop-related questions to answer. In 2018, the focus was more on Spark.

We host webinars like these quite often. Sign up below to get invitations to join one of our webinars.

List of Big Data interview questions that we answered in the webinar

How do you handle scenarios when Spark runs out of memory? (12:40)

How does Spark perform operations and generate results when the dataset doesn’t fit in memory? (12:40)

What do you do when one of your Spark jobs fails with an OOM error? (12:40)

How do you handle slow-running jobs in Spark? (28:40)

What do you do when one task in your Spark job takes a lot of time while the others complete in time? (28:40)

Tell us some of the Spark optimization techniques you used in your current project. (28:40)

How do you handle Spark streaming failures? (40:30)

What happens to Spark streaming when there is a network failure during processing? (40:30)

How do you recover from Spark streaming failures? (40:33)

What is the difference between DataFrame and Dataset? (47:10)

When do you use DataFrame and when do you use Dataset? (47:10)

How do you properly remove Datanodes from your cluster? (52:30)

How do you secure Hive? (52:30)

How do you authorize users in Hive? (52:30)

Provide a use case for Zookeeper (1:00:00)

What is the role of Zookeeper in a Big Data cluster? (1:00:00)

How do you limit the number of files created under a directory in HDFS? (1:10:40)

How do you limit the space allocation in HDFS? (1:10:40)

Live questions from webinar attendees (1:15:30)

Full webinar

Big Data In Real World

We are a group of Big Data engineers who are passionate about Big Data and related technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real-time streaming platforms. We have seen a wide range of real-world Big Data problems and implemented some innovative and complex (or simple, depending on how you look at it) solutions.
