What is the difference between Apache Pig and Hive? - Big Data In Real World

What is the difference between Apache Pig and Hive?

  1. Apache Pig was created at Yahoo; Apache Hive was created at Facebook. Both tools aim to hide the complexity of writing MapReduce jobs.
  2. Pig’s language, Pig Latin, is procedural: a script describes a data flow one step at a time. Hive’s query language is declarative and close to SQL.
  3. Both tools take in instructions (a Pig script or a SQL-like query) and convert them to MapReduce jobs behind the scenes.
  4. Hive organizes data into tables and partitions, and this metadata is persisted in Hive’s metastore. Pig doesn’t offer a metastore of its own for persisting metadata.
  5. Hive’s architecture supports JDBC/ODBC clients, authentication, authorization, auditing and logging. Pig’s architecture is not mature enough to offer these capabilities.
  6. Hive is widely integrated with other tools in the Big Data ecosystem, such as HBase and Spark.
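
To make the procedural vs. declarative contrast in point 2 concrete, here is a small Python analogy. It is not Pig or Hive code: the sample rows, the `logs` table and both function names are made up for illustration, SQLite stands in for a SQL engine, and the Pig Latin lines in the comments are only an approximate sketch of what the matching script might look like.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (username, bytes_transferred)
ROWS = [("ann", 120), ("bob", 80), ("ann", 300), ("cat", 150), ("bob", 500)]


def declarative_totals(rows):
    """Hive-like style: describe the result you want in a single SQL statement."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logs (username TEXT, bytes INTEGER)")
    conn.executemany("INSERT INTO logs VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT username, SUM(bytes) FROM logs "
        "WHERE bytes > 100 GROUP BY username ORDER BY username"
    ).fetchall()
    conn.close()
    return result


def procedural_totals(rows):
    """Pig-like style: spell out the data flow one named step at a time."""
    # Roughly: big = FILTER logs BY bytes > 100;
    big = [(username, size) for username, size in rows if size > 100]

    # Roughly: grouped = GROUP big BY username;
    grouped = defaultdict(list)
    for username, size in big:
        grouped[username].append(size)

    # Roughly: totals = FOREACH grouped GENERATE group, SUM(big.bytes);
    totals = [(username, sum(sizes)) for username, sizes in grouped.items()]
    return sorted(totals)


if __name__ == "__main__":
    print(declarative_totals(ROWS))
    print(procedural_totals(ROWS))
```

Both functions return the same per-user totals. The difference is in who decides the order of operations: in the SQL version the engine does, while in the step-by-step version the author of the script does.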

Should I choose Pig or Hive if I am starting a Big Data project?

Choose Hive.

Pig is no longer actively developed. Its last release was in June 2017.

Hive is widely adopted in the big data space and provides great integration capabilities with other popular tools like Spark.

Big Data In Real World
We are a group of Big Data engineers who are passionate about Big Data and related technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real-time streaming platforms. We have seen a wide range of real-world big data problems and implemented some innovative and complex (or simple, depending on how you look at it) solutions.
