What is the difference between NameNode and Secondary NameNode? - Big Data In Real World


NameNode

NameNode is the heart of HDFS. It maintains the HDFS metadata: files, directories, permissions, and the list of blocks that make up each file. The metadata is persisted in a file named FSIMAGE. When the NameNode starts up, it reads the FSIMAGE file and loads it into memory.

Ongoing changes to files and directories are applied to the in-memory metadata and recorded in a temporary log file (the edit log). The NameNode does not write these changes to FSIMAGE directly, because the FSIMAGE file can be large on a big HDFS cluster, and updating a large file at runtime would be expensive and slow.
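To make the write path concrete, here is a minimal sketch in Python. It is a toy model, not how HDFS is implemented: the class name ToyNameNode, the JSON file formats and the file names are all assumptions made for illustration (real HDFS uses its own binary formats). It only shows the idea that a change updates memory and appends to a log, while the snapshot on disk is left untouched.

```python
import json
import os
import tempfile


class ToyNameNode:
    """Toy model only: real HDFS uses binary FSIMAGE and edit log formats."""

    def __init__(self, fsimage_path, edits_path):
        self.edits_path = edits_path
        # Startup: read the persisted snapshot and load it into memory.
        with open(fsimage_path) as f:
            self.namespace = json.load(f)

    def create_file(self, path, blocks):
        # Apply the change to the in-memory metadata ...
        self.namespace[path] = blocks
        # ... and append one small record to the log instead of
        # rewriting the (potentially large) snapshot file.
        record = {"op": "create", "path": path, "blocks": blocks}
        with open(self.edits_path, "a") as log:
            log.write(json.dumps(record) + "\n")


def demo():
    workdir = tempfile.mkdtemp()
    fsimage = os.path.join(workdir, "fsimage.json")  # hypothetical file name
    edits = os.path.join(workdir, "edits.log")       # hypothetical file name
    with open(fsimage, "w") as f:
        json.dump({"/data/a.txt": ["blk_1"]}, f)

    nn = ToyNameNode(fsimage, edits)
    nn.create_file("/data/b.txt", ["blk_2", "blk_3"])

    with open(fsimage) as f:
        on_disk = json.load(f)
    with open(edits) as f:
        logged = [json.loads(line) for line in f]
    return nn.namespace, on_disk, logged
```

After demo() runs, the new file is visible in memory and in the log, but the snapshot on disk still describes only the original file. That gap is what a later merge has to close.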

Secondary NameNode

Secondary NameNode keeps a copy of FSIMAGE. Periodically, it fetches the FSIMAGE file and the temporary log file from the NameNode and applies the log to FSIMAGE, thereby bringing the FSIMAGE file up to date. The updated FSIMAGE is then copied back to the NameNode.
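The merge step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the actual Hadoop code: the function name checkpoint, the dictionary-based snapshot and the "create"/"delete" record layout are all hypothetical.

```python
def checkpoint(fsimage, edits):
    """Return a new snapshot with every logged edit applied, oldest first."""
    merged = dict(fsimage)  # work on a copy; the old snapshot stays intact
    for edit in edits:
        if edit["op"] == "create":
            merged[edit["path"]] = edit["blocks"]
        elif edit["op"] == "delete":
            merged.pop(edit["path"], None)
        else:
            raise ValueError("unknown operation: " + str(edit["op"]))
    return merged


# Hypothetical snapshot and log, as fetched from the NameNode.
old_fsimage = {"/data/a.txt": ["blk_1"]}
edit_log = [
    {"op": "create", "path": "/data/b.txt", "blocks": ["blk_2"]},
    {"op": "delete", "path": "/data/a.txt"},
]

new_fsimage = checkpoint(old_fsimage, edit_log)
```

The result is a snapshot that already reflects every logged change, so the log can be started afresh and the next NameNode startup has far fewer edits to replay.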

This periodic merge relieves the NameNode of having to merge FSIMAGE with the temporary log file itself. However, Secondary NameNode does not take over the functions of the NameNode if the NameNode encounters an issue. It can be manually made the primary NameNode, but this does not happen automatically.

Secondary NameNode is also an old concept. Newer versions of Hadoop (2.x and later) support High Availability, in which a standby NameNode is kept in sync using either the Quorum Journal Manager (QJM) or NFS shared storage.

Big Data In Real World
We are a group of Big Data engineers who are passionate about Big Data and related Big Data technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real time streaming big data platforms. We have seen a wide range of real world big data problems, implemented some innovative and complex (or simple, depending on how you look at it) solutions.
