What is the difference between NameNode and Secondary NameNode?

Published by Big Data In Real World at May 28, 2021

NameNode

NameNode is the heart of HDFS. It maintains the metadata of HDFS: files, lists of blocks, directories, permissions, etc. This metadata is persisted in a file named FSIMAGE. When the NameNode starts up, it reads the FSIMAGE file and loads it into memory.

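Ongoing changes to files and directories are not written to FSIMAGE directly. Instead, they are applied in memory and appended to a separate log file, known as the edit log. The NameNode avoids updating FSIMAGE at runtime because on a large HDFS cluster the FSIMAGE file can be big, and rewriting a big file for every change would be expensive and slow. Because the edit log keeps growing, it has to be merged into FSIMAGE from time to time; otherwise the NameNode would have to replay a very long log at its next startup.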
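The split between a full snapshot and an append-only log can be illustrated with a toy model. This is a minimal sketch, not Hadoop code: `ToyNameNode`, `apply_edit`, the tuple-based log entries and the dictionary namespace are all hypothetical simplifications, with `image` standing in for FSIMAGE and `edit_log` standing in for the edit log.

```python
from dataclasses import dataclass, field


def apply_edit(namespace, entry):
    """Apply one logged change (op, path, block IDs) to an in-memory namespace."""
    op, path, blocks = entry
    if op == "add":
        namespace[path] = list(blocks)
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown op: {op}")


@dataclass
class ToyNameNode:
    """Hypothetical, heavily simplified model of NameNode metadata handling."""
    image: dict = field(default_factory=dict)      # stands in for FSIMAGE
    edit_log: list = field(default_factory=list)   # stands in for the edit log
    namespace: dict = field(default_factory=dict)  # live metadata held in memory

    def start(self):
        # Startup: load the snapshot into memory, then replay every logged change.
        self.namespace = dict(self.image)
        for entry in self.edit_log:
            apply_edit(self.namespace, entry)

    def record(self, op, path, blocks=()):
        # A change is appended to the log and applied in memory;
        # the (potentially large) image is never rewritten here.
        entry = (op, path, tuple(blocks))
        self.edit_log.append(entry)
        apply_edit(self.namespace, entry)


nn = ToyNameNode(image={"/data/a.txt": ["blk_1"]})
nn.start()
nn.record("add", "/data/b.txt", ["blk_2", "blk_3"])
nn.record("delete", "/data/a.txt")
print(nn.namespace)  # {'/data/b.txt': ['blk_2', 'blk_3']}
print(nn.image)      # {'/data/a.txt': ['blk_1']}

# Simulated restart: the same image plus the same log rebuilds the same state.
restarted = ToyNameNode(image=nn.image, edit_log=list(nn.edit_log))
restarted.start()
print(restarted.namespace == nn.namespace)  # True
```

Appending one small record per change is cheap and leaves the snapshot untouched; the cost moves to startup, where every logged change has to be replayed.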

Secondary NameNode

The Secondary NameNode keeps a copy of FSIMAGE. Periodically, it fetches the FSIMAGE file and the edit log from the NameNode and applies the logged changes to FSIMAGE, thereby bringing FSIMAGE up to date. This merge is known as a checkpoint, and the resulting FSIMAGE is sent back to the NameNode.

This relieves the NameNode of the work of merging FSIMAGE with the edit log. Despite its name, however, the Secondary NameNode is not a standby: it does not take over the functions of the NameNode if the NameNode fails. It can be manually promoted to act as the primary NameNode, but this does not happen automatically.
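The periodic merge performed by the Secondary NameNode can be sketched the same way. Again, this is a hypothetical model rather than the real implementation: `checkpoint` folds a log into a copy of the image, and `should_checkpoint` assumes the trigger rule behind the HDFS settings `dfs.namenode.checkpoint.period` (seconds, default 3600) and `dfs.namenode.checkpoint.txns` (default 1,000,000), where whichever threshold is reached first starts a checkpoint.

```python
def apply_edit(namespace, entry):
    """Apply one logged change (op, path, block IDs) to a namespace dict."""
    op, path, blocks = entry
    if op == "add":
        namespace[path] = list(blocks)
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown op: {op}")


def checkpoint(image, edit_log):
    """Hypothetical sketch of a checkpoint: merge the edit log into a copy of
    FSIMAGE and return the new, up-to-date image."""
    merged = dict(image)  # the original image is left untouched
    for entry in edit_log:
        apply_edit(merged, entry)
    return merged


def should_checkpoint(seconds_since_last, txns_since_last,
                      period=3600, txns=1_000_000):
    """Checkpoint when either threshold is reached, whichever comes first.
    Defaults assume the HDFS defaults of dfs.namenode.checkpoint.period
    and dfs.namenode.checkpoint.txns."""
    return seconds_since_last >= period or txns_since_last >= txns


image = {"/data/a.txt": ["blk_1"]}
edit_log = [("add", "/data/b.txt", ("blk_2",)), ("delete", "/data/a.txt", ())]

if should_checkpoint(seconds_since_last=3600, txns_since_last=len(edit_log)):
    new_image = checkpoint(image, edit_log)
    # The merged image goes back to the NameNode; the merged edits no longer
    # need replaying at startup (simplified here as starting an empty log).
    image, edit_log = new_image, []

print(image)     # {'/data/b.txt': ['blk_2']}
print(edit_log)  # []
```

Doing the merge on a separate machine keeps this CPU- and memory-heavy step off the NameNode, which only has to swap in the finished image.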

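The Secondary NameNode is also an older concept. Newer versions of Hadoop support High Availability (HA) using either the Quorum Journal Manager (QJM) or NFS-based shared storage. In an HA setup, the Standby NameNode also performs checkpointing, so a Secondary NameNode is not needed.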
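With QJM, the active NameNode writes each edit to a group of JournalNodes and treats it as durable once a majority of them acknowledge it. The toy function below only illustrates that majority rule; `quorum_write` and the JournalNodes modeled as plain Python functions are hypothetical and are not the Hadoop API.

```python
def quorum_write(entry, journal_nodes):
    """Hypothetical sketch: send one edit to every JournalNode and treat it as
    committed only if a strict majority acknowledges it."""
    acks = sum(1 for send in journal_nodes if send(entry))
    return acks > len(journal_nodes) // 2


# Each "JournalNode" is modeled as a function returning True on acknowledgement.
def healthy(entry):
    return True


def unreachable(entry):
    return False


edit = ("add", "/data/c.txt", ("blk_4",))
print(quorum_write(edit, [healthy, healthy, unreachable]))      # True: 2 of 3
print(quorum_write(edit, [healthy, unreachable, unreachable]))  # False: 1 of 3
```

With three JournalNodes, one can fail without losing writes. The real protocol must also prevent two NameNodes from writing at the same time, which this sketch ignores.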

Big Data In Real World

We are a group of Big Data engineers who are passionate about Big Data and related Big Data technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real-time streaming big data platforms. We have seen a wide range of real-world big data problems and implemented some innovative and complex (or simple, depending on how you look at it) solutions.
