NameNode and DataNode

In this post, let’s talk about the two important types of nodes in your Hadoop cluster, NameNode and DataNode, and their functions.

What is HDFS?

We covered a great deal of information about HDFS in the “HDFS – Why Another Filesystem?” chapter of the Hadoop Starter Kit course. If you are new to Hadoop, we suggest taking the free course.

NameNode

NameNode is the centerpiece of HDFS.
NameNode is also known as the Master.
NameNode stores only the metadata of HDFS: the directory tree of all files in the file system. It also tracks where those files are kept across the cluster.
NameNode does not store the actual data or the dataset. The data itself is stored in the DataNodes.
NameNode knows the list of blocks and their locations for any given file in HDFS. With this information, NameNode knows how to construct the file from its blocks.
NameNode is so critical to HDFS that when the NameNode is down, the HDFS/Hadoop cluster is inaccessible and considered down.
NameNode is a single point of failure in a Hadoop cluster that is not configured for NameNode High Availability (available since Hadoop 2).
NameNode is usually configured with a lot of memory (RAM) because the block locations are held in main memory.
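
The bookkeeping described above can be sketched as a toy in-memory model. This is a simplification for illustration only, not how HDFS is implemented: the class `ToyNameNode`, its methods, the file path, and the block and node names are all hypothetical.

```python
from collections import defaultdict


class ToyNameNode:
    """Toy model of NameNode metadata. It holds no file contents."""

    def __init__(self):
        # File path -> ordered list of block IDs that make up the file.
        self.file_blocks = {}
        # Block ID -> set of DataNode names that hold a replica.
        self.block_locations = defaultdict(set)

    def add_file(self, path, block_ids):
        self.file_blocks[path] = list(block_ids)

    def record_replica(self, block_id, datanode):
        self.block_locations[block_id].add(datanode)

    def locate(self, path):
        """Return (block ID, sorted DataNode names) pairs in file order."""
        return [
            (block_id, sorted(self.block_locations[block_id]))
            for block_id in self.file_blocks[path]
        ]


# Hypothetical file made of two blocks spread over three DataNodes.
nn = ToyNameNode()
nn.add_file("/logs/app.log", ["blk_1", "blk_2"])
nn.record_replica("blk_1", "dn1")
nn.record_replica("blk_1", "dn2")
nn.record_replica("blk_2", "dn3")

plan = nn.locate("/logs/app.log")
for block_id, nodes in plan:
    print(block_id, nodes)
```

A client that wants to read the file would use a plan like this to fetch each block from one of the listed DataNodes, in order. The model itself never touches the data, which mirrors the split between metadata and storage.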

DataNode

DataNode is responsible for storing the actual data in HDFS.
DataNode is also known as the Slave.
NameNode and DataNode are in constant communication: DataNodes send periodic heartbeats and block reports to the NameNode.
When a DataNode starts up, it announces itself to the NameNode along with the list of blocks it is responsible for.
When a DataNode is down, it does not affect the availability of data or the cluster, because each block is replicated on other DataNodes. NameNode will arrange re-replication of the blocks managed by the DataNode that is not available.
DataNode is usually configured with a lot of hard disk space because the actual data is stored in the DataNode.
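
The block report and re-replication behaviour described above can be sketched roughly as follows. This is a toy model under stated assumptions, not the actual HDFS logic: the function `under_replicated` and the node and block names are hypothetical, and the target of 3 replicas is the HDFS default replication factor, which a cluster may override.

```python
REPLICATION_FACTOR = 3  # HDFS default; configurable per cluster


def under_replicated(block_locations, live_nodes, target=REPLICATION_FACTOR):
    """Return {block ID: number of missing replicas} for blocks below target."""
    missing = {}
    for block_id, nodes in block_locations.items():
        live_replicas = len(nodes & live_nodes)
        if live_replicas < target:
            missing[block_id] = target - live_replicas
    return missing


# Hypothetical block reports: each DataNode announces the blocks it holds.
block_reports = {
    "dn1": ["blk_1", "blk_2"],
    "dn2": ["blk_1", "blk_2"],
    "dn3": ["blk_1"],
    "dn4": ["blk_2"],
}

# Build the block -> DataNodes map from the reports.
block_locations = {}
for node, blocks in block_reports.items():
    for block_id in blocks:
        block_locations.setdefault(block_id, set()).add(node)

all_nodes = set(block_reports)
healthy = under_replicated(block_locations, all_nodes)
after_failure = under_replicated(block_locations, all_nodes - {"dn3"})

print(healthy)        # no block is short of replicas
print(after_failure)  # blk_1 lost the replica that lived on dn3
```

In this sketch, losing `dn3` leaves `blk_1` with two live replicas, so it is still readable, and the result tells the toy NameNode that one more copy needs to be scheduled on another node.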

Hardware Configuration

Hardware configuration of nodes varies from cluster to cluster and depends on how the cluster is used. In some Hadoop clusters the velocity of data growth is high, and in that case more importance is given to storage capacity. If the SLAs for job executions are important and cannot be missed, then more importance is given to the processing power of the nodes.

The term “Commodity Computers” is often misunderstood. Commodity computers or nodes do not mean cheap or underpowered hardware. The term just means affordable, widely available machines, and it de-emphasizes the need for specialized hardware.

Here is a sample hardware configuration for NameNode and DataNode.

NameNode Configuration

Processors: 2 Quad Core CPUs running @ 2 GHz
RAM: 128 GB
Disk: 6 x 1TB SATA
Network: 10 Gigabit Ethernet
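
To get a feel for why the NameNode is given so much RAM, here is a rough back-of-the-envelope estimate. It assumes a commonly quoted rule of thumb of about 150 bytes of NameNode heap per namespace object (file, directory, or block). That figure is an approximation and an assumption here, not a number from this post, and the function name and the workload of 100 million single-block files are hypothetical.

```python
BYTES_PER_OBJECT = 150  # assumed rule of thumb, not an exact figure


def estimate_namenode_heap_bytes(num_files, blocks_per_file,
                                 bytes_per_object=BYTES_PER_OBJECT):
    """Estimate heap used by file and block objects (directories ignored)."""
    num_objects = num_files + num_files * blocks_per_file
    return num_objects * bytes_per_object


# Hypothetical workload: 100 million small files, one block each.
heap_bytes = estimate_namenode_heap_bytes(num_files=100_000_000,
                                          blocks_per_file=1)
print(f"{heap_bytes / 1e9:.1f} GB")
```

Under these assumptions the metadata alone needs tens of gigabytes of heap, which is why a sample NameNode with 128 GB of RAM is not excessive for a large namespace.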

DataNode Configuration

Processors: 2 Quad Core CPUs running @ 2 GHz
RAM: 64 GB
Disk: 12-24 x 1TB SATA
Network: 10 Gigabit Ethernet
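
The raw disk figures above overstate how much data a DataNode effectively contributes, because HDFS keeps several replicas of every block. A minimal sketch of the arithmetic, assuming the default replication factor of 3 and ignoring space reserved for the operating system and for temporary or intermediate data (the function name `usable_tb` is hypothetical):

```python
def usable_tb(num_disks, tb_per_disk, replication_factor=3):
    """Raw capacity divided by the number of replicas kept per block."""
    raw_tb = num_disks * tb_per_disk
    return raw_tb / replication_factor


low = usable_tb(num_disks=12, tb_per_disk=1)
high = usable_tb(num_disks=24, tb_per_disk=1)
print(f"12 x 1TB -> {low:.1f} TB of unique data")
print(f"24 x 1TB -> {high:.1f} TB of unique data")
```

So a node with 12 to 24 TB of raw disk holds roughly 4 to 8 TB of unique data at replication factor 3, which is worth keeping in mind when sizing a cluster for fast data growth.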


Big Data In Real World

We are a group of Big Data engineers who are passionate about Big Data and related technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real-time streaming big data platforms. We have seen a wide range of real-world big data problems and implemented some innovative and complex (or simple, depending on how you look at it) solutions.
