HDFS Federation - Big Data In Real World

HDFS Federation


What is HDFS Federation?

The Namenode is responsible for the successful operation of HDFS. It holds the entire metadata of HDFS in its memory. This includes information about files and directories, such as who created the file or directory, when it was created, when it was last modified and its permissions. More importantly, the Namenode holds the locations of the blocks that make up each file in HDFS.

All clients have to go through the Namenode to perform any READ or WRITE operation in HDFS. Since the Namenode holds the entire metadata of HDFS, in a big cluster that metadata can grow huge in volume. The Namenode's memory then becomes a limiting factor and the Namenode will start to slow down. Hence the Namenode can become the bottleneck and cause performance issues.
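To get a feel for why memory becomes the limiting factor, here is a rough back-of-the-envelope sketch. It assumes the commonly quoted rule of thumb of about 150 bytes of Namenode heap per file, directory or block object. That figure is an assumption for illustration only, and the class and method names are hypothetical, not part of Hadoop.

```java
// Hypothetical sketch, not a Hadoop API.
class NamenodeHeapEstimate {

    // ASSUMPTION: rough rule of thumb, real usage varies by version and workload.
    static final long ASSUMED_BYTES_PER_OBJECT = 150L;

    // Every file, directory and block is one metadata object held in memory.
    static long estimateBytes(long files, long directories, long blocks) {
        return (files + directories + blocks) * ASSUMED_BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        // Example: 100 million files with 2 blocks each, 1 million directories.
        long bytes = estimateBytes(100_000_000L, 1_000_000L, 200_000_000L);
        System.out.println("Estimated metadata size in bytes: " + bytes);
    }
}
```

The point is only that the estimate grows linearly with the number of files and blocks, all of which has to fit in the memory of a single Namenode.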

To tackle this issue, HDFS Federation was introduced in version 0.23, and with it you can add multiple Namenodes to a cluster. Each Namenode is responsible for managing a portion of the filesystem, thereby sharing the workload of the cluster.

For instance, let's say two teams in our company, Marketing and Research, are funding the Hadoop cluster. You can create a namespace called /marketing which will be managed by one Namenode and another namespace called /research which will be managed by another Namenode.

The advantage of this is that you don't have to run two different Hadoop clusters. You are able to run a single Hadoop infrastructure, but one Namenode will manage all the files under /marketing and another Namenode will manage all the files under /research.
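The idea of mapping each top-level path to its own Namenode can be sketched in a few lines of plain Java. This is only an illustration of the routing concept. Hadoop provides a client-side mount table (ViewFs) for this purpose, and the class below is not that API. The class name, method names and the Namenode labels namenode-1 and namenode-2 are all made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of path-to-Namenode routing, not a Hadoop API.
class NamespaceRouter {

    // Maps a namespace prefix such as "/marketing" to the Namenode managing it.
    private final Map<String, String> mounts = new LinkedHashMap<>();

    void mount(String prefix, String namenode) {
        mounts.put(prefix, namenode);
    }

    // Returns the Namenode responsible for the given path.
    String resolve(String path) {
        for (Map.Entry<String, String> entry : mounts.entrySet()) {
            String prefix = entry.getKey();
            // Match the prefix itself or anything below it, but not "/marketing2".
            if (path.equals(prefix) || path.startsWith(prefix + "/")) {
                return entry.getValue();
            }
        }
        throw new IllegalArgumentException("No Namenode manages " + path);
    }

    public static void main(String[] args) {
        NamespaceRouter router = new NamespaceRouter();
        router.mount("/marketing", "namenode-1"); // assumed label
        router.mount("/research", "namenode-2");  // assumed label

        System.out.println(router.resolve("/marketing/campaigns/q3.csv"));
        System.out.println(router.resolve("/research/papers/draft.txt"));
    }
}
```

Each Namenode only ever sees requests for paths under its own namespace, which is how the workload gets shared.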

Will both Namenodes share information?

No. Each Namenode is only responsible for its assigned namespace. The Namenodes do not share metadata or other information and do not communicate with one another. When the Namenode managing /marketing goes down, it will affect all the files under /marketing, but users will still be able to access HDFS and the files under /research since the Namenode managing /research is fully functional.
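The isolation described above can be modelled with a small sketch: each namespace has its own up/down state, and losing one does not touch the other. Again, this is a toy model under assumed names (the class and its methods are hypothetical), not how Hadoop tracks Namenode health.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of failure isolation between federated Namenodes.
class FederationOutageSketch {

    // Namespace prefix -> whether the Namenode managing it is currently up.
    private final Map<String, Boolean> namenodeUp = new HashMap<>();

    void addNamespace(String prefix) {
        namenodeUp.put(prefix, true);
    }

    void markDown(String prefix) {
        if (!namenodeUp.containsKey(prefix)) {
            throw new IllegalArgumentException("Unknown namespace " + prefix);
        }
        namenodeUp.put(prefix, false);
    }

    // A path is accessible only if the Namenode owning its namespace is up.
    boolean isAccessible(String path) {
        for (Map.Entry<String, Boolean> entry : namenodeUp.entrySet()) {
            String prefix = entry.getKey();
            if (path.equals(prefix) || path.startsWith(prefix + "/")) {
                return entry.getValue();
            }
        }
        return false; // no Namenode manages this path
    }

    public static void main(String[] args) {
        FederationOutageSketch cluster = new FederationOutageSketch();
        cluster.addNamespace("/marketing");
        cluster.addNamespace("/research");

        cluster.markDown("/marketing"); // simulate the /marketing Namenode failing

        System.out.println(cluster.isAccessible("/marketing/report.csv"));
        System.out.println(cluster.isAccessible("/research/data.csv"));
    }
}
```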

Can a cluster have more than one Namenode?

The simple answer to the question is yes, but it needs to be supported with the explanation of HDFS Federation.

Big Data In Real World
We are a group of Big Data engineers who are passionate about Big Data and related Big Data technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real time streaming big data platforms. We have seen a wide range of real world big data problems and implemented some innovative and complex (or simple, depending on how you look at it) solutions.
