Reading A File From HDFS - Java Program - Big Data In Real World

Reading A File From HDFS – Java Program


In the last post we saw how to write a file to HDFS by writing our own Java program. In this post we will see how to read a file from HDFS with a Java program.

Here is the program – FileReadFromHDFS.java

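import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileReadFromHDFS {

    public static void main(String[] args) throws Exception {

        // File to read in HDFS
        String uri = args[0];

        // Loads the cluster settings from the configuration files on the classpath
        Configuration conf = new Configuration();

        // Get the filesystem - HDFS
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;

        try {
            // Open the path mentioned in HDFS
            in = fs.open(new Path(uri));
            // Copy the file to standard output with a 4096-byte buffer;
            // false leaves both streams open after the copy
            IOUtils.copyBytes(in, System.out, 4096, false);

            System.out.println("End Of file: HDFS file read complete");

        } finally {
            // Close the input stream whether or not the copy succeeded
            IOUtils.closeStream(in);
        }
    }
}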

This program takes one argument, the fully qualified HDFS path of the file to read, and displays the contents of that file on the screen. It behaves like the hadoop fs -cat command.

//File to read in HDFS
String uri = args[0];

We need a few key details about the cluster, such as the name node address. These details are already specified in the configuration files during cluster setup.

Configuration conf = new Configuration();

The easiest way to get the configuration of the cluster is to instantiate a Configuration object. It reads the configuration files found on the classpath (core-site.xml, for example) and loads the settings that the program needs.

//Get the filesystem - HDFS
FileSystem fs = FileSystem.get(URI.create(uri), conf);
FSDataInputStream in = null;

In the next line we get the FileSystem object using the URI that we passed as the program input and the configuration that we just created. For an hdfs:// URI this returns a DistributedFileSystem object. Once we have the file system object, the next thing we need is an input stream to the file that we would like to read.
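FileSystem.get looks at the scheme and authority of the URI to decide which file system to return, so it helps to see what those parts are for a typical HDFS path. The sketch below uses only java.net.URI and does not touch Hadoop. The class name HdfsUriParts, the describe helper, and the host, port and file path are all made up for illustration.

```java
import java.net.URI;

class HdfsUriParts {

    // Breaks a URI string into scheme, host, port and path, joined with " | "
    static String describe(String uri) {
        URI parsed = URI.create(uri);
        return parsed.getScheme() + " | " + parsed.getHost()
                + " | " + parsed.getPort() + " | " + parsed.getPath();
    }

    public static void main(String[] args) {
        // Hypothetical name node address and file path
        System.out.println(describe("hdfs://namenode.example.com:8020/user/demo/sample.txt"));
        // prints: hdfs | namenode.example.com | 8020 | /user/demo/sample.txt
    }
}
```

The scheme (hdfs) selects the file system implementation, the host and port identify the name node, and the path is the file inside HDFS.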

in = fs.open(new Path(uri));
IOUtils.copyBytes(in, System.out, 4096, false);

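We get the input stream by calling the open method on the file system object with the HDFS path of the file we would like to read. Then we use the copyBytes method from Hadoop's IOUtils class to read the entire file's contents from the input stream and print them on the screen. The third argument, 4096, is the buffer size in bytes. The final argument, false, tells copyBytes not to close the streams when it finishes, which is why the finally block closes the input stream itself.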
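To get a feel for what a call like IOUtils.copyBytes(in, System.out, 4096, false) does, here is a rough sketch of a buffered copy loop written with plain java.io. It is not Hadoop's source code, only an approximation of the idea. The names CopyBytesSketch and copy are hypothetical, and an in-memory stream stands in for the HDFS input stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class CopyBytesSketch {

    // Copies everything from in to out through a fixed-size buffer.
    // Neither stream is closed here; the caller stays responsible for that.
    // Returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        // read() returns -1 once the end of the stream is reached
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory data standing in for the contents of an HDFS file
        byte[] data = "line one\nline two\n".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        long copied = copy(in, out, 4096);
        System.out.print(out.toString("UTF-8"));
        System.out.println(copied + " bytes copied");
    }
}
```

The buffer size only controls how many bytes move per read, so a larger or smaller value changes the number of loop iterations but not the output.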

 

Big Data In Real World
We are a group of Big Data engineers who are passionate about Big Data and related Big Data technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real time streaming big data platforms. We have seen a wide range of real world big data problems, implemented some innovative and complex (or simple, depending on how you look at it) solutions.
