How to check size of a directory in HDFS? - Big Data In Real World



Checking the size of a directory is a very common need in day-to-day Big Data work, and the solution is simple.

Solution

Use the -du option of the hdfs dfs command to get the size of a directory in HDFS.

hdfs dfs -du -s -h /path/to/dir
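If you need the raw byte count (for example, to compare against a quota or to feed a monitoring script), leave out -h so the number stays in bytes, then take the first column. This sketch assumes the Hadoop 2.x/3.x output layout of -du -s (logical size, disk space consumed including replicas, path). The function names hdfs_du_bytes and hdfs_dir_bytes are hypothetical helpers, not part of Hadoop.

```shell
# Hypothetical helper: read `hdfs dfs -du -s` output on stdin and print the
# first column (logical size in bytes) of every data line.
# Lines whose first field is not a plain integer (e.g. a -v header) are skipped.
hdfs_du_bytes() {
  awk '$1 ~ /^[0-9]+$/ { print $1 }'
}

# Hypothetical wrapper: size in bytes of one HDFS directory.
# Requires the hdfs client on PATH and access to the cluster.
hdfs_dir_bytes() {
  hdfs dfs -du -s "$1" | hdfs_du_bytes
}

# Usage on a cluster:
#   hdfs_dir_bytes /path/to/dir
```

Because the filter only keeps lines that start with a number, the same helper also tolerates the header line that -v adds.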

-du stands for disk usage

-s stands for summary: it aggregates the sizes of all files under the directory into a single total instead of listing each entry

-h stands for human readable (e.g. 64.0m instead of 67108864)

-v displays the column names as a header in the output

-x excludes snapshots from the result. Snapshots are read-only, point-in-time copies of a directory tree in HDFS. Hadoop admins typically use them to preserve the files and folders as they were at a given moment.
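The -h conversion is 1024-based, which is why 67108864 bytes is shown as 64.0m: 67108864 / 1024 / 1024 = 64. A minimal sketch of that arithmetic in shell, assuming awk is available; human_bytes is a hypothetical helper, and its output format only approximates what hdfs prints.

```shell
# Hypothetical helper: convert a byte count to a 1024-based human-readable
# string with one decimal place, e.g. 67108864 -> 64.0M.
human_bytes() {
  awk -v b="$1" 'BEGIN {
    n = b + 0                           # force numeric
    split("B K M G T P E", unit, " ")   # unit[1] = "B" ... unit[7] = "E"
    i = 1
    while (n >= 1024 && i < 7) {        # divide until below 1024
      n /= 1024
      i++
    }
    printf "%.1f%s\n", n, unit[i]
  }'
}

# Usage:
#   human_bytes 67108864
```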


Big Data In Real World
We are a group of Big Data engineers who are passionate about Big Data and related technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real-time streaming platforms. We have seen a wide range of real-world Big Data problems and implemented some innovative and complex (or simple, depending on how you look at it) solutions.
