Contributing a limited/specific amount of storage as a slave to the Hadoop Cluster!!

Sumayya Khatoon
4 min read · Jul 8, 2021

Hello Folks! In this blog post I’ll show you how you can contribute a limited/specific amount of storage as a slave to the Hadoop Cluster. So, guys, let’s get going…

Hadoop is a very well-known and widespread distributed framework for Big Data processing.

Problem statement:

In a Hadoop Cluster, how can we contribute a limited/specific amount of storage as a slave to the cluster?

Are there any prerequisites❓

  • Umm!🤔 Yes, a Hadoop Cluster needs to be configured first. If you are new and don’t know how to set up a Hadoop Cluster, then you can check this out.

So, let’s find the way to solve our problem statement →

Step 1: Adding a Volume/Hard Disk to our Data Node.

As I’m using an RHEL8 machine, we can attach an external hard disk to our VM.

Head to Settings → click on Storage, and follow the below given procedure to attach the external Hard Disk to our VM —

The external Hard Disk is attached successfully!

After the Hard Disk is attached, turn on your Name Node’s VM and Data Node’s VM.
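
By the way, if your hypervisor happens to be VirtualBox (the Settings → Storage flow above suggests it), the same attachment can be scripted with VBoxManage. This is only a sketch; the disk file name datanode-extra.vdi, the VM name and the controller name are placeholders for my hypothetical setup:

   # create an 8 GiB virtual disk and attach it to the Data Node VM
   # (VM name "DataNodeVM" and controller name "SATA" are placeholders)
   VBoxManage createmedium disk --filename datanode-extra.vdi --size 8192
   VBoxManage storageattach "DataNodeVM" --storagectl "SATA" --port 1 --device 0 --type hdd --medium datanode-extra.vdi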

Step 2: Creating Partitions on the External Hard Disk.

In order to use the Hard Disk, we need to create a partition on it.

Use the below given command to list all your hard disks with their partitions:

     fdisk -l

To create a partition we first need to switch to that hard disk. In my case, the hard disk name is /dev/sdb, which is 8GiB in size.

Use the below command to make a partition.

   fdisk <hard_disk_name>
   fdisk /dev/sdb

When you enter the above command, you are inside the fdisk prompt for that hard disk.

  • Press n to make a new partition.

  • Press p to make a primary partition.

  • Mention the size of your partition (for e.g.: +1G, +2G). In my case, I’ve created a partition of 2GiB.

  • Press w to save the partition table and exit fdisk.

One partition has been created successfully!

The partition is created, and the name of my partition is /dev/sdb1.
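
Putting those keystrokes together, the whole step boils down to something like this (the disk name /dev/sdb and the +2G size are just the values from my setup):

   # partition the new disk; inside fdisk press: n, p, Enter, Enter, +2G, w
   fdisk /dev/sdb
   # optionally ask the kernel to re-read the partition table
   partprobe /dev/sdb
   # the new partition /dev/sdb1 should now be listed
   lsblk /dev/sdb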

Step 3: Formatting the Partition.

After the partition has been created, it’s compulsory to format it. When you format a partition, an Inode Table is created internally. There are several filesystem types available (like ext2, ext3, ext4, xfs, etc.). In my case, I’ve formatted the partition with the ext4 filesystem type.

Use the below given command to format the partition:

 mkfs.ext4 <partition_name>
 mkfs.ext4 /dev/sdb1

The partition has been formatted successfully!
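
If you want to double-check the result before moving on, blkid (or lsblk -f) will show the filesystem type that the partition now carries:

   # confirm /dev/sdb1 now has an ext4 filesystem on it
   blkid /dev/sdb1
   lsblk -f /dev/sdb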

Step 4: Mounting the Partition.

Use the below given command to mount the partition on the directory.

 mount /dev/sdb1  /datanode
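
One assumption hidden in that command is that the /datanode directory already exists and that it is the same directory you configured as the Data Node’s data directory (dfs.datanode.data.dir, or dfs.data.dir on older releases) in hdfs-site.xml while setting up the cluster. A minimal sketch with a quick verification:

   # create the mount point if it does not exist yet, then mount and verify
   mkdir -p /datanode
   mount /dev/sdb1 /datanode
   df -h /datanode     # should report roughly 2 GiB, minus filesystem overhead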

Note: The entire process above is done on the Data Node, because the Data Node is the one that contributes storage to our Name Node.

Step 5: Starting the Name Node and Data Node.

Disable the firewall and run the below given commands (the first on the Name Node, the second on the Data Node):

hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode
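
On RHEL8 the firewall is usually firewalld, so disabling it and confirming that the daemons actually came up looks roughly like this (jps ships with the JDK):

   # on each node: stop the firewall, then list the running Java processes
   systemctl stop firewalld
   jps     # the Name Node should show NameNode, the Data Node should show DataNode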

Step 6: Testing Time.

Switch to the Name Node and run the command below to check the admin report. This command gives the details of how many Data Nodes are connected and how much storage they are providing to our Hadoop Cluster.

  hadoop dfsadmin -report

Here you can see that 1 Data Node is connected and it is providing storage of approximately 1.91 GB.
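
As a side note, on Hadoop 2.x and later the same admin report is also available through the hdfs client; the older hadoop dfsadmin form still works but prints a deprecation warning:

   hdfs dfsadmin -report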

You can also visit the Hadoop Cluster’s web portal at

  http://<NameNode_IP>:50070

Hip Hip Hurray! Our milestone is achieved. This is how we can contribute a specific/limited amount of storage to our Hadoop Cluster.

Hope you find my Blog interesting and helpful!

That’s all! Signing off!👋

Thank you!
