Deploying a Hadoop Multi-Node Cluster on RHEL8!!

Sumayya Khatoon · 6 min read · Jul 7, 2021

Hello folks! In this blog I’ll guide you through setting up a Hadoop multi-node cluster manually, from scratch. So, let’s get going…

Data, data, and data. Across every sector, people are dealing with huge, colossal amounts of data, also termed Big Data. Hadoop is a well-known and widespread distributed framework for Big Data processing. But when it comes to installing Hadoop, most of us feel it’s quite a cumbersome job. So, this article will show you a way to set up a Hadoop multi-node cluster.

What is a Hadoop Multi-Node Cluster?

A multi-node cluster in Hadoop contains two or more Data Nodes in a distributed Hadoop environment. Organizations use such clusters to store and analyze their massive amounts of data, so knowing how to set up a multi-node Hadoop cluster is an important skill.

Are there any prerequisites?

  • Umm!🤔 Yes, you need at least one VM to act as the Master Node (Name Node) and at least one VM to act as a Worker Node (Data Node).
  • Basic knowledge of Hadoop is also required.

Connect to both VMs using two PuTTY terminals. Decide which VM will be the master/manager/name node and which will be the worker/slave/data node. Henceforward we will call them the Name Node (Master) and the Data Nodes (Workers).
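Optionally, you can give each VM a friendly hostname so you don’t have to juggle raw IPs everywhere. A minimal sketch, run on both VMs; the IPs and names below are hypothetical, so substitute your own:

  # Map each node's IP to a hostname (IPs below are hypothetical)
  echo '192.168.1.10  namenode'   >> /etc/hosts
  echo '192.168.1.11  datanode1'  >> /etc/hosts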

Software required to set up the cluster:

  • The HotSpot build of the JDK (RPM package).
  • Apache Hadoop 1.2.1 (RPM package).
  • Download both packages and transfer them to your Linux systems using the WinSCP tool (or see the command-line sketch below).
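If you’d rather not use WinSCP, here’s a minimal command-line sketch for fetching Hadoop directly on the VM. It assumes the VM has internet access and that the Apache archive still hosts the 1.2.1 RPM at this path (the JDK RPM generally has to be downloaded from Oracle’s site after accepting their license, so WinSCP is still handy for that one):

  # Download the Hadoop 1.2.1 RPM straight onto the VM
  # (URL assumed from the Apache archive layout; verify before use)
  wget https://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1-1.x86_64.rpm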

Now we’re good to go…

So, let’s setup the Hadoop Cluster →

Configuring the Name Node / Master Node:

Step 1: Installing the required software.

After transferring both packages you need to install them. Since they are in RPM format, we can use the rpm command.

Installing JDK:

 rpm -ivh <jdk_file_name>
 java -version

Installing Hadoop Software:

 rpm -ivh <hadoop_file> --force
 hadoop version

Step 2: Configuring the Name Node.

Now create a directory of your choice; in my case I’ve created a directory called /namenode at the root (/) of the filesystem.

  mkdir /namenode

After installing Hadoop, the /etc/hadoop directory is created automatically. Switch to /etc/hadoop and open the hdfs-site.xml file with your favorite editor (I’m using vim), then write the following property between the <configuration> tags.

Property of Name Node inside hdfs-site.xml file:

<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>/namenode</value>
    </property>
</configuration>

We also need to add a property to the core-site.xml file. So open /etc/hadoop/core-site.xml with vim and write the following property between the <configuration> tags, replacing NameNode_IP with the actual IP address of your Master VM.

Property of Name Node inside core-site.xml file:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://NameNode_IP:9001</value>
    </property>
</configuration>

Now we’re almost done with configuring Name Node…

Step 3: Formatting the Name Node.

Command to format the Name Node:

  hadoop namenode -format
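Formatting writes the initial HDFS metadata into the directory we configured earlier. A quick sanity check, assuming the usual Hadoop 1.x on-disk layout:

  # After a successful format, /namenode should contain a current/
  # subdirectory with files like fsimage, edits, and VERSION
  ls /namenode/current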

Step 4: Starting the Name Node Services.

Now the last step is to bring the Name Node service up. The command for that is:

hadoop-daemon.sh start namenode

Confirm whether the service is up using the command:

    jps
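If the Name Node daemon is up, jps lists it next to its process id. The output should look roughly like this (the PIDs here are illustrative; yours will differ):

  12345 NameNode
  12467 Jps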

Voila! The Master Node is configured successfully. You can check the admin report of the cluster using the command:

  hadoop dfsadmin -report

Our Master Node is configured successfully, but no Slave Nodes are connected yet. The Slaves will show up once we configure them.

So, let’s configure our Data Node →

Configuring the Data Node / Slave Node:

Step 1: Installing the required software.

The Slave Node also requires the JDK and Hadoop software, so download both packages and transfer them with the WinSCP tool. After transferring them, install them; since they are in RPM format, we can use the rpm command.

Installing JDK:

 rpm -ivh <jdk_file_name>
 java -version

Installing Hadoop Software:

 rpm -ivh <hadoop_file> --force
 hadoop version

Step 2: Configuring the Data Node.

Now create a directory of your choice; in my case I’ve created a directory called /datanode1 at the root (/) of the filesystem.

  mkdir /datanode1

Just like on the Master, the /etc/hadoop directory is created automatically after installing Hadoop. Open the hdfs-site.xml file in /etc/hadoop with vim and write the following property between the <configuration> tags.

Property of Data Node inside hdfs-site.xml file:

<configuration>
    <property>
        <name>dfs.data.dir</name>
        <value>/datanode1</value>
    </property>
</configuration>

We also need to add the property to the core-site.xml file. So open /etc/hadoop/core-site.xml with vim and write the following property between the <configuration> tags, again pointing at the Master’s IP.

Property of Data Node inside core-site.xml file:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://NameNode_IP:9001</value>
    </property>
</configuration>

Note: the core-site.xml file is the same on the Master and the Slave Nodes.

Step 3: Starting the Data Node Services.

Now the last step is to bring the Data Node service up. The command for that is:

hadoop-daemon.sh start datanode

Confirm whether the service is up using the command:

  jps
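You can also peek at the storage directory. Once the daemon starts, it initializes the directory we configured; a hedged check, assuming the Hadoop 1.x layout:

  # The running DataNode should have created a current/ subdirectory
  # (block files will appear there as data lands in HDFS)
  ls /datanode1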

Voila! Our Data Node is also configured successfully…

For the Master and Slave Nodes to connect to each other, disable the firewall on both using the command:

 systemctl stop firewalld
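Stopping firewalld entirely is the quick-and-dirty route, fine for a lab. A gentler sketch, assuming firewalld is the active firewall and the Hadoop 1.x default ports, is to open only what the cluster needs, on every node:

  # Open just the Hadoop 1.x ports instead of disabling the firewall
  firewall-cmd --permanent --add-port=9001/tcp    # NameNode RPC (as set in core-site.xml)
  firewall-cmd --permanent --add-port=50010/tcp   # DataNode data transfer
  firewall-cmd --permanent --add-port=50020/tcp   # DataNode IPC
  firewall-cmd --permanent --add-port=50070/tcp   # NameNode web UI
  firewall-cmd --permanent --add-port=50075/tcp   # DataNode web UI
  firewall-cmd --reload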

Switch to the Name Node and run the command given below.

 hadoop dfsadmin -report

At first there weren’t any Data Nodes connected; now you can see 1 Data Node is connected. You can connect as many Data Nodes as you need.

You can also visit the Hadoop cluster’s web portal in a browser:

  http://<NameNode_IP>:50070

1 Live Node is connected…

If you want to configure more Slave Nodes, you can follow the same procedure, condensed into the sketch below.
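For reference, here’s the whole Data Node procedure rolled into one shell sketch you could adapt for each extra node. It assumes the two RPMs are already copied onto the box; /datanode2 and the RPM file names are placeholders, so adjust them to match your setup:

  # --- Run on each additional Slave Node (sketch; adapt paths/names) ---
  rpm -ivh <jdk_file_name>               # install the JDK
  rpm -ivh <hadoop_file> --force         # install Hadoop 1.2.1
  mkdir /datanode2                       # storage dir for this node (placeholder name)
  vim /etc/hadoop/hdfs-site.xml          # set dfs.data.dir to /datanode2
  vim /etc/hadoop/core-site.xml          # set fs.default.name to hdfs://<NameNode_IP>:9001
  systemctl stop firewalld               # or open the ports listed earlier
  hadoop-daemon.sh start datanode        # join the cluster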

2 Data Nodes are available and 2 Live Nodes are up!

Hip Hip Hurray! We’ve finally configured our Hadoop Multi-Node Cluster.

Hope you found my blog helpful and interesting!

Do like, comment, and give a clap!

That’s all! Signing off!👋

Thank You!
