Deploying a Hadoop Multi-Node Cluster on RHEL 8!
Hello Folks! In this blog I’ll guide you through setting up a Hadoop Multi-Node Cluster manually from scratch. So, guys, let’s get going…
Data, data and more data. Across every sector, people are dealing with colossal amounts of data, commonly termed Big Data. Hadoop is a well-known and widely used distributed framework for Big Data processing. But when it comes to installing Hadoop, most of us feel it’s quite a cumbersome job. So, this article walks you through setting up a Hadoop Multi-Node Cluster.
What is Hadoop Multi-Node Cluster?
A Multi-Node Cluster in Hadoop contains two or more Data Nodes in a distributed Hadoop environment. Organizations use it to store and analyze their massive amounts of data, so knowing how to set up a Multi-Node Hadoop Cluster is an important skill.
Are there any Pre-requisites?
- Umm!🤔 Yes, you need to create at least one VM to act as the Master Node (Name Node) and at least one VM to act as a Worker Node (Data Node).
- Basic knowledge of Hadoop is required.
Connect to both VMs using two PuTTY terminals. Decide which VM will be the master/manager/name node and which will be the worker/slave/data node. Henceforward we will call them the Name Node (Master) and the Worker Nodes (Data Nodes).
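You’ll need each machine’s IP address shortly (for core-site.xml and for the web portal), so it’s worth noting them down now. On a standard RHEL 8 install, either of these will show it:
# Run on each VM to find its IP address
ip addr show
# or, more briefly
hostname -I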
Software required to setup the Cluster:
- The HotSpot version of the JDK is required.
- Apache Hadoop 1.2.1
- Download both packages and transfer them to your Linux systems using the WinSCP tool (a command-line alternative is sketched just below).
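If you’d rather skip WinSCP, scp from a terminal does the same job. A minimal sketch, where the RPM file names and the target user/IP are placeholders you should replace with your own:
# Copy the RPMs from your local machine to the Name Node (placeholder file names)
scp jdk-8u171-linux-x64.rpm hadoop-1.2.1-1.x86_64.rpm root@<NameNode_IP>:/root/
# Repeat for each Data Node
scp jdk-8u171-linux-x64.rpm hadoop-1.2.1-1.x86_64.rpm root@<DataNode_IP>:/root/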
Now, we’re good to go…
So, let’s setup the Hadoop Cluster →
Configuring the Name Node / Master Node:
Step 1: Installing the required software.
After transferring both packages, you need to install them. Since they are in RPM format, we can use the rpm command.
Installing JDK:
rpm -ivh <jdk_file_name>
java -version
Installing Hadoop Software:
rpm -ivh <hadoop_file> --force
hadoop version
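For reference, with the RPM file names I’m assuming here (yours may differ), the whole install-and-verify sequence looks like this:
# Install the JDK and confirm it is on the PATH (file name is a placeholder)
rpm -ivh jdk-8u171-linux-x64.rpm
java -version
# Install Hadoop 1.2.1 (--force skips conflict checks) and confirm the version
rpm -ivh hadoop-1.2.1-1.x86_64.rpm --force
hadoop version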
Step 2: Configuring the Name Node.
Now create a directory of your choice; in my case I’ve created a directory called /namenode under the root (/) directory.
mkdir /namenode
After installing Hadoop, the /etc/hadoop directory is created automatically. Switch to /etc/hadoop and open the hdfs-site.xml file with your favorite editor (I’m using vim), then add the following property between the <configuration> tags.
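For example, assuming the RPM placed the configuration files in the default /etc/hadoop location:
cd /etc/hadoop
vim hdfs-site.xml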
Property of the Name Node inside the hdfs-site.xml file:
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/namenode</value>
  </property>
</configuration>
We also need to add a property to the core-site.xml file. So open /etc/hadoop/core-site.xml with vim and add the following property between the <configuration> tags.
Property of the Name Node inside the core-site.xml file:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://NameNode_IP:9001</value>
  </property>
</configuration>
Now we’re almost done configuring the Name Node…
Step 3: Formatting the Name Node.
Command to format the Name Node:
hadoop namenode -format
Step 4: Starting the Name Node Services.
Now the last step is to bring the Name Node service up. The command for that is:
hadoop-daemon.sh start namenode
Confirm whether the service is running using the command:
jps
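If the Name Node came up properly, the jps output should include a NameNode entry alongside Jps itself; the process IDs below are only illustrative and will differ on your machine.
3050 NameNode
3187 Jps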
Voila! The Master Node is configured successfully. You can check the admin report of the Master Node using the command:
hadoop dfsadmin -report
Our Master Node is configured successfully, but no Slave Nodes are connected yet. The Slaves will show up once we configure them.
So, let’s configure our Data Node →
Configuring the Data Node / Slave Node:
Step 1: Installing the required software.
The Slave Node also needs the JDK and Hadoop software, so download the packages and transfer them using the WinSCP tool. After transferring both packages, you need to install them. Since they are in RPM format, we can use the rpm command.
Installing JDK:
rpm -ivh <jdk_file_name>
java -version
Installing Hadoop Software:
rpm -ivh <hadoop_file> --force
hadoop version
Step 2: Configuring the Data Node.
Now create a directory of your choice; in my case I’ve created a directory called /datanode1 under the root (/) directory.
mkdir /datanode1
After installing Hadoop, the /etc/hadoop directory is created automatically. Switch to /etc/hadoop and open the hdfs-site.xml file with your favorite editor (vim works fine), then add the following property between the <configuration> tags.
Property of the Data Node inside the hdfs-site.xml file:
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/datanode1</value>
  </property>
</configuration>
We also need to add a property to the core-site.xml file. So open /etc/hadoop/core-site.xml with vim and add the following property between the <configuration> tags.
Property of the Data Node inside the core-site.xml file:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://NameNode_IP:9001</value>
  </property>
</configuration>
Note: the core-site.xml file is the same for the Master and Slave Nodes.
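Since the file is identical on both nodes, one shortcut is to copy it over from the Name Node instead of editing it by hand (this assumes root SSH access from the Data Node to the Name Node):
# Run on the Data Node: pull core-site.xml from the Name Node
scp root@<NameNode_IP>:/etc/hadoop/core-site.xml /etc/hadoop/core-site.xml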
Step 3: Starting the Data Node Services.
Now the last step is to bring the Data Node service up. The command for that is:
hadoop-daemon.sh start datanode
Confirm whether the service is running using the command:
jps
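On the Data Node, jps should similarly list a DataNode process (again, the PIDs shown here are just placeholders):
3861 DataNode
3990 Jps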
Voila! Our Data Node is also configured successfully…
To let the Master and Slave Nodes connect to each other, disable the firewall on both nodes with the command below.
systemctl stop firewalld
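Stopping firewalld this way only lasts until the next reboot. A couple of alternatives, assuming a lab setup (the ports below are the ones used in this walkthrough: 9001 for HDFS and 50070 for the web portal):
# Option 1: keep the firewall off across reboots (lab environments only)
systemctl disable firewalld
# Option 2: keep the firewall running and just open the required ports on the Name Node
firewall-cmd --permanent --add-port=9001/tcp --add-port=50070/tcp
firewall-cmd --reload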
Switch to the Name Node and run the command given below.
hadoop dfsadmin -report
Earlier there weren’t any Data Nodes connected; now you can see that 1 Data Node is connected. You can connect as many Data Nodes as you need this way.
You can also open the web portal of the Hadoop Cluster in a browser:
http://<NameNode_IP>:50070
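If you don’t have a browser handy, a quick reachability check from any of the nodes works too; this simply prints the HTTP status code returned by the Name Node’s web portal (replace <NameNode_IP> with the actual address):
curl -s -o /dev/null -w "%{http_code}\n" http://<NameNode_IP>:50070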
If you want to configure more Slave Nodes, you can follow the same procedure.
Hip Hip Hurray! We’ve finally configured our Hadoop Multi-Node Cluster.
Hope you find my blog helpful and interesting!
Do like, comment, and give a clap!
That’s all! Signing off!👋
Thank You!