Configuring Hadoop Cluster using Ansible Playbook | Task-11.1 | ARTH

Mar 21, 2021

Introduction

Before going into the task, we need to know what Hadoop is.
Hadoop is a software framework used for solving the Big Data problem, that is, storing and processing data that is too large for a single machine.
With a Hadoop cluster we can distribute our data across different nodes. One node is the Namenode (like a master node), which clients connect to in order to reach the slave nodes for transferring data. It also maintains the metadata of the cluster.
The other nodes are called Datanodes. They store the data coming from clients, and each one shares a part of its storage with the cluster.
To connect the Namenode and the Datanodes we need to configure two files:
The first is the “hdfs-site.xml” file, where we provide the name of the folder that the node shares with the cluster.
The second is the “core-site.xml” file, where we give the IP of the Namenode and the port number on which the Hadoop cluster is running.

Pre-Requisites

Before doing the task you have to install Ansible on your system, with an inventory file referenced from the “/etc/ansible/ansible.cfg” configuration file.
For Hadoop you need two packages on your system: “jdk-8u171-linux-x64.rpm” and “hadoop-1.2.1-1.x86_64.rpm”.
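For reference, an inventory can itself be written in YAML. The sketch below is only an illustration: the file name, the IP address, and the login user are all placeholders and not taken from this setup.

```yaml
# Hypothetical YAML inventory, e.g. saved as /etc/ansible/hosts.yml and
# referenced from ansible.cfg with:  inventory = /etc/ansible/hosts.yml
all:
  hosts:
    192.0.2.10:            # placeholder IP of the target node
      ansible_user: root   # assumption: Ansible connects as root over SSH
```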

Task-Description

Task-11.1 — Configure Hadoop and start cluster services using Ansible Playbook.

Task-Steps

Step-1: To make the playbook dynamic we need some variables, so in this step I define them.
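As a rough sketch, the variables might be declared at the top of the play like this. Only the `node` variable and its values “name” and “data” are mentioned in this post. The other variable names (`folder`, `namenode_ip`, `port`), the `hosts` pattern, and all the values are assumptions for illustration.

```yaml
# Play header holding the variables that the later steps read.
- hosts: all                  # assumption: adjust to your own inventory
  vars:
    node: name                # "name" for a Namenode, "data" for a Datanode
    folder: /nn               # hypothetical folder shared with the cluster
    namenode_ip: 192.0.2.10   # placeholder IP of the Namenode
    port: 9001                # placeholder port of the Hadoop cluster
```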

Step-2: First we have to copy the rpm files from the local system to the host system. Here I use a loop to transfer multiple files.
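A minimal sketch of such a task, assuming the two rpm files sit next to the playbook (or in its `files/` folder) and that “/root/” is an acceptable destination. Both locations are assumptions.

```yaml
# Goes under the play's `tasks:` section.
- name: Copy the JDK and Hadoop rpm files to the target node
  ansible.builtin.copy:
    src: "{{ item }}"          # looked up relative to the playbook
    dest: /root/               # hypothetical destination folder
  loop:
    - jdk-8u171-linux-x64.rpm
    - hadoop-1.2.1-1.x86_64.rpm
```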

Step-3: Now I install the software on the host system.
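One way to do this is to call `rpm` through the command module, as sketched below. The “/root/” path and the `--force` flag are assumptions: `--force` lets the install succeed when the package is already present or when files conflict. Because the command module cannot tell whether anything really changed, this task reports “changed” on every run.

```yaml
# Goes under the play's `tasks:` section; the JDK is installed first.
- name: Install the JDK and Hadoop rpm packages
  ansible.builtin.command:
    cmd: "rpm -ivh /root/{{ item }} --force"   # hypothetical path and flag
  loop:
    - jdk-8u171-linux-x64.rpm
    - hadoop-1.2.1-1.x86_64.rpm
```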

Step-4: As I said in the introduction, we have to specify the folder that the node contributes to the cluster, so in this step I create that folder.
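Creating the folder could be sketched with the file module, assuming the folder path is held in a playbook variable. The name `folder` is hypothetical.

```yaml
# Goes under the play's `tasks:` section.
- name: Create the folder this node shares with the cluster
  ansible.builtin.file:
    path: "{{ folder }}"       # hypothetical variable with the folder path
    state: directory
```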

Step-5: In the next step I copy the “hdfs-site.xml” and “core-site.xml” files into the “/etc/hadoop/” folder.

Let me show you what the “hdfs-site.xml” and “core-site.xml” files look like.

Here I use Jinja to make the files dynamic, which is why I use the template module instead of the copy module when copying them.
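The sketch below outlines how the two templates and the template task could fit together. The property names `dfs.name.dir`, `dfs.data.dir`, and `fs.default.name` are the ones Hadoop 1.x uses. The template paths and the variable names `folder`, `namenode_ip`, and `port` are assumptions, and building the property name from the `node` variable is a guess based on its “name” and “data” values.

```yaml
# templates/hdfs-site.xml.j2 (hypothetical path) might contain:
#   <configuration>
#     <property>
#       <name>dfs.{{ node }}.dir</name>
#       <value>{{ folder }}</value>
#     </property>
#   </configuration>
#
# templates/core-site.xml.j2 (hypothetical path) might contain:
#   <configuration>
#     <property>
#       <name>fs.default.name</name>
#       <value>hdfs://{{ namenode_ip }}:{{ port }}</value>
#     </property>
#   </configuration>

# Goes under the play's `tasks:` section.
- name: Render both configuration files into /etc/hadoop/
  ansible.builtin.template:
    src: "templates/{{ item }}.j2"
    dest: "/etc/hadoop/{{ item }}"
  loop:
    - hdfs-site.xml
    - core-site.xml
```

With `node` set to “name”, the first template renders the property `dfs.name.dir`. With “data” it renders `dfs.data.dir`, so the same file serves both node types.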

Step-6: The last step is to start the node using the command:

#For Namenode
hadoop-daemon.sh start namenode
#For Datanode
hadoop-daemon.sh start datanode
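In the playbook, the two commands can be folded into one task if the node type comes from a variable. This is a sketch that assumes the `node` variable holds “name” or “data”.

```yaml
# Goes under the play's `tasks:` section.
# Note: a brand-new Namenode folder usually has to be formatted once
# (hadoop namenode -format) before the Namenode is started for the first time.
- name: Start the Hadoop daemon for this node
  ansible.builtin.command:
    cmd: "hadoop-daemon.sh start {{ node }}node"   # namenode or datanode
```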

Now let me run the playbook:

I had already run the playbook once for testing, which is why it shows ok=6 and changed=2.

Everything is fine here and all the changes are made. Now let's look at the target node:

Packages are copied:

Software is installed:

Folders are present in the “/etc/hadoop/” folder:

Files are configured:

Now let's check whether the node has started:
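One way to run such a check from Ansible itself is to list the Java processes with `jps`, a tool that ships with the JDK. The sketch assumes `jps` is on the PATH of the target node, and the name `jps_output` is just an illustrative variable.

```yaml
# Goes under the play's `tasks:` section.
- name: List the running Java processes on the node
  ansible.builtin.command:
    cmd: jps
  register: jps_output
  changed_when: false          # a read-only check never changes the node

- name: Show the process list (NameNode or DataNode should appear in it)
  ansible.builtin.debug:
    var: jps_output.stdout_lines
```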

Datanode: The above configuration is for the Namenode. For a Datanode you only have to make two changes in the variables: change the node variable from “name” to “data”, and change the folder if you want.
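For example, the variable values for a Datanode might look like this. The name `folder` and the path “/dn” are assumptions. Only `node` and its value “data” come from the text above.

```yaml
# Variable values for a Datanode; everything else in the playbook stays the same.
node: data      # was "name" for the Namenode
folder: /dn     # hypothetical folder; changing it is optional
```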