How to Install Hadoop with Step by Step Configuration on Linux Ubuntu

In this tutorial, we will take you through the step-by-step process of installing Apache Hadoop on a Linux box (Ubuntu). The process has two parts: downloading and installing Hadoop, then configuring it.

There are two prerequisites: a Java installation (JAVA_HOME is used during configuration) and SSH, which is set up in Step 2 below.

Part 1) Download and Install Hadoop

Step 1) Add a Hadoop system user using the below commands

sudo addgroup hadoop_

sudo adduser --ingroup hadoop_ hduser_

Enter your password, name and other details.

NOTE: You may hit the below-mentioned error during this setup and installation process.

“hduser is not in the sudoers file. This incident will be reported.”

This error can be resolved by logging in as the root user.

Execute the command

sudo adduser hduser_ sudo

Re-login as hduser_

Step 2) Configure SSH

In order to manage nodes in a cluster, Hadoop requires SSH access.

First, switch to the new user by entering the following command

su - hduser_

The following command creates a new SSH key pair with an empty passphrase.

ssh-keygen -t rsa -P ""

Enable SSH access to the local machine using this key.

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
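
One gotcha worth knowing: with sshd's default StrictModes setting, keys in authorized_keys are ignored if the file is group- or world-writable. A minimal sketch of the expected permissions, using a throwaway file rather than your real ~/.ssh/authorized_keys:

```shell
# Demonstrates the permissions sshd expects on authorized_keys.
# The key line below is a placeholder, not a real public key.
f=$(mktemp)
echo "ssh-rsa AAAA...placeholder hduser_@localhost" >> "$f"
chmod 600 "$f"          # owner read/write only; sshd accepts this
stat -c '%a' "$f"       # prints 600
rm "$f"
```

If ssh localhost still asks for a password after the steps above, checking ls -l ~/.ssh/authorized_keys for 600 (or 644) permissions is a good first diagnostic.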

Now test SSH setup by connecting to localhost as the ‘hduser_’ user.

ssh localhost

Note: Please note, if you see an error (such as ‘Connection refused’) in response to ‘ssh localhost’, then there is a possibility that SSH is not available on this system.

To resolve this, purge SSH using

sudo apt-get purge openssh-server

It is good practice to purge before starting the installation.

Install SSH using the command-

sudo apt-get install openssh-server

Step 3) Download Hadoop from the official Apache Hadoop releases page

Select the stable release

Select the tar.gz file (not the file with src)

Once the download is complete, navigate to the directory containing the tar file

Enter,

sudo tar xzf hadoop-2.2.0.tar.gz

Now, rename hadoop-2.2.0 as hadoop

sudo mv hadoop-2.2.0 hadoop

Then, change the ownership of the directory to the Hadoop user and group

sudo chown -R hduser_:hadoop_ hadoop
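
The extract, rename, and change-ownership steps above can be rehearsed end-to-end on a throwaway archive before touching the real download (the version number matches the tutorial; the contents here are dummies):

```shell
# Rehearsal in a temp dir: build a dummy hadoop-2.2.0.tar.gz, then run the
# same extract-and-rename sequence as above (chown is skipped since the
# dummy needs no ownership change).
cd "$(mktemp -d)"
mkdir hadoop-2.2.0 && echo demo > hadoop-2.2.0/README
tar czf hadoop-2.2.0.tar.gz hadoop-2.2.0 && rm -r hadoop-2.2.0
tar xzf hadoop-2.2.0.tar.gz      # x=extract, z=gunzip, f=archive file
mv hadoop-2.2.0 hadoop
ls hadoop                        # prints README
```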

Part 2) Configure Hadoop

Step 1) Modify ~/.bashrc file

Add the following lines to the end of the file ~/.bashrc

#Set HADOOP_HOME
export HADOOP_HOME=<Installation Directory of Hadoop>
#Set JAVA_HOME
export JAVA_HOME=<Installation Directory of Java>
# Add bin/ directory of Hadoop to PATH
export PATH=$PATH:$HADOOP_HOME/bin

Now, source this environment configuration using the below command

. ~/.bashrc
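
What those .bashrc lines buy you can be checked in a throwaway shell; /tmp/hadoop-demo below stands in for your real installation directory:

```shell
# Sketch of the .bashrc effect: once $HADOOP_HOME/bin is on PATH, binaries
# in it can be run by name. Paths here are illustrative only.
export HADOOP_HOME=/tmp/hadoop-demo
mkdir -p "$HADOOP_HOME/bin"
export PATH="$PATH:$HADOOP_HOME/bin"
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "PATH updated" ;;
  *)                      echo "PATH missing entry" ;;
esac
```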

Step 2) Configurations related to HDFS

Set JAVA_HOME inside the file $HADOOP_HOME/etc/hadoop/hadoop-env.sh by replacing the line

export JAVA_HOME=${JAVA_HOME}

with

export JAVA_HOME=<Installation Directory of Java>

There are two parameters in $HADOOP_HOME/etc/hadoop/core-site.xml which need to be set-

1. ‘hadoop.tmp.dir’ – Used to specify a directory which will be used by Hadoop to store its data files.

2. ‘fs.defaultFS’ – This specifies the default file system. (The older name ‘fs.default.name’ is deprecated in favor of ‘fs.defaultFS’.)

To set these parameters, open core-site.xml

sudo gedit $HADOOP_HOME/etc/hadoop/core-site.xml

Copy the below lines in between the tags <configuration></configuration>

<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>Parent directory for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system.</description>
</property>
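
A malformed edit here (an unclosed tag, a stray character) will stop Hadoop from starting, so it can be worth checking that the file is still well-formed XML. A minimal sketch, assuming python3 is installed (it is by default on Ubuntu) and using a sample file in place of the real core-site.xml:

```shell
# Write a small sample config, then parse it; a parse error would make the
# python3 command exit non-zero instead of printing "well-formed".
cat > /tmp/core-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
  </property>
</configuration>
EOF
python3 -c "import sys, xml.dom.minidom; xml.dom.minidom.parse(sys.argv[1]); print('well-formed')" /tmp/core-site-sample.xml
```

To check the real file, point the same command at $HADOOP_HOME/etc/hadoop/core-site.xml.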

Navigate to the directory $HADOOP_HOME/etc/hadoop

Now, create the directory mentioned in core-site.xml

sudo mkdir -p <Path of Directory used in above setting>

Grant permissions to the directory

sudo chown -R hduser_:hadoop_ <Path of Directory created in above step>

sudo chmod 750 <Path of Directory created in above step>
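
The mode 750 used above means owner rwx (7), group r-x (5), and nothing for others (0): hduser_ gets full access, members of hadoop_ can read and enter the directory, and everyone else is locked out. A quick demonstration on a throwaway directory:

```shell
# Apply 750 to a temp directory and read the mode back.
d=$(mktemp -d)
chmod 750 "$d"
stat -c '%a' "$d"    # prints 750
rmdir "$d"
```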

Step 3) Map Reduce Configuration

Before you begin with these configurations, let's set the HADOOP_HOME path

sudo gedit /etc/profile.d/hadoop.sh

And Enter

export HADOOP_HOME=/home/guru99/Downloads/Hadoop

Next enter

sudo chmod +x /etc/profile.d/hadoop.sh

Exit the terminal and restart it.

Type echo $HADOOP_HOME to verify the path

Now copy the template file to create mapred-site.xml

sudo cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml

Open the mapred-site.xml file

sudo gedit $HADOOP_HOME/etc/hadoop/mapred-site.xml

Add below lines of setting in between tags <configuration> and </configuration>

<property>
<name>mapreduce.jobtracker.address</name>
<value>localhost:54311</value>
<description>MapReduce job tracker runs at this host and port.
</description>
</property>

Open $HADOOP_HOME/etc/hadoop/hdfs-site.xml as below,

sudo gedit $HADOOP_HOME/etc/hadoop/hdfs-site.xml

Add below lines of setting between tags <configuration> and </configuration>

<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hduser_/hdfs</value>
</property>

Create the directory specified in the above setting-

sudo mkdir -p <Path of Directory used in above setting>
sudo mkdir -p /home/hduser_/hdfs

sudo chown -R hduser_:hadoop_ <Path of Directory created in above step>
sudo chown -R hduser_:hadoop_ /home/hduser_/hdfs

sudo chmod 750 <Path of Directory created in above step>
sudo chmod 750 /home/hduser_/hdfs

Step 4) Before we start Hadoop for the first time, format HDFS using the below command. Note that formatting erases any existing data in HDFS, so run it only during the initial setup.

$HADOOP_HOME/bin/hdfs namenode -format

Step 5) Start the Hadoop single node cluster using the below commands

$HADOOP_HOME/sbin/start-dfs.sh


$HADOOP_HOME/sbin/start-yarn.sh

Using the ‘jps’ tool/command, verify whether all the Hadoop related processes are running.

If Hadoop has started successfully, then the output of jps should show NameNode, NodeManager, ResourceManager, SecondaryNameNode and DataNode.
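
That check can be scripted. A sketch, shown here against sample text so it is self-contained; on a real node you would replace the printf with the actual jps command:

```shell
# Report each expected Hadoop daemon as running or missing.
sample=$(printf 'NameNode\nDataNode\nSecondaryNameNode\nResourceManager\nNodeManager\n')
for p in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
  if echo "$sample" | grep -q "^$p$"; then
    echo "$p running"
  else
    echo "$p MISSING"
  fi
done
```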

Step 6) Stopping Hadoop

$HADOOP_HOME/sbin/stop-dfs.sh

$HADOOP_HOME/sbin/stop-yarn.sh
