Hadoop Setup Tutorial - Installation & Configuration

Prerequisites:

You must have Ubuntu installed and running.

You must have Java installed.

Step 1) Add a Hadoop system user using the commands below

sudo addgroup hadoop_

sudo adduser --ingroup hadoop_ hduser_

Enter your password, name, and other details.

NOTE:

You may encounter the error below during this setup and installation process:

"hduser is not in the sudoers file. This incident will be reported."

This error can be resolved with the following steps:

Log in as the root user.

Execute the command:

sudo adduser hduser_ sudo

Re-login as hduser_

Step 2) Configure SSH

In order to manage nodes in a cluster, Hadoop requires SSH access.

First, switch users by entering the following command:

su - hduser_

The following command creates a new key:

ssh-keygen -t rsa -P ""

Enable SSH access to the local machine using this key:

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
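The append step above can be sketched as a short script that also tightens the permissions sshd's StrictModes checking expects (700 on the directory, 600 on authorized_keys). SSH_DIR defaults to a temporary directory here so the sketch is safe to run as a demo, and the public key is a placeholder:

```shell
# Sketch of the key-installation step. SSH_DIR defaults to a temporary
# directory for the demo; point it at "$HOME/.ssh" for real use.
SSH_DIR="${SSH_DIR:-$(mktemp -d)/.ssh}"

mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"                   # sshd's StrictModes rejects looser modes

# Placeholder public key for the demo (normally created by ssh-keygen).
[ -f "$SSH_DIR/id_rsa.pub" ] || echo "ssh-rsa AAAAB3...demo hduser_" > "$SSH_DIR/id_rsa.pub"

# Same as: cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"   # required for sshd to accept the file
```

With the real paths, 'ssh localhost' should then succeed without a password prompt.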

Now test the SSH setup by connecting to localhost as the 'hduser_' user.

ssh localhost

Note:

If 'ssh localhost' fails with a connection error, SSH may not be available on this system.

To resolve this:

Purge SSH using:

sudo apt-get purge openssh-server

It is good practice to purge before starting a fresh installation.

Install SSH using the command:

sudo apt-get install openssh-server

Step 3) Download Hadoop

From the Apache Hadoop download page, select a stable release.

Select the tar.gz file (not the file with src in its name).

Once the download is complete, navigate to the directory containing the tar file and extract it:

sudo tar xzf hadoop-2.2.0.tar.gz

Now, rename hadoop-2.2.0 to hadoop:

sudo mv hadoop-2.2.0 hadoop

Give the Hadoop user ownership of the directory:

sudo chown -R hduser_:hadoop_ hadoop

Step 4) Modify the ~/.bashrc file

Add the following lines to the end of the file ~/.bashrc:

#Set HADOOP_HOME
export HADOOP_HOME=<Installation Directory of Hadoop>
#Set JAVA_HOME
export JAVA_HOME=<Installation Directory of Java>
# Add bin/ directory of Hadoop to PATH
export PATH=$PATH:$HADOOP_HOME/bin
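To see what these three lines do, you can source them from a throwaway file with placeholder paths (/opt/hadoop and /opt/jdk are hypothetical here) and check that Hadoop's bin/ directory lands on PATH:

```shell
# Demo of the ~/.bashrc additions using hypothetical install paths.
rcfile=$(mktemp)
cat > "$rcfile" <<'EOF'
export HADOOP_HOME=/opt/hadoop
export JAVA_HOME=/opt/jdk
export PATH=$PATH:$HADOOP_HOME/bin
EOF

. "$rcfile"          # same effect as: . ~/.bashrc

echo "HADOOP_HOME=$HADOOP_HOME"               # prints "HADOOP_HOME=/opt/hadoop"
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "PATH ok" ;;   # prints "PATH ok"
esac
```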

Now, source this environment configuration using the command below:

. ~/.bashrc

Step 5) Configurations related to HDFS

Set JAVA_HOME inside the file $HADOOP_HOME/etc/hadoop/hadoop-env.sh by replacing its existing JAVA_HOME line with the installation path of Java on your machine.

There are two parameters in $HADOOP_HOME/etc/hadoop/core-site.xml that need to be set:

1. 'hadoop.tmp.dir' - Specifies the directory that Hadoop will use to store its data files.

2. 'fs.defaultFS' - Specifies the default file system.

To set these parameters, open core-site.xml:

sudo gedit $HADOOP_HOME/etc/hadoop/core-site.xml

Copy the lines below in between the tags <configuration></configuration>:

<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>Parent directory for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system.</description>
</property>
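Editing XML by hand is easy to get wrong, so as a sanity check the same two properties can be generated from a script and grepped back; the temp file here is a stand-in for the real core-site.xml:

```shell
# Demo: write the two core-site.xml properties from variables, then
# grep them back to confirm the values landed correctly.
TMP_DIR=/app/hadoop/tmp              # same value as in the listing above
FS_URI=hdfs://localhost:54310

conf=$(mktemp)                       # stand-in for $HADOOP_HOME/etc/hadoop/core-site.xml
cat > "$conf" <<EOF
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>$TMP_DIR</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>$FS_URI</value>
</property>
</configuration>
EOF

grep -q "<value>$TMP_DIR</value>" "$conf" && echo "hadoop.tmp.dir ok"
grep -q "<value>$FS_URI</value>" "$conf" && echo "fs.defaultFS ok"
```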

Navigate to the directory $HADOOP_HOME/etc/hadoop

Now, create the directory mentioned in core-site.xml

sudo mkdir -p <Path of Directory used in above setting>

Grant permissions to the directory:

sudo chown -R hduser_:hadoop_ <Path of Directory created in above step>

sudo chmod 750 <Path of Directory created in above step>
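The last three commands follow a create / own / restrict pattern that repeats later for the HDFS data directory. A sketch using a temporary path (the chown is skipped when not running as root, since hduser_/hadoop_ may not exist on your machine):

```shell
# Demo of the create/own/restrict pattern from the commands above.
# DATA_DIR is a temporary stand-in for /app/hadoop/tmp.
DATA_DIR="${DATA_DIR:-$(mktemp -d)/app/hadoop/tmp}"

mkdir -p "$DATA_DIR"                           # sudo mkdir -p <dir>
if [ "$(id -u)" -eq 0 ]; then
    # Real setup: sudo chown -R hduser_:hadoop_ <dir>
    chown -R hduser_:hadoop_ "$DATA_DIR" 2>/dev/null || echo "note: hduser_/hadoop_ not present"
fi
chmod 750 "$DATA_DIR"                          # owner rwx, group rx, others nothing

stat -c "mode: %a" "$DATA_DIR"                 # prints "mode: 750"
```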

Step 6) MapReduce Configuration

Before you begin with these configurations, let's set the HADOOP_HOME path:

sudo gedit /etc/profile.d/hadoop.sh

and enter:

export HADOOP_HOME=/home/guru99/Downloads/Hadoop

Next, make the script executable:

sudo chmod +x /etc/profile.d/hadoop.sh

Exit the terminal and start it again.

Type echo $HADOOP_HOME to verify the path.
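The terminal restart is needed because login shells source every *.sh file under /etc/profile.d at startup. That mechanism can be imitated with a temporary directory (the exported path below mirrors the one used above):

```shell
# Demo: why restarting the terminal picks up /etc/profile.d/hadoop.sh.
profdir=$(mktemp -d)                  # stand-in for /etc/profile.d
echo 'export HADOOP_HOME=/home/guru99/Downloads/Hadoop' > "$profdir/hadoop.sh"
chmod +x "$profdir/hadoop.sh"

# A login shell effectively does this for every *.sh file at startup:
for f in "$profdir"/*.sh; do . "$f"; done

echo "$HADOOP_HOME"                   # prints "/home/guru99/Downloads/Hadoop"
```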

Now copy the template file:

sudo cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml
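If you re-run the setup later, a plain cp would overwrite any edits already made to mapred-site.xml. A small guard function (copy_if_missing, a name made up for this sketch) avoids that; the temp files stand in for the real config directory:

```shell
# Demo: copy the template only when the destination does not exist yet,
# so re-running the setup keeps your edits.
copy_if_missing() { [ -e "$2" ] || cp "$1" "$2"; }

dir=$(mktemp -d)                             # stand-in for $HADOOP_HOME/etc/hadoop
echo "<configuration/>" > "$dir/mapred-site.xml.template"

copy_if_missing "$dir/mapred-site.xml.template" "$dir/mapred-site.xml"  # copies

echo "customized" > "$dir/mapred-site.xml"   # simulate a later manual edit
copy_if_missing "$dir/mapred-site.xml.template" "$dir/mapred-site.xml"  # no-op

cat "$dir/mapred-site.xml"                   # prints "customized"
```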

Open the mapred-site.xml file

sudo gedit $HADOOP_HOME/etc/hadoop/mapred-site.xml

Add the following lines in between the <configuration> and </configuration> tags:

<property>
<name>mapreduce.jobtracker.address</name>
<value>localhost:54311</value>
<description>MapReduce job tracker runs at this host and port.
</description>
</property>

Open $HADOOP_HOME/etc/hadoop/hdfs-site.xml:

sudo gedit $HADOOP_HOME/etc/hadoop/hdfs-site.xml

Add the following lines in between the <configuration> and </configuration> tags:

<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hduser_/hdfs</value>
</property>

Create the directory specified in the above setting:

sudo mkdir -p <Path of Directory used in above setting>

sudo mkdir -p /home/hduser_/hdfs

sudo chown -R hduser_:hadoop_ <Path of Directory created in above step>

sudo chown -R hduser_:hadoop_ /home/hduser_/hdfs

sudo chmod 750 <Path of Directory created in above step>

sudo chmod 750 /home/hduser_/hdfs

Step 7) Before starting Hadoop for the first time, format HDFS using the command below:

$HADOOP_HOME/bin/hdfs namenode -format

Step 8) Start the Hadoop single-node cluster using the commands below:

$HADOOP_HOME/sbin/start-dfs.sh


$HADOOP_HOME/sbin/start-yarn.sh

Using the 'jps' tool/command, verify that all the Hadoop-related processes are running.

If Hadoop has started successfully, the output of jps should show NameNode, NodeManager, ResourceManager, SecondaryNameNode, and DataNode.
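This check can be scripted. The check_daemons helper below (a name made up for this sketch) reads jps-style output on stdin so it is easy to test with sample text; in real use you would pipe the actual jps output into it:

```shell
# Demo: verify that the expected Hadoop daemons appear in jps output.
# Usage in a real setup: jps | check_daemons
check_daemons() {
    input=$(cat)
    missing=""
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        echo "$input" | grep -qw "$d" || missing="$missing $d"
    done
    if [ -z "$missing" ]; then
        echo "all daemons running"
    else
        echo "missing:$missing"
    fi
}

# Example with sample jps output:
printf '1201 NameNode\n1350 DataNode\n1502 SecondaryNameNode\n1650 ResourceManager\n1801 NodeManager\n2001 Jps\n' | check_daemons
# prints "all daemons running"
```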

Step 9) Stop Hadoop using the commands below:

$HADOOP_HOME/sbin/stop-dfs.sh

$HADOOP_HOME/sbin/stop-yarn.sh
