Wednesday 5 October 2016

Hadoop Single node Installation

----------------------------------------------------------------------------------------- 

1 Purpose

This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop Distributed File System (HDFS).

2 Prerequisites
2.1 Supported Platforms

-  Linux is supported as a development and production platform. Hadoop has been demonstrated on Linux clusters with 2000 nodes.

-  Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.

2.2 Required Software

-  Download VMware from the link below, or get the VMware exe file from our share folder location (open the SoftwareDownloads/oracle folder).

-  Download the Ubuntu desktop ISO image file from either of the two links below.
   Note: Choose the Ubuntu 64-bit image only if your Windows host is 64-bit; otherwise use the 32-bit image. In either case it should be the desktop version.

-  Java 1.6.x or later must be installed (this guide uses OpenJDK 7).

-  ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.

2.3 Installing Software

sudo apt-get update
sudo apt-get install openjdk-7-jdk
sudo apt-get install mysql-server
sudo apt-get install openssh-server
sudo apt-get install openssh-client
sudo apt-get install apache2
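
A quick sanity check after the installs (command names assume an Ubuntu system):

java -version        # should report an OpenJDK 7 runtime
ssh -V               # OpenSSH client version
service ssh status   # the ssh (sshd) service should be running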

3 Setup passphraseless ssh

Now check that you can ssh to the localhost without a passphrase:

$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
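
If ssh still asks for a password after this, the key file permissions may be too open; tightening them is a common fix (not part of the original steps):

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys
$ ssh localhost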

4 Download

4.1. Download the latest Hadoop tar file from an Apache mirror (http://apache.osuosl.org).

   In this example we are using hadoop-1.2.1.tar.gz and assuming that the downloaded file is present in /home/hadoop/Downloads.



4.2. Create a new folder under your home directory.

  cd /home/hadoop

  mkdir work

5. Now copy the "hadoop-1.2.1.tar.gz" from /home/hadoop/Downloads to /home/hadoop/work

  cp /home/hadoop/Downloads/hadoop-1.2.1.tar.gz /home/hadoop/work/

6. Go to /home/hadoop/work/ and extract the tar file.

   cd /home/hadoop/work/
   tar -xvf hadoop-1.2.1.tar.gz
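
   Extraction creates a versioned directory; a quick listing confirms it (paths as assumed above):

   ls /home/hadoop/work/   --> should show a hadoop-1.2.1 directory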

7. Now add environment variables to the .bashrc file.

   /home/hadoop/.bashrc

  Note: .bashrc is a hidden file; it can be seen with the list command below.

   ls -a /home/hadoop

  Now edit the file.

   gedit /home/hadoop/.bashrc  or gedit ~/.bashrc

   And add the below export variables to .bashrc file.

  export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
  export HADOOP_HOME=/home/hadoop/work/hadoop-1.2.1
  export PATH=$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH

   save the file and quit.

Also add the Java path to the hadoop-env.sh file (a shell script under the Hadoop conf directory, not an XML file).
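
For example (a minimal sketch; paths assume the locations used above):

  # in /home/hadoop/work/hadoop-1.2.1/conf/hadoop-env.sh
  export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64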

8. Now source the .bashrc file so the changes take effect.

  . /home/hadoop/.bashrc   or   . ~/.bashrc

9. Verify the environment variables.

    echo $PATH
    echo $JAVA_HOME
    echo $HADOOP_HOME

hadoop     --->  you should be able to run the hadoop command from anywhere in the file system.
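
As a further check (assuming the 1.2.1 tarball used above):

    hadoop version   --> should report Hadoop 1.2.1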
   

10. Now modify the Hadoop conf files below (they are under the conf directory of the extracted Hadoop, e.g. /home/hadoop/work/hadoop-1.2.1/conf/).

  core-site.xml
  -------------

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
   <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>


 hdfs-site.xml
 -------------

 <configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
   <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/work/dfs/name</value>
  </property>
   <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/work/dfs/data</value>
  </property>
</configuration>


mapred-site.xml
---------------

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
   <property>
    <name>mapred.local.dir</name>
    <value>/home/hadoop/work/mapred/local</value>
  </property>
   <property>
    <name>mapred.system.dir</name>
    <value>/home/hadoop/work/mapred/system</value>
  </property>

 <property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/work/hdfs/tmp</value>
  <description>A base for other temporary directories. </description>
</property>
</configuration>
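
The directories referenced in these files can be created up front; Hadoop creates most of them itself, but pre-creating them avoids permission surprises (a convenience step, not part of the original guide):

  mkdir -p /home/hadoop/tmp
  mkdir -p /home/hadoop/work/dfs/name /home/hadoop/work/dfs/data
  mkdir -p /home/hadoop/work/mapred/local
  mkdir -p /home/hadoop/work/hdfs/tmp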

In the conf/masters file, keep localhost as it is.
In the conf/slaves file, also keep localhost as it is.

11. Format the NameNode for the first time using the command below.

hadoop namenode -format

12. Now start the Cluster.

    start-dfs.sh --> to start HDFS
    start-mapred.sh --> to start Mapreduce

    or

   start-all.sh --> to start both dfs & mapreduce.
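
For reference, the matching shutdown scripts are available as well:

    stop-dfs.sh --> to stop HDFS
    stop-mapred.sh --> to stop Mapreduce
    stop-all.sh --> to stop both.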


12. type "JPS" command to verify the processes.

SecondaryNameNode
JobTracker
DataNode
NameNode
TaskTracker

14. Verify the NameNode Administration.

   http://localhost:50070

15. Verify the JobTracker Administration.

  http://localhost:50030

16. Verify the SecondaryNameNode Administration.

  http://localhost:50090

17. Verify the TaskTracker Administration.


  http://localhost:50060
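
Finally, a few simple HDFS operations confirm the setup end to end (test.txt is only an illustrative file name):

  echo "hello hadoop" > /home/hadoop/test.txt
  hadoop fs -mkdir /user/hadoop/input
  hadoop fs -put /home/hadoop/test.txt /user/hadoop/input/
  hadoop fs -ls /user/hadoop/input
  hadoop fs -cat /user/hadoop/input/test.txt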
