-----------------------------------------------------------------------------------------
This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using the Hadoop Distributed File System (HDFS).
2 Prerequisites
2.1 Supported Platforms
Ø Linux is supported as a development and production platform. Hadoop has been demonstrated on Linux clusters with 2000 nodes.
Ø Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.
2.2 Required Software
Ø Download VMware from the link below, or get the VMware exe file from our share folder location: open the SotwareDownloads/oracle folder.
Ø Download the Ubuntu desktop ISO image file from either of the two links below.
Note: only if your Windows is 64-bit can you choose either the 64-bit or the 32-bit Ubuntu version, and in either case it should be the desktop version.
Ø JavaTM 1.6.x, preferably from Sun, must be installed (the steps below use OpenJDK 7, which also works with Hadoop 1.x).
Ø ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
2.3 Installing Software
sudo apt-get update
sudo apt-get install openjdk-7-jdk
sudo apt-get install openssh-server openssh-client
sudo apt-get install mysql-server
sudo apt-get install apache2
Note: mysql-server and apache2 are not required by Hadoop itself and can be skipped.
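A quick way to confirm the key installs succeeded (the exact version strings will differ by Ubuntu release):
java -version
ssh -V
service ssh status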
3 Setup passphraseless ssh
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
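If key-based login still prompts for a password, the usual cause is that sshd refuses keys whose files are too widely readable; tightening the permissions is a safe fix:
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys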
4 Download
In this example we are using hadoop-1.2.1.tar.gz and assuming that the downloaded file is present in /home/hadoop/Downloads.
4.2. Create a new folder under your user (/home/hadoop).
cd /home/hadoop
mkdir work
5. Now copy "hadoop-1.2.1.tar.gz" from /home/hadoop/Downloads to /home/hadoop/work.
cp /home/hadoop/Downloads/hadoop-1.2.1.tar.gz /home/hadoop/work/
6. Go to /home/hadoop/work/ and extract the tar file (this creates the folder /home/hadoop/work/hadoop-1.2.1).
cd /home/hadoop/work
tar -xvf hadoop-1.2.1.tar.gz
7. Now add environment variables to the .bashrc file, /home/hadoop/.bashrc.
Note: .bashrc is a hidden file and can be viewed with the below list command.
ls -a /home/hadoop
Now edit the file.
gedit /home/hadoop/.bashrc or gedit ~/.bashrc
And add the below export variables to the .bashrc file.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/home/hadoop/work/hadoop-1.2.1
export PATH=$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH
Save the file and quit.
Also add the Java path to the hadoop-env.sh file (a shell script under $HADOOP_HOME/conf, not an XML file).
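In Hadoop 1.x the conf/hadoop-env.sh file ships with a commented-out JAVA_HOME line; uncomment it (or add one) pointing at the same JDK used above:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64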
8. Now source the .bashrc file so the changes take effect.
. /home/hadoop/.bashrc or . .bashrc
9. Verify the PATH environment variables.
echo $PATH
echo $JAVA_HOME
echo $HADOOP_HOME
hadoop ---> you should be able to run the hadoop command from anywhere in the file system.
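As an extra check, hadoop version should print the release you extracted (1.2.1 in this walkthrough):
hadoop version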
10. Now modify the below Hadoop conf files (they are under $HADOOP_HOME/conf).
core-site.xml
-------------
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
</property>
</configuration>
hdfs-site.xml
-------------
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/work/dfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/work/dfs/data</value>
</property>
</configuration>
mapred-site.xml
---------------
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/home/hadoop/work/mapred/local</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>/home/hadoop/work/mapred/system</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/work/hdfs/tmp</value>
<description>A base for other temporary directories. </description>
</property>
</configuration>
In the conf/masters file, keep localhost as it is.
In the conf/slaves file, also keep localhost as it is.
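Hadoop creates most of these directories itself on first start, but pre-creating the ones that live on the local file system avoids permission surprises (the paths match the values used in the config files above):
mkdir -p /home/hadoop/tmp
mkdir -p /home/hadoop/work/dfs/name /home/hadoop/work/dfs/data
mkdir -p /home/hadoop/work/mapred/local /home/hadoop/work/hdfs/tmp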
11. Format the NameNode for the first time using the below command.
hadoop namenode -format
12. Now start the cluster.
start-dfs.sh --> to start HDFS
start-mapred.sh --> to start MapReduce
or
start-all.sh --> to start both DFS & MapReduce.
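The matching stop scripts ship alongside these and are handy when you need to restart after a config change:
stop-mapred.sh --> to stop MapReduce
stop-dfs.sh --> to stop HDFS
or
stop-all.sh --> to stop both.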
13. Type the "jps" command to verify the processes.
SecondaryNameNode
JobTracker
DataNode
NameNode
TaskTracker
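If one of these daemons is missing, its log file under $HADOOP_HOME/logs usually explains why; the exact file name includes your user and host name, for example:
tail -n 50 $HADOOP_HOME/logs/hadoop-hadoop-namenode-*.log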
14. Verify the NameNode administration.
http://localhost:50070
15. Verify the JobTracker administration.
http://localhost:50030
16. Verify the SecondaryNameNode administration.
http://localhost:50090
17. Verify the TaskTracker administration.
http://localhost:50060
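As a final smoke test (the directory and file names here are just examples), copy a file into HDFS, list it, and print a short cluster report:
hadoop fs -mkdir /user/hadoop/input
hadoop fs -put /home/hadoop/.bashrc /user/hadoop/input/
hadoop fs -ls /user/hadoop/input
hadoop dfsadmin -report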