Wednesday, 5 October 2016

Hadoop Single-Node Installation

----------------------------------------------------------------------------------------- 

1 Purpose

This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop Distributed File System (HDFS).

2 Prerequisites
2.1 Supported Platforms

-  Linux is supported as a development and production platform. Hadoop has been demonstrated on Linux clusters with 2000 nodes.

-  Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.

2.2 Required Software

-  Download VMware from the link below
Or
                You can get the VMware exe file from our shared folder location
Open the SotwareDownloads/oracle folder
-  Download the Ubuntu desktop ISO image file from either of the two links below
Note: A 64-bit Ubuntu image requires 64-bit Windows; otherwise use the 32-bit image. In either case it should be the desktop version.
-  Java™ 1.6.x, preferably from Sun, must be installed.
-  ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.

2.3 Installing Software

sudo apt-get update
sudo apt-get install openjdk-7-jdk
sudo apt-get install mysql-server
sudo apt-get install openssh-server
sudo apt-get install openssh-client
sudo apt-get install apache2
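
After installing, it can help to confirm that the tools the later steps rely on are actually on the PATH. The following is only a minimal sketch; check_cmds is a hypothetical helper name, not something shipped with Ubuntu or Hadoop.

```shell
# check_cmds CMD...: report whether each command is on PATH.
# Hypothetical helper for illustration only.
check_cmds() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    # Non-zero exit status if at least one command was missing.
    return "$missing"
}

# Example: check_cmds java ssh ssh-keygen
```

The function returns a non-zero status when anything is missing, so it can be used in an if test before continuing.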

3 Set up passphraseless ssh

Check that you can ssh to localhost without a passphrase:

$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
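
Running the cat command above more than once appends the same key again each time. A possible way to avoid that is sketched below; authorize_key is a hypothetical helper name, and the permission settings are an assumption based on the usual sshd requirement that the key file is not writable by others.

```shell
# authorize_key PUBKEY_FILE AUTH_FILE: append the public key only if it is
# not already listed. Hypothetical helper for illustration only.
authorize_key() {
    pubkey_file=$1
    auth_file=$2
    auth_dir=$(dirname "$auth_file")
    mkdir -p "$auth_dir"
    touch "$auth_file"
    key=$(cat "$pubkey_file")
    # -q quiet, -x match the whole line, -F treat the key as a fixed string
    if ! grep -qxF -- "$key" "$auth_file"; then
        printf '%s\n' "$key" >> "$auth_file"
    fi
    # Restrict permissions so that only the owner can read or write the keys.
    chmod 700 "$auth_dir"
    chmod 600 "$auth_file"
}

# Example:
#   authorize_key "$HOME/.ssh/id_dsa.pub" "$HOME/.ssh/authorized_keys"
#   ssh -o BatchMode=yes localhost true && echo "passphraseless ssh works"
```

The BatchMode option in the example makes ssh fail instead of prompting, which is convenient for a scripted check.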

4 Download

4.1. Download the latest Hadoop tar file from the Apache website (http://apache.osuosl.org).

   In this example we use hadoop-1.2.1.tar.gz and assume that the downloaded file is in /home/hadoop/Downloads.



4.2. Create a new folder named work under your home directory.

  cd /home/hadoop

  mkdir work

5. Now copy "hadoop-1.2.1.tar.gz" from /home/hadoop/Downloads to /home/hadoop/work

  cp /home/hadoop/Downloads/hadoop-1.2.1.tar.gz /home/hadoop/work/

6. Go to /home/hadoop/work/ and extract the tar file

   cd /home/hadoop/work/
   tar -xvf hadoop-1.2.1.tar.gz
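
A truncated download usually only shows up as a tar error halfway through extraction. As a rough sketch, the archive can be listed first and unpacked only if the listing succeeds; extract_tarball is a hypothetical helper name, and it assumes a gzip-compressed tar file like the one used here.

```shell
# extract_tarball TARBALL DEST_DIR: check the archive, then unpack it.
# Hypothetical helper for illustration only.
extract_tarball() {
    tarball=$1
    dest=$2
    if [ ! -f "$tarball" ]; then
        echo "not found: $tarball" >&2
        return 1
    fi
    # Listing the archive first catches a truncated or corrupt download.
    tar -tzf "$tarball" >/dev/null || return 1
    mkdir -p "$dest"
    tar -xzf "$tarball" -C "$dest"
}

# Example: extract_tarball /home/hadoop/work/hadoop-1.2.1.tar.gz /home/hadoop/work
```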

7. Now add environment variables to the .bashrc file

   /home/hadoop/.bashrc

  Note: .bashrc is a hidden file and can be viewed with the list command below.

   ls -a /home/hadoop

  Now edit the file.

   gedit /home/hadoop/.bashrc  or gedit ~/.bashrc

   And add the export variables below to the .bashrc file. HADOOP_HOME must point to the hadoop-1.2.1 directory created when the tar file was extracted.

  export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
  export HADOOP_HOME=/home/hadoop/work/hadoop-1.2.1
  export PATH=$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH

   Save the file and quit.

Also set the Java path (JAVA_HOME) in the conf/hadoop-env.sh script.
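
One way to make this change from the command line is sketched below. set_java_home is a hypothetical helper name; the sketch assumes hadoop-env.sh lives in the conf directory under HADOOP_HOME, as it does in the Hadoop 1.x tar file, and it appends an export line rather than editing the commented-out one that ships in the file.

```shell
# set_java_home ENV_FILE JAVA_DIR: append "export JAVA_HOME=..." to the
# script unless the identical line is already there.
# Hypothetical helper for illustration only.
set_java_home() {
    env_file=$1
    java_dir=$2
    if [ ! -f "$env_file" ]; then
        echo "not found: $env_file" >&2
        return 1
    fi
    line="export JAVA_HOME=$java_dir"
    if ! grep -qxF -- "$line" "$env_file"; then
        echo "$line" >> "$env_file"
    fi
}

# Example: set_java_home "$HADOOP_HOME/conf/hadoop-env.sh" "$JAVA_HOME"
```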

8. Now source the .bashrc file so that the changes take effect.

  . /home/hadoop/.bashrc or . ~/.bashrc

9. Verify the PATH environment variables.

    echo $PATH
    echo $JAVA_HOME
    echo $HADOOP_HOME

hadoop     --->  you should be able to run the hadoop command from anywhere in the file system.
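
Echoing the variables shows their values but not whether the paths exist. A small check along these lines could catch a mistyped directory; check_home_var is a hypothetical helper name used only for this sketch.

```shell
# check_home_var NAME: succeed only if the variable called NAME is set
# and points to an existing directory.
# Hypothetical helper for illustration only.
check_home_var() {
    name=$1
    # Indirect lookup of the variable whose name was passed in.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    elif [ ! -d "$value" ]; then
        echo "$name points to a missing directory: $value"
        return 1
    fi
    echo "$name=$value"
}

# Example: check_home_var JAVA_HOME && check_home_var HADOOP_HOME
```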
   

10. Now modify the Hadoop conf files below (in $HADOOP_HOME/conf).

  core-site.xml
  -------------

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>


 hdfs-site.xml
 -------------

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/work/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/work/dfs/data</value>
  </property>
</configuration>


mapred-site.xml
---------------

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/home/hadoop/work/mapred/local</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/home/hadoop/work/mapred/system</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/work/hdfs/tmp</value>
    <description>A base for other temporary directories. </description>
  </property>
</configuration>

In the conf/masters file, keep localhost as it is.
In the conf/slaves file, also keep localhost as it is.
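
Hadoop normally creates the local directories named in these files on its own, but creating them up front as the hadoop user can surface permission problems early. The sketch below assumes the paths from the config files above, relative to the home directory; make_hadoop_dirs is a hypothetical helper name. mapred.system.dir is left out because, as far as Hadoop 1.x is concerned, it is a path inside HDFS rather than on the local disk.

```shell
# make_hadoop_dirs HOME_DIR: create the local directories referenced by
# core-site.xml, hdfs-site.xml and mapred-site.xml.
# Hypothetical helper for illustration only.
make_hadoop_dirs() {
    base=$1
    mkdir -p "$base/tmp" \
             "$base/work/dfs/name" \
             "$base/work/dfs/data" \
             "$base/work/mapred/local" \
             "$base/work/hdfs/tmp"
}

# Example: make_hadoop_dirs /home/hadoop
```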

11. Format the NameNode for the first time using the command below.

hadoop namenode -format
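
Formatting again later discards the existing HDFS metadata, so this command is meant for the first run only. A guard like the following sketch could prevent an accidental re-format. It assumes that a formatted dfs.name.dir contains a current/VERSION file, which is the Hadoop 1.x layout; is_formatted and format_if_needed are hypothetical helper names.

```shell
# is_formatted NAME_DIR: succeed if NAME_DIR already holds NameNode metadata.
# Hypothetical helper for illustration only.
is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# format_if_needed NAME_DIR: run the format only when no metadata exists yet.
format_if_needed() {
    if is_formatted "$1"; then
        echo "already formatted: $1"
    else
        hadoop namenode -format
    fi
}

# Example: format_if_needed /home/hadoop/work/dfs/name
```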

12. Now start the cluster.

    start-dfs.sh --> to start HDFS
    start-mapred.sh --> to start MapReduce

    or

   start-all.sh --> to start both HDFS and MapReduce.


13. Type the "jps" command to verify the processes.

SecondaryNameNode
JobTracker
DataNode
NameNode
TaskTracker

14. Verify the NameNode administration page.

   http://localhost:50070

15. Verify the JobTracker administration page.

  http://localhost:50030

16. Verify the SecondaryNameNode administration page.

  http://localhost:50090

17. Verify the TaskTracker administration page.

  http://localhost:50060
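
Instead of reading the jps output by eye, the list of expected daemons can be compared against it. This is only a sketch: missing_daemons is a hypothetical helper name, and it assumes jps prints one "pid name" pair per line.

```shell
# missing_daemons: read jps output on stdin and print the name of each
# expected Hadoop daemon that does not appear in it.
# Hypothetical helper for illustration only.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        # Compare the whole second field, so that NameNode does not
        # accidentally match SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}

# Example: jps | missing_daemons
```

If the command prints nothing, all five daemons are running; any name it prints is a daemon whose log file under the Hadoop logs directory is worth checking.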
