Wednesday, 13 November 2013

Hadoop Installation on RHEL 



http://hadoop.apache.org/docs/r0.18.3/hdfs_design.html

http://www.bigdataplanet.info/2013/10/Hadoop-Installation-on-Local-Machine-Single-node-Cluster.html


As requested by many of our visitors and subscribers, here is a guide to the single-node cluster installation of Hadoop on Ubuntu.

If you are new to Hadoop, you can follow the links below to get an overview:

What is Hadoop? and the Hadoop tutorial series.

The main goal of this tutorial is to get you started with Hadoop by setting up a simple single-node Hadoop cluster, even at home, so that you can start experimenting with the different tools, code, syntax and software related to Hadoop.

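Note: here I am using Ubuntu 12.04 with Apache Hadoop 1.2.1 (the most stable release at the time of writing) to run a pseudo-distributed (single-node) cluster. Some of the sample output below was captured on an RHEL 6 machine (host hadooprhcl), so hostnames and paths may differ slightly from yours.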

So now let's get started.

Prerequisites for Hadoop Installation


Java 1.6+ (aka Java 6)

Hadoop requires Java 1.5+ to work, but Java 1.6 (aka Java 6) is recommended.
So the first thing you need on your machine is Java 1.6. Check whether it is installed:


$ java -version

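If it is not installed, you can install it with the command below:

$ sudo apt-get install openjdk-6-jre

On RHEL, the corresponding package is java-1.6.0-openjdk (sudo yum install java-1.6.0-openjdk). If you want the jps tool used later in this post, install the full JDK instead of the JRE (openjdk-6-jdk on Ubuntu, java-1.6.0-openjdk-devel on RHEL).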


After installation, check that Java is installed properly:

$ java -version

java version "1.6.0_17"
OpenJDK Runtime Environment (IcedTea6 1.7.4) (rhel-1.21.b17.el6-x86_64)
OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)

If you see output like the above, Java is installed properly on your system. You can find the installation under /usr/lib/jvm/.
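If you prefer to script this check, the following Python 3 sketch parses the version string printed by java -version. The function names are hypothetical helpers, not part of Hadoop or Java. It assumes the JVM prints its version to stderr, as the OpenJDK builds shown above do, and it handles both the old "1.6.0_17" numbering and the newer "11.0.2" style.

```python
import re
import shutil
import subprocess
from typing import Optional


def java_major_version(version_output: str) -> int:
    """Return the major Java version found in `java -version` output.

    Old scheme: "1.6.0_17" -> 6; newer scheme (Java 9+): "11.0.2" -> 11.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))  # "1.x" means Java x
    return first


def java_version_output() -> Optional[str]:
    """Run `java -version` if java is on PATH; the JVM prints the version to stderr."""
    java = shutil.which("java")
    if java is None:
        return None
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return result.stderr


if __name__ == "__main__":
    output = java_version_output()
    if output is None:
        print("java not found on PATH")
    else:
        try:
            major = java_major_version(output)
        except ValueError:
            print("could not parse:", output)
        else:
            status = "OK" if major >= 6 else "older than the recommended Java 6"
            print(f"Java {major}: {status}")
```

For the sample output above, java_major_version returns 6.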


Adding a dedicated system user


I prefer to use a dedicated system user for Hadoop, which is also the recommended practice. It keeps the Hadoop installation separate from other software applications and from the other user accounts on the same machine. To create a separate user, use the commands below:

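$ sudo groupadd hadoop
$ sudo useradd -m -g hadoop hduser
$ sudo passwd hduser

This adds a group hadoop and a user hduser to your local machine. The -m option creates hduser's home directory (useradd does not do this by default on Ubuntu), and passwd sets the password you will need for su - hduser.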
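To confirm the account was created, a small Python sketch using the standard pwd and grp modules can check that hduser exists and belongs to the hadoop group. The helper name user_in_group is hypothetical; it only reads the local user and group databases and changes nothing.

```python
import grp
import pwd


def user_in_group(user: str, group: str) -> bool:
    """True if `user` exists and `group` is its primary or a supplementary group."""
    try:
        entry = pwd.getpwnam(user)
    except KeyError:
        return False  # no such user
    try:
        primary = grp.getgrgid(entry.pw_gid).gr_name
    except KeyError:
        primary = None  # primary GID has no name in the group database
    if primary == group:
        return True
    try:
        return user in grp.getgrnam(group).gr_mem
    except KeyError:
        return False  # no such group


if __name__ == "__main__":
    print("hduser in hadoop:", user_in_group("hduser", "hadoop"))
```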


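Hadoop uses SSH to manage its daemons (start-all.sh logs in to localhost over SSH to start some of them), so an SSH server such as openssh-server must be installed and running. Once it is, generate an SSH key for hduser: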


$ su - hduser
$ ssh-keygen -t rsa -P ""


The second command generates an RSA key pair with an empty passphrase.

Note: an empty passphrase is generally not recommended, but we use one here because we don't want to enter it every time Hadoop interacts with its nodes.

Now that the key pair is generated, enable SSH access to the local machine with this newly created key:


hduser@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

If you have any special SSH configuration for your local machine, such as a non-standard SSH port, you can define host-specific SSH options in $HOME/.ssh/config.

Finally, test the SSH setup by connecting to your local machine as hduser. This step is also needed to save your local machine's host key fingerprint to hduser's known_hosts file:

$ ssh localhost
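As an additional sanity check, this hypothetical Python helper verifies that the key in id_rsa.pub has been appended to authorized_keys, comparing only the key type and the base64 blob (the trailing comment such as hduser@ubuntu is ignored). It checks file contents only; whether sshd actually accepts the key is still decided by ssh localhost.

```python
from pathlib import Path


def key_is_authorized(public_key: str, authorized_keys: str) -> bool:
    """True if the key (type + base64 blob) from id_rsa.pub appears in authorized_keys.

    Lines with leading options (e.g. from="...") are matched by looking for
    the type/blob pair anywhere on the line.
    """
    key_type, blob = public_key.split()[:2]
    for line in authorized_keys.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        tokens = line.split()
        for i in range(len(tokens) - 1):
            if tokens[i] == key_type and tokens[i + 1] == blob:
                return True
    return False


if __name__ == "__main__":
    ssh_dir = Path.home() / ".ssh"
    pub = ssh_dir / "id_rsa.pub"
    auth = ssh_dir / "authorized_keys"
    if pub.is_file() and auth.is_file():
        print("key authorized:", key_is_authorized(pub.read_text(), auth.read_text()))
    else:
        print("missing:", pub if not pub.is_file() else auth)
```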



Hadoop Installation


Download and Extract Hadoop

If you have all the above prerequisites on your machine, you are ready for the Hadoop installation.
First download Hadoop (hadoop-1.2.1.tar.gz) from the Apache Hadoop releases page and extract it to any location; I kept it in /usr/local. You also need to change the owner of all the files to hduser and the group to hadoop:

$ cd /usr/local
$ sudo tar xzf hadoop-1.2.1.tar.gz
$ sudo mv hadoop-1.2.1 hadoop
$ sudo chown -R hduser:hadoop hadoop


Update $HOME/.bashrc

Add the following lines to the end of the $HOME/.bashrc file of user hduser. If you use a shell other than bash, update its configuration file instead.

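export HADOOP_HOME=/usr/local/hadoop
# Set this to the JDK/JRE directory under /usr/lib/jvm/ on your machine
export JAVA_HOME=/usr/lib/jvm/openjdk-6-jre

# Shortcuts: "fs" for "hadoop fs", "hls" for "hadoop fs -ls"
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

# Show the first 1000 lines of an LZO-compressed file stored in HDFS (requires lzop)
lzohead () {
    hadoop fs -cat "$1" | lzop -dc | head -1000 | less
}
export PATH=$PATH:$HADOOP_HOME/bin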


Configuration File Setup

At this point we are almost done with the Hadoop installation. What remains is to change a few properties in the configuration files provided in Hadoop's conf folder.
But before that, we have to create the local directory where Hadoop will keep its data on this single-node cluster, including the blocks stored in HDFS.

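So let's create the directory and set the required ownership and permissions. (Many systems clean /tmp on reboot or periodically, so for anything beyond a quick experiment you may prefer a directory outside /tmp.)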

$ sudo mkdir /tmp/hadoop_data
$ sudo chown hduser:hadoop /tmp/hadoop_data
$ sudo chmod 777 /tmp/hadoop_data

Now let's change the required configuration files.

Note: you will find all these configuration files inside the conf directory of your Hadoop installation; in my case it is /usr/local/hadoop/conf.

hadoop-env.sh

Open hadoop-env.sh and set the only environment variable required for a local installation, JAVA_HOME. Uncomment the line below and set JAVA_HOME to your JDK/JRE directory:

# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/jvm/openjdk-6-jre
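Because the JAVA_HOME directory name differs between distributions and Java packages, it can help to verify the value before starting Hadoop. This Python sketch (the function name is hypothetical) simply checks that the directory exists and contains an executable bin/java, which is the binary the Hadoop scripts run.

```python
import os
from pathlib import Path
from typing import Optional


def java_home_problem(java_home: str) -> Optional[str]:
    """Return None if java_home looks usable, else a short description of the problem."""
    home = Path(java_home)
    if not home.is_dir():
        return f"{home} is not a directory"
    java = home / "bin" / "java"
    if not java.is_file():
        return f"{java} not found"
    if not os.access(java, os.X_OK):
        return f"{java} is not executable"
    return None


if __name__ == "__main__":
    # Falls back to the example value used in this post; adjust for your machine.
    candidate = os.environ.get("JAVA_HOME", "/usr/lib/jvm/openjdk-6-jre")
    print(java_home_problem(candidate) or f"{candidate} looks OK")
```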

core-site.xml

Put the following properties between <configuration> and </configuration>:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop_data</value>
  <description>directory for hadoop data</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>URI of the default file system (the NameNode)</description>
</property>
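A common mistake is pasting the properties outside the <configuration> element. The Python sketch below (read_site_xml is a hypothetical helper) parses a *-site.xml file with the standard xml.etree.ElementTree module and returns its name/value pairs, raising an error if the XML is malformed or the root element is not <configuration>.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Dict


def read_site_xml(text: str) -> Dict[str, str]:
    """Parse a Hadoop *-site.xml document into {name: value}.

    Raises ET.ParseError for malformed XML and ValueError if the root
    element is not <configuration> or a <property> lacks a <name>.
    """
    root = ET.fromstring(text)
    if root.tag != "configuration":
        raise ValueError(f"root element is <{root.tag}>, expected <configuration>")
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is None:
            raise ValueError("a <property> has no <name>")
        props[name.strip()] = (value or "").strip()
    return props


if __name__ == "__main__":
    # e.g. python check_site.py /usr/local/hadoop/conf/core-site.xml
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            print(path, read_site_xml(f.read()))
```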

mapred-site.xml

Again between <configuration> and </configuration>, set the host and port of the JobTracker:

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>...
  </description>
</property>


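hdfs-site.xml

Set the default block replication to 1, since a single-node cluster has only one DataNode to hold each block: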

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
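If you set up such clusters often, the three files can also be generated from a script. This is a minimal Python sketch assuming the same values used in this post; SITE_FILES, site_xml and write_site_files are hypothetical names. It omits the optional <description> elements, and write_site_files overwrites existing files, so keep a backup of your conf directory.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict

# Values taken from this post; adjust them for your machine.
SITE_FILES: Dict[str, Dict[str, str]] = {
    "core-site.xml": {
        "hadoop.tmp.dir": "/tmp/hadoop_data",
        "fs.default.name": "hdfs://localhost:54310",
    },
    "mapred-site.xml": {"mapred.job.tracker": "localhost:54311"},
    "hdfs-site.xml": {"dfs.replication": "1"},
}


def site_xml(props: Dict[str, str]) -> str:
    """Render {name: value} as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


def write_site_files(conf_dir: Path) -> None:
    """Overwrite the three *-site.xml files in conf_dir."""
    for filename, props in SITE_FILES.items():
        (conf_dir / filename).write_text(site_xml(props), encoding="utf-8")


if __name__ == "__main__":
    # Print instead of writing, so nothing is overwritten by accident.
    for filename, props in SITE_FILES.items():
        print(f"--- {filename}\n{site_xml(props)}")
```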


Formatting and Starting the Single Node Cluster

If you have completed all the steps so far, the installation part is done. Now we just have to format the NameNode and start the cluster.

hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format

The output will be something like:

/usr/local/hadoop/bin/hadoop namenode -format
Warning: $HADOOP_HOME is deprecated.

13/10/22 22:57:23 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadooprhcl.example.com/192.168.11.134
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG:   java = 1.6.0_17
************************************************************/
13/10/22 22:57:24 INFO util.GSet: Computing capacity for map BlocksMap
13/10/22 22:57:24 INFO util.GSet: VM type       = 64-bit
13/10/22 22:57:24 INFO util.GSet: 2.0% max memory = 1013645312
13/10/22 22:57:24 INFO util.GSet: capacity      = 2^21 = 2097152 entries
13/10/22 22:57:24 INFO util.GSet: recommended=2097152, actual=2097152
13/10/22 22:57:24 INFO namenode.FSNamesystem: fsOwner=hduser
13/10/22 22:57:24 INFO namenode.FSNamesystem: supergroup=supergroup
13/10/22 22:57:24 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/10/22 22:57:25 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/10/22 22:57:25 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/10/22 22:57:25 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
13/10/22 22:57:25 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/10/22 22:57:25 INFO common.Storage: Image file /tmp/hadoop_data/dfs/name/current/fsimage of size 112 bytes saved in 0 seconds.
13/10/22 22:57:25 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/tmp/hadoop_data/dfs/name/current/edits
13/10/22 22:57:25 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/tmp/hadoop_data/dfs/name/current/edits
13/10/22 22:57:25 INFO common.Storage: Storage directory /tmp/hadoop_data/dfs/name has been successfully formatted.
13/10/22 22:57:25 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadooprhcl.example.com/192.168.11.134
************************************************************/
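The log above shows the NameNode image and edit log written to /tmp/hadoop_data/dfs/name/current. A hypothetical Python helper can check for those files later, for example after a reboot, since a cleaned /tmp leaves the NameNode with no image to load. It assumes dfs.name.dir is left at its default of ${hadoop.tmp.dir}/dfs/name, as in this setup.

```python
from pathlib import Path


def namenode_formatted(hadoop_tmp_dir: str) -> bool:
    """True if the files written by 'hadoop namenode -format' are present.

    Assumes the default dfs.name.dir layout seen in the log above.
    """
    current = Path(hadoop_tmp_dir) / "dfs" / "name" / "current"
    return (current / "fsimage").is_file() and (current / "edits").is_file()


if __name__ == "__main__":
    print("formatted:", namenode_formatted("/tmp/hadoop_data"))
```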





Starting the single node cluster:

hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh

After start-up, you will see output like:



[hduser@hadooprhcl ~]$ /usr/local/hadoop/bin/start-all.sh
Warning: $HADOOP_HOME is deprecated.

starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-hduser-namenode-hadooprhcl.example.com.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-hduser-datanode-hadooprhcl.example.com.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting secondarynamenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-hduser-secondarynamenode-hadooprhcl.example.com.out
starting jobtracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-hduser-jobtracker-hadooprhcl.example.com.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting tasktracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-hduser-tasktracker-hadooprhcl.example.com.out



The above command starts the NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker on your local machine.

You can use the jps command to check whether these services are running. Note that jps ships with the JDK, not with the JRE:

hduser@ubuntu:/usr/local/hadoop$ jps
2246 TaskTracker
1927 JobTracker
1944 DataNode
2091 SecondaryNameNode
2311 Jps
1993 NameNode
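To automate this check, the Python sketch below (hypothetical helper names) parses jps output, which prints one "<pid> <name>" pair per line, and reports which of the five Hadoop daemons are missing. Run it as hduser, since jps normally lists only the Java processes of the current user.

```python
import shutil
import subprocess
from typing import Set

EXPECTED_DAEMONS = {"NameNode", "DataNode", "SecondaryNameNode", "JobTracker", "TaskTracker"}


def running_daemons(jps_output: str) -> Set[str]:
    """Collect process names from jps lines of the form '<pid> <name>'."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].isdigit():
            names.add(parts[1])
    return names


def missing_daemons(jps_output: str) -> Set[str]:
    """Hadoop 1.x daemons that do not appear in the jps output."""
    return EXPECTED_DAEMONS - running_daemons(jps_output)


if __name__ == "__main__":
    jps = shutil.which("jps")
    if jps is None:
        print("jps not found; it ships with the JDK, not the JRE")
    else:
        out = subprocess.run([jps], capture_output=True, text=True).stdout
        missing = missing_daemons(out)
        print("all Hadoop daemons running" if not missing else f"missing: {sorted(missing)}")
```

With the sample jps output above, missing_daemons returns an empty set.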


That completes the single-node installation of Hadoop on your local machine.



Hadoop Web Interfaces

Hadoop comes with web interfaces, which by default are available at the following locations:

NameNode: http://localhost:50070/
JobTracker: http://localhost:50030/
Task Tracker: http://localhost:50060/
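To check from a script whether these daemons are listening, a TCP connection attempt to each port is enough for a rough test. This is a Python sketch using only the standard socket module; port_open and WEB_UIS are hypothetical names, and a successful connection only shows that something is listening on the port, not that the web page renders correctly.

```python
import socket

WEB_UIS = {
    "NameNode": ("localhost", 50070),
    "JobTracker": ("localhost", 50030),
    "TaskTracker": ("localhost", 50060),
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


if __name__ == "__main__":
    for name, (host, port) in WEB_UIS.items():
        state = "listening" if port_open(host, port) else "not reachable"
        print(f"{name} http://{host}:{port}/ {state}")
```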





Still getting errors? Try this.

Disabling IPv6

To disable IPv6 on Ubuntu, open /etc/sysctl.conf and add the lines below:

# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Note: the change takes effect after you reboot your computer; alternatively, sudo sysctl -p reloads /etc/sysctl.conf immediately. If this doesn't work for you, there are other ways to disable IPv6 that you can find online.
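After the reboot (or sysctl -p), you can confirm the setting took effect: the kernel exposes it as /proc/sys/net/ipv6/conf/<interface>/disable_ipv6, which reads 1 when IPv6 is disabled. The Python sketch below (hypothetical helper name) reads the three entries set above and reports None for any entry that is missing, for example when the kernel has no IPv6 support at all.

```python
from pathlib import Path
from typing import Dict, Optional


def ipv6_disable_flags(conf_root: Path = Path("/proc/sys/net/ipv6/conf")) -> Dict[str, Optional[bool]]:
    """Map each interface set in /etc/sysctl.conf to True (disabled), False, or None (missing)."""
    flags: Dict[str, Optional[bool]] = {}
    for iface in ("all", "default", "lo"):
        path = conf_root / iface / "disable_ipv6"
        flags[iface] = path.read_text().strip() == "1" if path.is_file() else None
    return flags


if __name__ == "__main__":
    print(ipv6_disable_flags())
```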


That is the complete step-by-step procedure for setting up a single-node cluster at home and starting to work with it. I hope you find it helpful.

If anything is unclear, let me know in the comment section and I will be glad to answer your questions :)




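If you get the following warning:

$ hadoop dfsadmin -report
Warning: $HADOOP_HOME is deprecated.

add the line below to hadoop-env.sh (the warning itself is harmless):

export HADOOP_HOME_WARN_SUPPRESS="TRUE"

and then run the command again to check.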

