Hadoop V2 Single Node Installation on CentOS 6.5

Introduction

This HOWTO covers Hadoop 2.2 installation on CentOS 6.5. My series of tutorials are meant as just that: tutorials. The intent is to allow the user to gain familiarity with the application; they should not be construed as any type of best-practices document for use in a production environment, and as such performance, reliability and security considerations are compromised. The tutorials are freely available and may be distributed with the proper acknowledgements. Actual screenshots of the commands are used to eliminate any possibility of typographical errors; in addition, long sequences of text are placed in front of the screenshots to facilitate copy and paste. Command text is printed in Courier font. In general the document covers only the bare minimum of how to get a single-node cluster up and running, with the emphasis on HOW rather than WHY. For more in-depth information the reader should consult the many excellent publications on Hadoop, such as Tom White’s Hadoop: The Definitive Guide, 3rd edition, and Eric Sammer’s Hadoop Operations, along with the Apache Hadoop website.

Please consult www.alan-johnson.net for an online version of this document.

Prerequisites

  • CentOS 6.5 installed

Machine configuration

In this HOWTO a physical machine was used, but for educational purposes VMware Workstation or VirtualBox (https://www.virtualbox.org/) would work just as well. The screenshot below shows acceptable VM settings for VMware Workstation.

Note that an additional network adapter and physical drive have been added. The memory allocation is 2GB, which is sufficient for the tutorial.

User configuration

If installing CentOS from scratch, create a user <hadoopuser> at installation time; otherwise the user can be added with the commands below. In addition, create a group called <hadoopgroup>.

Note the initial configuration is done as user root.
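For reference, the user and group can be created as follows (a minimal sketch, assuming the <hadoopuser> and <hadoopgroup> names used throughout this tutorial):

groupadd hadoopgroup
useradd hadoopuser
passwd hadoopuser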

Now make hadoopuser a member of hadoopgroup.

usermod -g hadoopgroup hadoopuser

Verify by issuing the id command.

id hadoopuser

The next step is to give hadoopuser access to sudo commands. Do this by executing the visudo command and adding the highlighted line shown below.
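The entry follows the standard sudoers syntax (a sketch; the exact placement in the file may differ):

hadoopuser ALL=(ALL) ALL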

Reboot and now log in as user hadoopuser.

Setting up ssh

Set up ssh for password-less authentication using keys.

ssh-keygen -t rsa -P ''

Next change file ownership and mode.

sudo chown hadoopuser ~/.ssh

sudo chmod 700 ~/.ssh

sudo chmod 600 ~/.ssh/id_rsa

Then append the public key to the file authorized_keys. (Note that sudo is not used here: with sudo the output redirection would still run as the unprivileged user, and the file is owned by hadoopuser in any case.)

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Change permissions.

sudo chmod 600 ~/.ssh/authorized_keys

Edit /etc/ssh/sshd_config

Set PasswordAuthentication to no and PermitEmptyPasswords to yes, as shown below in the extract of the file.
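The relevant lines should read as follows (assuming stock CentOS defaults elsewhere in the file):

PasswordAuthentication no
PermitEmptyPasswords yes

After saving the file, restart the service with sudo service sshd restart.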

Verify that login can be accomplished without requiring a password.
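For example, with the single-node setup used here:

ssh localhost

The first connection will ask to confirm the host key; after that, logins should proceed without any password prompt.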

Installing and configuring Java

It is recommended to install the full OpenJDK package to take advantage of some of the Java tools.

Installing OpenJDK

yum install java-1.7.0-openjdk*

After the installation, verify the Java version.

java -version

The folder /etc/alternatives contains a link to the java installation; perform a long listing of the file to show the link and use the target as the location for JAVA_HOME.
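For example (the exact target will depend on the installed JDK build):

ls -l /etc/alternatives/java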

Set the JAVA_HOME environment variable by editing ~/.bashrc.
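The entry to add should match the link target found above, e.g. (assuming the OpenJDK 7 path used later in hadoop-env.sh):

export JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64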

Installing Hadoop

Downloading Hadoop

From the Hadoop releases page http://hadoop.apache.org/releases.html, download hadoop-2.2.0.tar.gz from one of the mirror sites.

Next untar the file.

tar xzvf hadoop-2.2.0.tar.gz

Move the untarred folder.

sudo mv hadoop-2.2.0 /usr/local/hadoop

Change the ownership.

sudo chown -R hadoopuser:hadoopgroup /usr/local/hadoop

Next create the namenode and datanode folders.

mkdir -p ~/hadoopspace/hdfs/namenode

mkdir -p ~/hadoopspace/hdfs/datanode

Configuring Hadoop

Next edit ~/.bashrc to set up the environment variables for Hadoop.

# User specific aliases and functions

export HADOOP_INSTALL=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export PATH=$PATH:$HADOOP_INSTALL/sbin
export PATH=$PATH:$HADOOP_INSTALL/bin

Now apply the variables.
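source ~/.bashrc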

There are a number of configuration files within the Hadoop folder that require editing:

  • mapred-site.xml
  • yarn-site.xml
  • core-site.xml
  • hdfs-site.xml
  • hadoop-env.sh

The files can be found in /usr/local/hadoop/etc/hadoop/. First copy the mapred-site template file over and then edit it.
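cd /usr/local/hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml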

mapred-site.xml

Add the following text between the configuration tags.

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

yarn-site.xml

Add the following text between the configuration tags.

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

core-site.xml

Add the following text between the configuration tags.

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>

hdfs-site.xml

Add the following text between the configuration tags.

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

<property>
  <name>dfs.name.dir</name>
  <value>file:///home/hadoopuser/hadoopspace/hdfs/namenode</value>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>file:///home/hadoopuser/hadoopspace/hdfs/datanode</value>
</property>

Note that additional storage locations can be specified in hdfs-site.xml by separating the values with a comma, e.g.

file:///home/hadoopuser/hadoopspace/hdfs/datanode, file:///disk2/Hadoop/datanode, …

hadoop-env.sh

Add an entry for JAVA_HOME

export JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64/

Next format the namenode.
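hdfs namenode -format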

. . .

Issue the following commands.

start-dfs.sh
start-yarn.sh

Issue the jps command and verify that the following processes are running:
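With all the daemons up, the listing should include the following processes (the process IDs will vary):

NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps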

At this point Hadoop has been installed and configured.

Testing the installation

A number of test programs exist that can be used to benchmark Hadoop. Entering the command below without any arguments will list the available tests.
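hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar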

The TestDFSIO test below can be used to measure write and read performance: first the files are created (written), then read back:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 100

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 100

The results are logged in TestDFSIO_results.log, which will show throughput rates:

During the test run a message will be printed with a tracking url such as that shown below:

The link can be selected or the address can be pasted into a browser.

Another test is mrbench, which is a MapReduce benchmark.

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar mrbench -maps 100

Finally, the test below is used to calculate pi. The first parameter is the number of maps and the second is the number of samples for each map.

hadoop jar $HADOOP_INSTALL/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 20

. . .

Note that accuracy can be improved by increasing the value of the second parameter.

Working from the command line

Invoking a command without any parameters, or with insufficient parameters, will generally print out help text:

hdfs commands

hdfs dfsadmin -help

. . .

hadoop commands

hadoop version

Web Access

The Namenode status can be checked at http://localhost:50070/. This web page contains status information relating to the cluster.

There are also links for browsing the filesystem.

Logs can also be examined from the NameNode Logs link.

. . .

The secondary namenode can be accessed using port 50090.

Online documentation

Comprehensive documentation can be found at the Apache website, or locally by pointing a browser at $HADOOP_INSTALL/share/doc/hadoop/index.html.

Feedback, corrections and suggestions are welcome, as are suggestions for further HOWTOs.

12 thoughts on “Hadoop V2 Single Node Installation on CentOS 6.5”

  1. I got all the way to the first test case, then this happened:

    [hadoopuser@CentOS hadoop]$ jps
    5661 SecondaryNameNode
    5797 ResourceManager
    5482 DataNode
    5886 NodeManager
    5396 NameNode
    5923 Jps
    [hadoopuser@CentOS hadoop]$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 100
    14/04/13 07:40:37 INFO fs.TestDFSIO: TestDFSIO.1.7
    14/04/13 07:40:37 INFO fs.TestDFSIO: nrFiles = 10
    14/04/13 07:40:37 INFO fs.TestDFSIO: nrBytes (MB) = 100.0
    14/04/13 07:40:37 INFO fs.TestDFSIO: bufferSize = 1000000
    14/04/13 07:40:37 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
    14/04/13 07:40:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    14/04/13 07:40:46 INFO fs.TestDFSIO: creating control file: 104857600 bytes, 10 files
    14/04/13 07:40:50 INFO fs.TestDFSIO: created control files for: 10 files
    14/04/13 07:40:51 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    14/04/13 07:40:51 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032

    14/04/13 07:56:01 ERROR security.UserGroupInformation: PriviledgedActionException as:hadoopuser (auth:SIMPLE) cause:java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.; Host Details : local host is: “CentOS.Hadoop/67.215.65.132”; destination host is: “0.0.0.0”:8032;
    14/04/13 07:56:01 ERROR security.UserGroupInformation: PriviledgedActionException as:hadoopuser (auth:SIMPLE) cause:java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.; Host Details : local host is: “CentOS.Hadoop/67.215.65.132”; destination host is: “0.0.0.0”:8032;
    java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.; Host Details : local host is: “CentOS.Hadoop/67.215.65.132”; destination host is: “0.0.0.0”:8032;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
    at org.apache.hadoop.ipc.Client.call(Client.java:1351)
    at org.apache.hadoop.ipc.Client.call(Client.java:1300)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy10.getNewApplication(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:167)
    at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy11.getNewApplication(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNewApplication(YarnClientImpl.java:127)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createApplication(YarnClientImpl.java:135)
    at org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID(ResourceMgrDelegate.java:175)
    at org.apache.hadoop.mapred.YARNRunner.getNewJobID(YARNRunner.java:229)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:355)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:833)
    at org.apache.hadoop.fs.TestDFSIO.runIOTest(TestDFSIO.java:443)
    at org.apache.hadoop.fs.TestDFSIO.writeTest(TestDFSIO.java:425)
    at org.apache.hadoop.fs.TestDFSIO.run(TestDFSIO.java:755)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.fs.TestDFSIO.main(TestDFSIO.java:650)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:115)
    at org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:123)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
    Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
    at com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94)
    at com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
    at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:202)
    at com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
    at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
    at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
    at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
    at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcHeaderProtos.java:2364)
    at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:996)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)

    • Sorry Bill, that was trapped in my spam filter. Sometimes Java can give a lot of misleading messages, and there are some security messages as well. Can you check the permissions on the files and also the Java version? Everything should work, as the screenshots were all taken from a running system. Are you short on resources? Perhaps try with a smaller file size. Also double-check the configuration files. Are you using the same Hadoop version etc.?

      Now that it is posted, some of the other readers may have run into a similar situation as well and can comment, but I have not encountered this. Also have a look at your environment variables. Please let me know how you get on.

  2. Yes, I can check permissions on the files, if I knew what files and permissions to check and what the permissions should be. I can also check the configuration files if I knew what I should be looking for.

    I can also have a look at my environment variables.

    • Bill, I did some googling on this and even though this issue seems to have come up for others I could not see a definitive response. Can you also try to reformat the namenode again and verify that it completes? I will also do some more digging around.

  3. I get the error below, “No such file or directory”, with every hadoop command issued and am unable to find a solution for it. I followed exactly the same steps you mentioned in the blog. I would be grateful if you can share some trick to fix this.

    [hadoopuser@localhost datanode]$ jps
    20668 NodeManager
    23252 Jps
    20540 ResourceManager
    20342 SecondaryNameNode
    19934 NameNode

    [hadoopuser@localhost datanode]$ hadoop fs -ls

    14/09/16 15:10:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    ls: `.’: No such file or directory
    [hadoopuser@localhost datanode]$

  4. Hi Anil

    Are you trying to install Hadoop on a 64-bit Linux OS? The bundled native library is 32-bit, which is not compatible with a 64-bit OS; that is why you are getting the warning.

    Warning: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable

    You can fix the warning by recompiling the Hadoop source on a 64-bit OS.
    I will post the steps on my blog tomorrow and share the link here.

  5. Hey Laxman,

    Thanks for the prompt response. The 32-bit/64-bit issue can be ignored since it's just a warning. The main issue is that hadoop fs commands are not getting executed.

    For a normal hadoop fs -ls command I get the exception below:
    ls: `.’: No such file or directory

    and the same exception occurs if I try to create a folder or copy a file.

    Thanks and waiting for your advice.

    Rgds, Anil

  6. Hey, nice tutorial but I am getting the following error while following the process:

    bash: start-dfs.sh: command not found

    could you please let me know what I am doing wrong?

    • Can you check the PATH variable and also look at ~/.bashrc as described; you should be able to locate this script within the path.


Comments and suggestions for future articles welcome!
