Installing a Hadoop Cluster on Linux

I am building a Big Data platform in our lab. I installed Hadoop 2.5.0 on Ubuntu 14.04 LTS with Oracle JDK 7 (Java version 1.7.0_65), using these two guides:

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

http://askubuntu.com/questions/144433/how-to-install-hadoop

After successfully deploying on a single computer, I moved on to a multi-node cluster with this guide:

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/

I ran into a problem: one of the datanodes failed to start with the following exception:

java.io.IOException: Incompatible clusterIDs in /app/hadoop/tmp/dfs/data

The fix is similar to the one for the "Incompatible namespaceIDs" error described at:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#javaioioexception-incompatible-namespaceids

I followed the manual fix. I edited the clusterID in /app/hadoop/tmp/dfs/data/current/VERSION so that it matches the one in /app/hadoop/tmp/dfs/name/current/VERSION.
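Here is a minimal sketch of that manual edit. It assumes bash and GNU sed. It also assumes the namenode's VERSION file is readable on the datanode, for example after copying it over with scp. The function names get_cluster_id and sync_cluster_id are hypothetical. The only thing taken from Hadoop itself is the clusterID=... line in the VERSION files.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Hadoop) for the manual clusterID fix.
# Assumption: the namenode's VERSION file has been copied to this machine.

# Print the clusterID value from a VERSION file (a key=value properties file).
get_cluster_id() {
  sed -n 's/^clusterID=//p' "$1"
}

# Rewrite the clusterID line in DATA_VERSION to match NAME_VERSION.
# A backup of the original datanode file is kept as DATA_VERSION.bak.
sync_cluster_id() {
  local name_version="$1" data_version="$2" cid
  cid="$(get_cluster_id "$name_version")"
  if [ -z "$cid" ]; then
    echo "no clusterID found in $name_version" >&2
    return 1
  fi
  sed -i.bak "s/^clusterID=.*/clusterID=${cid}/" "$data_version"
}
```

To use it:

1. Stop the datanode first with sbin/hadoop-daemon.sh stop datanode.
2. Call sync_cluster_id /tmp/name-VERSION /app/hadoop/tmp/dfs/data/current/VERSION. Here /tmp/name-VERSION is a placeholder for wherever the copied namenode file lives.
3. Start the datanode again.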

After finishing the setup, the datanodes could not find the namenode. The fix for this issue is given at:
http://stackoverflow.com/questions/8872807/hadoop-datanodes-cannot-find-namenode
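On Ubuntu, one possible cause of this symptom is an /etc/hosts line that maps the machine's hostname to a loopback address such as 127.0.1.1. The namenode then listens only on loopback, so other machines cannot reach it. This is an assumption here; the linked answer has the specifics. The hypothetical helper below assumes bash and awk. It lists such lines for a given hostname and returns non-zero if it finds any.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): print hosts-file lines that map
# HOST to a 127.x.x.x address. Returns 1 if any are found, 0 otherwise.
# Usage: check_loopback_mapping HOST [HOSTS_FILE]
check_loopback_mapping() {
  local host="$1" file="${2:-/etc/hosts}"
  awk -v h="$host" '
    /^[ \t]*#/ { next }                  # skip commented-out lines
    $1 ~ /^127\./ {                      # address field is a loopback address
      for (i = 2; i <= NF; i++)
        if ($i == h) { print; found = 1; break }
    }
    END { exit (found ? 1 : 0) }
  ' "$file"
}
```

For example, run check_loopback_mapping node0 on the master. Any line it prints is a candidate to remove or to replace with the machine's LAN address.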

Once HDFS was up with all the datanodes, I moved on to YARN. However, the nodes failed to connect to the ResourceManager:
Retrying connect to server: 0.0.0.0/0.0.0.0:8031

I had to modify yarn-site.xml according to the answer here:
http://stackoverflow.com/questions/21840771/simple-yarn-benchmark-testdfsio-fails
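Here is a sketch of the kind of yarn-site.xml change involved. It assumes the ResourceManager runs on node0, the hostname that appears in the shell prompt later in this post; substitute your own master. It also assumes the default Hadoop 2 ports. The shell function yarn_rm_properties is hypothetical. It only prints <property> blocks using the standard yarn.resourcemanager.*.address keys, which then go inside <configuration> in yarn-site.xml on every node.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): print yarn-site.xml <property>
# entries that point clients at the ResourceManager on RM_HOST instead of
# the default 0.0.0.0.
# Default Hadoop 2 ports: 8031 resource tracker (used by NodeManagers),
# 8030 scheduler (used by ApplicationMasters), 8032 client RPC.
yarn_rm_properties() {
  local rm_host="$1" entry prop port
  for entry in resource-tracker.address:8031 scheduler.address:8030 address:8032; do
    prop="${entry%%:*}"   # text before the colon, e.g. resource-tracker.address
    port="${entry##*:}"   # text after the colon, e.g. 8031
    printf '  <property>\n    <name>yarn.resourcemanager.%s</name>\n    <value>%s:%s</value>\n  </property>\n' \
      "$prop" "$rm_host" "$port"
  done
}
```

Port 8031 is the resource-tracker address the NodeManagers kept retrying, so that entry addresses the error directly. In Hadoop 2.x, setting yarn.resourcemanager.hostname alone should have a similar effect, because these addresses default to that hostname.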

After this, I moved on to running the example from michael-noll.com. One extra step is needed before copying the files into HDFS: creating the target directory.
hdfs dfs -mkdir -p /user/hduser/gutenberg
The example itself can then be run as:
hduser@node0:/usr/local/hadoop/share/hadoop/mapreduce$ hadoop jar hadoop-mapreduce-examples-2.5.0.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output
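The steps above can be collected into one function. This is a minimal sketch assuming bash and the paths shown above. It also assumes the Gutenberg text files were downloaded to /tmp/gutenberg, as suggested in the michael-noll.com tutorial; LOCAL_DIR overrides that. HDFS_CMD and HADOOP_CMD are hypothetical variables that default to the real hdfs and hadoop commands. Setting them to echo prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of Hadoop) for the wordcount example.
# Assumptions: LOCAL_DIR holds the downloaded text files (default /tmp/gutenberg);
# HDFS_CMD / HADOOP_CMD can replace the real CLIs, e.g. with echo for a dry run.
run_wordcount_example() {
  local hdfs_cmd="${HDFS_CMD:-hdfs}" hadoop_cmd="${HADOOP_CMD:-hadoop}"
  local local_dir="${LOCAL_DIR:-/tmp/gutenberg}"
  local hdfs_in=/user/hduser/gutenberg hdfs_out=/user/hduser/gutenberg-output
  local jar=/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar

  # The extra step: create the input directory (and its parents) in HDFS.
  "$hdfs_cmd" dfs -mkdir -p "$hdfs_in" || return 1
  # Copy the local text files into HDFS.
  "$hdfs_cmd" dfs -put "$local_dir"/* "$hdfs_in" || return 1
  # Run the bundled wordcount job; the output directory must not exist yet.
  "$hadoop_cmd" jar "$jar" wordcount "$hdfs_in" "$hdfs_out" || return 1
  # Show the first result lines; the quoted glob is expanded by HDFS, not the shell.
  "$hdfs_cmd" dfs -cat "$hdfs_out/part-r-*" | head -n 20
}
```

A dry run such as HDFS_CMD=echo HADOOP_CMD=echo run_wordcount_example shows the exact commands before anything touches the cluster.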
