Installing a Hadoop cluster on Linux

I am working on creating a Big Data platform in our lab. I managed to install Hadoop 2.5.0 on Ubuntu 14.04 LTS with Oracle JDK 7 (Java version 1.7.0_65) with the help of these two guides

After successfully deploying on a single computer, I moved on

I ran into a problem where one of the datanodes failed to start due to the following exception: Incompatible clusterIDs in /app/hadoop/tmp/dfs/data

The fix was similar to

I followed the manual fix, editing the clusterID in /app/hadoop/tmp/dfs/data/current/VERSION to match the one in /app/hadoop/tmp/dfs/name/current/VERSION.
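
The manual edit can also be scripted. The following is only a rough sketch, assuming the VERSION files are plain key=value files containing a clusterID= line; the helper names get_cluster_id and sync_cluster_id are made up for illustration and are not part of Hadoop.

```shell
#!/bin/sh
# Print the clusterID value stored in a VERSION file (key=value format).
get_cluster_id() {
    sed -n 's/^clusterID=//p' "$1"
}

# Copy the clusterID from the namenode VERSION file ($1) into the
# datanode VERSION file ($2). The original datanode file is kept as $2.bak.
sync_cluster_id() {
    nn_id=$(get_cluster_id "$1")
    if [ -z "$nn_id" ]; then
        echo "no clusterID found in $1" >&2
        return 1
    fi
    cp "$2" "$2.bak" || return 1
    # Rewrite only the clusterID line; every other line is left untouched.
    sed "s/^clusterID=.*/clusterID=$nn_id/" "$2.bak" > "$2"
}
```

With the datanode stopped, it could be called as sync_cluster_id /app/hadoop/tmp/dfs/name/current/VERSION /app/hadoop/tmp/dfs/data/current/VERSION on a machine that can read both files. On a multi-node cluster the namenode file usually lives on another host, so it may have to be copied over first.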

After I finished the setup, the datanodes failed to find the namenode. The fix for the issue is given at:

Once Hadoop was up with all the datanodes, I moved on to YARN. However, the nodes failed to connect to the resource manager:
Retrying connect to server:

I had to modify yarn-site.xml according to the answer here:

After this, I moved on to running the example from

An extra step is needed before copying the files: the target directory has to be created in HDFS first.
hdfs dfs -mkdir -p /user/hduser/gutenberg
The example itself can be run as follows:
hduser@node0:/usr/local/hadoop/share/hadoop/mapreduce$ hadoop jar hadoop-mapreduce-examples-2.5.0.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output
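
The steps above can be put together in a small script. This is a minimal sketch rather than the exact procedure from the guide: the function name run_wordcount and its LOCAL_DIR argument (a local directory holding the input text files) are assumptions for illustration, and the jar path is taken from the prompt shown above.

```shell
#!/bin/sh
# Location of the examples jar, taken from the prompt above; adjust as needed.
EXAMPLES_JAR=/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar

# run_wordcount LOCAL_DIR HDFS_IN HDFS_OUT
# LOCAL_DIR is a hypothetical local directory containing the input files.
run_wordcount() {
    local_dir=$1
    hdfs_in=$2
    hdfs_out=$3
    # Create the input directory in HDFS (the extra step mentioned above).
    hdfs dfs -mkdir -p "$hdfs_in" || return 1
    # Copy the local input files into HDFS.
    hdfs dfs -copyFromLocal "$local_dir"/* "$hdfs_in" || return 1
    # Run the bundled wordcount example.
    hadoop jar "$EXAMPLES_JAR" wordcount "$hdfs_in" "$hdfs_out" || return 1
    # List the files the job wrote.
    hdfs dfs -ls "$hdfs_out"
}
```

For the run above, the call would be run_wordcount LOCAL_DIR /user/hduser/gutenberg /user/hduser/gutenberg-output, with LOCAL_DIR replaced by wherever the text files were downloaded. The job refuses to start if the output directory already exists, so it has to be removed or renamed before a rerun.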
