Hadoop pseudo-distributed mode

I am just going through the steps to set up a Hadoop server in pseudo-distributed mode.

I assume that you have already downloaded the Hadoop tarball, untarred the package, and moved it to /usr/local/hadoop.

Make sure you have already set up the Hadoop environment. If you missed that step, check out https://prabhugs.wordpress.com/2016/01/13/hadoop-on-ubuntu-14-04

Once the Hadoop environment is ready, follow the steps below.

$ sudo chown -R hduser:hadoop /usr/local/hadoop

$ vi $HADOOP_HOME/etc/hadoop/core-site.xml

Change the following properties in the file:

<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
</property>


<property>
<name>fs.default.name</name>
<value>hdfs://yourIPaddress:54310</value>
</property>


$ sudo mkdir -p /app/hadoop/tmp

$ sudo chown hduser:hadoop /app/hadoop/tmp

Make sure you have an entry for your IP address in your /etc/hosts file.
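A typical entry looks like the line below; the address and hostname here are placeholders, so substitute your machine's actual IP and name:

192.168.1.10    hadoop-master

Mapping the hostname to the real IP rather than to 127.0.1.1 (which Ubuntu adds by default) keeps the daemons from binding to the loopback interface.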


Edit the hdfs-site.xml file and set the values below. A replication factor of 1 is enough here, since everything runs on a single node.

$ vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>


Add these lines to the end of your .bashrc file (remember that you are doing all of this as the hduser user).

#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
alias jps='/usr/lib/jvm/java-7-openjdk-amd64/bin/jps'
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
#HADOOP VARIABLES END
export HIVE_HOME=/usr/local/hadoop/hadoop-2.6.0/hive-0.9.0-bin
export PATH=$PATH:$HIVE_HOME/bin

$ source ~/.bashrc
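At this point it is worth a quick sanity check that the variables took effect; hadoop should now be on the PATH and report release 2.6.0:

$ echo $HADOOP_HOME
$ hadoop version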

$ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode

$ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode

$ sudo chown -R hduser:hadoop /usr/local/hadoop_store

Now format the Hadoop filesystem:

$ hadoop namenode -format
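(The hadoop namenode form is deprecated in Hadoop 2.x in favour of hdfs namenode -format; both work, the former just prints a deprecation warning.)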

Upon successful formatting you should see something like the following at the end of the output:

16/01/07 18:49:02 INFO common.Storage: Storage directory /usr/local/hadoop_store/hdfs/namenode has been successfully formatted.
16/01/07 18:49:02 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
16/01/07 18:49:02 INFO util.ExitUtil: Exiting with status 0

It's all set now; we can start the HDFS and YARN daemons.

$ start-dfs.sh

$ start-yarn.sh

Enter hduser's password when prompted.
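If you would rather not type the password for every daemon the scripts spawn, you can set up key-based SSH to localhost first (a standard step for single-node Hadoop installs; run as hduser):

$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost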

We can verify the running Java processes with the jps command.

$ jps
28823 SecondaryNameNode
29195 Jps
28957 ResourceManager
28485 NameNode
28639 DataNode
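With all five daemons up, the NameNode web UI should also be reachable in a browser at http://yourIPaddress:50070 (the default NameNode HTTP port in Hadoop 2.x); it is a convenient way to confirm that the DataNode has registered.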


Create a directory in HDFS

$ hadoop fs -mkdir -p /user/hduser

You should be able to see the directory contents by using the ls command:

$ hadoop fs -ls

or

$ hadoop fs -ls hdfs://yourIPaddress:54310/user

Found 1 items
drwxr-xr-x   - hduser supergroup          0 2016-01-07 18:51 hdfs://yourIPaddress:54310/user/hduser
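As a final smoke test, copy a local file into the new directory and list it (README.txt below stands in for any local file, for example the one shipped at the top of the Hadoop distribution directory):

$ hadoop fs -put README.txt /user/hduser
$ hadoop fs -ls /user/hduser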

Problems and solutions

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

$ export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
$ export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
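To make this fix permanent, also update the HADOOP_OPTS line added to .bashrc earlier so that it points at $HADOOP_HOME/lib/native instead of $HADOOP_HOME/lib.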

No such file or directory upon ls

This happens because hadoop fs -ls with no path argument lists the current user's HDFS home directory (/user/hduser), which does not exist until you create it:

$ hadoop fs -mkdir -p /user/hduser

org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hduser/README.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.

Solution 1:

$ stop-dfs.sh
$ stop-yarn.sh
$ sudo rm -rf /tmp/*
$ start-dfs.sh
$ start-yarn.sh

Solution 2:

$ sudo rm -r /app/hadoop/tmp
$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hadoop /app/hadoop/tmp
$ sudo chmod 750 /app/hadoop/tmp
$ start-dfs.sh

After this, the DataNode process should appear in the jps output.
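If the DataNode still refuses to start after either solution, check its log under $HADOOP_HOME/logs; a common culprit after re-formatting the NameNode is an "Incompatible clusterIDs" error, which is cleared by wiping and recreating the dfs.datanode.data.dir directory (/usr/local/hadoop_store/hdfs/datanode here) and restarting the daemons.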
