Hadoop on Ubuntu (14.04)

This guide walks through the steps required to set up a single-node Hadoop cluster backed by the Hadoop Distributed File System (HDFS), running on Ubuntu 14.04 Linux. HDFS provides high-throughput access to application data and is suitable for applications that have large data sets.

Steps:

  1. Create a dedicated user for Hadoop
  2. Install Java
  3. Set up SSH and generate a key
  4. Set environment variables
  5. Configure Java alternatives
  6. Download Hadoop
  7. Set up and configure the Hadoop environment
  8. Verify and run Hadoop

 

Create a user for Hadoop


$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
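
To confirm that the account ended up in the right group, a check along these lines could be run afterwards. It is only a minimal sketch: the helper name user_in_group is made up for illustration, and it reads group membership with id without changing anything.

```shell
# Hypothetical helper: succeed if user $1 belongs to group $2.
user_in_group() {
    # id -nG prints the names of all groups the user is a member of
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

if user_in_group hduser hadoop; then
    echo "hduser is a member of hadoop"
else
    echo "hduser is missing or not in the hadoop group"
fi
```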

 

Install Java

Java is the main prerequisite for Hadoop. First, check whether Java is already installed on your system by running java -version.

$ java -version

If Java is working as expected, you should see something similar to:

java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.6) (7u79-2.5.6-0ubuntu1.14.04.1)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)
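
If the version needs to be checked from a script rather than by eye, the first line of that output can be parsed. The sketch below assumes the version appears in double quotes on the first line, as in the sample above; parse_java_version is a hypothetical helper name.

```shell
# Hypothetical helper: read `java -version` output on stdin and print
# the quoted version string from its first line (e.g. 1.7.0_79).
parse_java_version() {
    sed -n '1s/.*version "\([^"]*\)".*/\1/p'
}

if command -v java >/dev/null 2>&1; then
    # java -version writes to stderr, so redirect it into the pipe
    java -version 2>&1 | parse_java_version
else
    echo "java not found on PATH"
fi
```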

 

Set up SSH and generate a key

The following commands generate an SSH key pair. Append the public key from id_rsa.pub to authorized_keys, then give the owner read and write permissions on the authorized_keys file.
$ su - hduser
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
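
With its default settings, sshd can refuse key-based logins when the .ssh directory or authorized_keys file is writable by other users, so it is worth making the permissions explicit. The following is a sketch under that assumption; secure_ssh_dir is a hypothetical helper, not part of OpenSSH.

```shell
# Hypothetical helper: make sure an SSH directory and its authorized_keys
# file exist with owner-only permissions.
secure_ssh_dir() {
    dir="$1"
    mkdir -p "$dir"
    chmod 700 "$dir"                    # owner-only access to the directory
    touch "$dir/authorized_keys"
    chmod 600 "$dir/authorized_keys"    # owner read/write only
}

# Example (run as hduser): secure_ssh_dir "$HOME/.ssh"
```

Assuming an SSH server is installed and running on the machine, ssh localhost should then log in without asking for a password.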

 

Set environment variables

To set the PATH and JAVA_HOME variables, add the following lines to the ~/.bashrc file, adjusting the JAVA_HOME path to match where your JDK is installed.
export JAVA_HOME=/usr/local/jdk1.7.0_79
export PATH=$PATH:$JAVA_HOME/bin

Apply the changes to the current shell:
$ source ~/.bashrc
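
A wrong JAVA_HOME is a common cause of trouble later on, so it can help to check that the directory really contains a JDK. This is a minimal sketch; is_jdk_home is a hypothetical helper that only looks for executable bin/java and bin/javac files.

```shell
# Hypothetical helper: succeed if directory $1 looks like a JDK home,
# i.e. it is non-empty and contains executable bin/java and bin/javac.
is_jdk_home() {
    [ -n "$1" ] && [ -x "$1/bin/java" ] && [ -x "$1/bin/javac" ]
}

if is_jdk_home "${JAVA_HOME:-}"; then
    echo "JAVA_HOME looks valid: $JAVA_HOME"
else
    echo "JAVA_HOME is unset or does not contain bin/java and bin/javac"
fi
```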

 

Configure Java alternatives

# update-alternatives --install /usr/bin/java java /usr/local/java/bin/java 2
# update-alternatives --install /usr/bin/javac javac /usr/local/java/bin/javac 2
# update-alternatives --install /usr/bin/jar jar /usr/local/java/bin/jar 2
# update-alternatives --set java /usr/local/java/bin/java
# update-alternatives --set javac /usr/local/java/bin/javac
# update-alternatives --set jar /usr/local/java/bin/jar
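
To see which binaries the java, javac and jar commands now resolve to, the symlink chain can be followed to its end. The sketch below assumes GNU readlink (standard on Ubuntu); resolve_cmd is a hypothetical helper name.

```shell
# Hypothetical helper: print the real file behind a command on PATH,
# following every symlink (including the /etc/alternatives hop).
resolve_cmd() {
    cmd_path="$(command -v "$1")" || return 1
    readlink -f "$cmd_path"
}

for tool in java javac jar; do
    if target="$(resolve_cmd "$tool")"; then
        echo "$tool -> $target"
    else
        echo "$tool is not on PATH"
    fi
done
```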

 

Download Hadoop

Download Hadoop from http://hadoop.apache.org/releases.html
$ su
Password:
# cd /usr/local
# wget http://apache.claz.org/hadoop/common/hadoop-2.4.1/hadoop-2.4.1.tar.gz
# tar xzf hadoop-2.4.1.tar.gz
# mkdir hadoop
# mv hadoop-2.4.1/* hadoop/
# exit
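
A truncated download usually only shows up as a confusing tar error, so the archive can be tested before extracting it. This is a rough sketch: archive_ok is a hypothetical helper, and it only detects corruption. It does not replace comparing the file against the checksums published alongside the Apache release.

```shell
# Hypothetical helper: succeed if $1 is a readable gzip-compressed tar archive.
archive_ok() {
    # tar tzf lists the archive contents without extracting anything
    tar tzf "$1" >/dev/null 2>&1
}

tarball="hadoop-2.4.1.tar.gz"
if [ -f "$tarball" ] && archive_ok "$tarball"; then
    echo "$tarball looks intact"
else
    echo "$tarball is missing or corrupt; download it again"
fi
```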

 

Set up and configure the Hadoop environment

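Append the following lines to the ~/.bashrc file, then reload it with source ~/.bashrc. The PATH entry lets the shell find the hadoop command.

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin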
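
Re-running a setup guide tends to leave duplicate export lines in ~/.bashrc. One way to avoid that is to append a line only when it is not already there. The helper below, append_once, is a hypothetical name used for illustration.

```shell
# Hypothetical helper: append line $2 to file $1 unless that exact line
# is already present.
append_once() {
    file="$1"
    line="$2"
    touch "$file"
    # -x matches whole lines, -F treats the text literally (no regex)
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example: append_once "$HOME/.bashrc" 'export HADOOP_HOME=/usr/local/hadoop'
```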

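Make sure Hadoop is working. The sample output below comes from a Hadoop 2.6.0 installation; the version reported on your system will match the release you downloaded.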

$ hadoop version
Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled by jenkins on 2014-11-13T21:10Z
Compiled with protoc 2.5.0
From source with checksum 18e43357c8f927c0695f1e9522859d6a
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.6.0.jar
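
To check from a script which release is actually installed, the first line of that output can be parsed. This sketch assumes the first line has the form "Hadoop <version>", as in the sample above; parse_hadoop_version is a hypothetical helper.

```shell
# Hypothetical helper: read `hadoop version` output on stdin and print
# the release number from its first line (e.g. 2.6.0).
parse_hadoop_version() {
    awk 'NR == 1 && $1 == "Hadoop" { print $2 }'
}

if command -v hadoop >/dev/null 2>&1; then
    hadoop version | parse_hadoop_version
else
    echo "hadoop not found on PATH; check HADOOP_HOME and PATH"
fi
```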

 

 
