The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
This article walks you step by step through installing and configuring a single-node Hadoop cluster using Hadoop 1.2.1.
Step 1. Install Java
# java -version
java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
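The version check above can also be scripted. The following is a minimal sketch, assuming Java 1.6 is the lowest version acceptable for Hadoop 1.x; the function names java_version_string and java_version_ok are hypothetical helpers, not part of Java or Hadoop.

```shell
#!/bin/sh
# Extract the quoted version from the first line of `java -version` output.
# Example input line: java version "1.7.0_75"
java_version_string() {
    printf '%s\n' "$1" | sed -n '1s/.*version "\([^"]*\)".*/\1/p'
}

# Succeed if the reported version is 1.6 or newer.
# This is only a lower bound (an assumption for Hadoop 1.x); it does not
# prove that a much newer Java release works with Hadoop 1.2.1.
java_version_ok() {
    ver=$(java_version_string "$1")
    major=$(printf '%s\n' "$ver" | cut -d. -f1)
    minor=$(printf '%s\n' "$ver" | cut -d. -f2)
    # Reject output that does not contain a numeric major version.
    case $major in ''|*[!0-9]*) return 1 ;; esac
    # Treat a missing or non-numeric minor version as 0.
    case $minor in ''|*[!0-9]*) minor=0 ;; esac
    [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 6 ]; }
}

# Usage (java -version writes to stderr, hence 2>&1):
#   java_version_ok "$(java -version 2>&1)" && echo "Java OK"
```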
Step 2. Create User Account
Now create a system user account for the Hadoop installation. We use the name “hadoop”.
$ sudo useradd hadoop
$ sudo passwd hadoop
Step 3. Configuring Key Based Login
The hadoop user must be able to SSH to the local machine without a password. The following commands enable key-based login for the hadoop user.
$ sudo su - hadoop
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
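Running the append step twice adds the same key twice. The sketch below shows one way to make it repeatable; authorize_key is a hypothetical helper name, and the permission modes are the ones commonly required when sshd runs with StrictModes enabled.

```shell
#!/bin/sh
# authorize_key PUBKEY_FILE AUTHORIZED_KEYS_FILE
# Append the public key only if that exact line is not already present,
# then restrict the directory and file to their owner.
authorize_key() {
    pub=$1
    auth=$2
    mkdir -p "$(dirname "$auth")"
    chmod 0700 "$(dirname "$auth")"
    touch "$auth"
    # -q quiet, -x whole-line match, -F fixed strings, -f patterns from file
    if ! grep -qxF -f "$pub" "$auth"; then
        cat "$pub" >> "$auth"
    fi
    chmod 0600 "$auth"
}

# Usage as the hadoop user (-N "" sets an empty passphrase):
#   ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
#   authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
#   ssh localhost true && echo "key-based login works"
```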
Step 4. Download and Extract Hadoop Source
Download the Hadoop 1.2.1 release from an official Apache mirror, then extract it as follows. Note that cd is a shell builtin and must be run without sudo.
$ sudo mkdir /opt/hadoop
$ cd /opt/hadoop/
$ sudo wget http://apache.mesi.com.ar/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
$ sudo tar -xzf hadoop-1.2.1.tar.gz
$ sudo mv hadoop-1.2.1 hadoop
$ sudo chown -R hadoop /opt/hadoop
$ cd /opt/hadoop/hadoop/
Step 5: Configure Hadoop
Edit the following Hadoop configuration files and make the changes shown.
5.1 Edit core-site.xml
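$ sudo vim conf/core-site.xml
# Add the following inside the configuration tag
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000/</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>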
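For a scripted setup, the same two settings can be written out as a complete file instead of being edited by hand. This is a sketch under the assumption that the file uses the usual Hadoop 1.x layout of property elements with name and value children; write_core_site is a hypothetical helper name.

```shell
#!/bin/sh
# write_core_site CONF_DIR
# Create CONF_DIR/core-site.xml with the two properties from step 5.1.
# The here-document is quoted ('EOF') so nothing inside it is expanded.
write_core_site() {
    conf_dir=$1
    cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000/</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
EOF
}

# Usage (this overwrites the existing file, so keep a backup):
#   cp /opt/hadoop/hadoop/conf/core-site.xml /opt/hadoop/hadoop/conf/core-site.xml.bak
#   write_core_site /opt/hadoop/hadoop/conf
```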
5.2 Edit hdfs-site.xml
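$ sudo vim conf/hdfs-site.xml
# Add the following inside the configuration tag
<property>
  <name>dfs.data.dir</name>
  <value>/opt/hadoop/hadoop/dfs/name/data</value>
  <final>true</final>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/opt/hadoop/hadoop/dfs/name</value>
  <final>true</final>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>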
5.3 Edit mapred-site.xml
$ sudo vim conf/mapred-site.xml
# Add the following inside the configuration tag
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
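After editing the three XML files, it can help to read a setting back and confirm it was saved as intended. The sketch below is not a real XML parser: it assumes each name and value element sits on a single line and that name comes before value inside a property. get_prop is a hypothetical helper name.

```shell
#!/bin/sh
# get_prop FILE NAME
# Print the value of the property called NAME; exit non-zero if it is absent.
get_prop() {
    awk -v want="$2" '
        # Remember the most recent property name seen.
        /<name>/  { n = $0; sub(/.*<name>/, "", n);  sub(/<\/name>.*/, "", n) }
        # When a value appears, print it if it belongs to the wanted name.
        /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                    if (n == want) { print v; found = 1; exit } }
        END { exit !found }
    ' "$1"
}

# Usage from /opt/hadoop/hadoop:
#   get_prop conf/core-site.xml fs.default.name
#   get_prop conf/hdfs-site.xml dfs.replication
#   get_prop conf/mapred-site.xml mapred.job.tracker
```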
5.4 Edit hadoop-env.sh
$ sudo vim conf/hadoop-env.sh
export JAVA_HOME=/opt/jdk1.7.0_75
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
Set JAVA_HOME to the path of the Java installation on your system.
Next, format the NameNode:
$ sudo su - hadoop
$ cd /opt/hadoop/hadoop
$ bin/hadoop namenode -format
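If you are unsure where Java is installed, the path can often be derived from the location of the java binary. The following is a sketch only: java_home_from_bin is a hypothetical helper, and it assumes the binary lives under a bin directory (optionally inside jre) below the installation root.

```shell
#!/bin/sh
# java_home_from_bin PATH_TO_JAVA_BINARY
# Strip the trailing /bin/java, and a trailing /jre if present, to get
# the installation root that JAVA_HOME should point to.
java_home_from_bin() {
    home=${1%/bin/java}
    home=${home%/jre}
    printf '%s\n' "$home"
}

# Usage (readlink -f resolves symlinks such as /usr/bin/java; it is
# available in GNU coreutils, which is assumed here):
#   java_home_from_bin "$(readlink -f "$(command -v java)")"
```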
13/06/02 22:53:48 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = srv1.tecadmin.net/192.168.1.90
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1479473; compiled by 'hortonfo' on Mon May 6 06:59:37 UTC 2013
STARTUP_MSG: java = 1.7.0_75
************************************************************/
13/06/02 22:53:48 INFO util.GSet: Computing capacity for map BlocksMap
13/06/02 22:53:48 INFO util.GSet: VM type = 32-bit
13/06/02 22:53:48 INFO util.GSet: 2.0% max memory = 1013645312
13/06/02 22:53:48 INFO util.GSet: capacity = 2^22 = 4194304 entries
13/06/02 22:53:48 INFO util.GSet: recommended=4194304, actual=4194304
13/06/02 22:53:49 INFO namenode.FSNamesystem: fsOwner=hadoop
13/06/02 22:53:49 INFO namenode.FSNamesystem: supergroup=supergroup
13/06/02 22:53:49 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/06/02 22:53:49 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/06/02 22:53:49 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/06/02 22:53:49 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
13/06/02 22:53:49 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/06/02 22:53:49 INFO common.Storage: Image file of size 112 saved in 0 seconds.
13/06/02 22:53:49 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/opt/hadoop/hadoop/dfs/name/current/edits
13/06/02 22:53:49 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/opt/hadoop/hadoop/dfs/name/current/edits
13/06/02 22:53:49 INFO common.Storage: Storage directory /opt/hadoop/hadoop/dfs/name has been successfully formatted.
13/06/02 22:53:49 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at srv1.tecadmin.net/192.168.1.90
************************************************************/
Step 6: Start Hadoop Services
Use the following command to start all hadoop services.
$ bin/start-all.sh
[sample output]
starting namenode, logging to /opt/hadoop/hadoop/libexec/../logs/hadoop-hadoop-namenode-ns1.tecadmin.net.out
localhost: starting datanode, logging to /opt/hadoop/hadoop/libexec/../logs/hadoop-hadoop-datanode-ns1.tecadmin.net.out
localhost: starting secondarynamenode, logging to /opt/hadoop/hadoop/libexec/../logs/hadoop-hadoop-secondarynamenode-ns1.tecadmin.net.out
starting jobtracker, logging to /opt/hadoop/hadoop/libexec/../logs/hadoop-hadoop-jobtracker-ns1.tecadmin.net.out
localhost: starting tasktracker, logging to /opt/hadoop/hadoop/libexec/../logs/hadoop-hadoop-tasktracker-ns1.tecadmin.net.out
Step 7: Test and Access Hadoop Services
Use the ‘jps’ command to check that all services have started.
$ jps
or
$ $JAVA_HOME/bin/jps
26049 SecondaryNameNode
25929 DataNode
26399 Jps
26129 JobTracker
26249 TaskTracker
25807 NameNode
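Rather than reading the jps list by eye, the check can be automated. The sketch below assumes the five daemon names shown in the sample output above are the ones expected on a single-node Hadoop 1.x setup; missing_daemons is a hypothetical helper name.

```shell
#!/bin/sh
# missing_daemons JPS_OUTPUT
# Print the name of every expected daemon that does not appear in the
# given jps output; print nothing when all five are running.
missing_daemons() {
    jps_output=$1
    for d in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        # Compare against the whole second field so that "NameNode"
        # is not satisfied by a "SecondaryNameNode" line.
        printf '%s\n' "$jps_output" |
            awk -v d="$d" '$2 == d { ok = 1 } END { exit !ok }' ||
            printf '%s\n' "$d"
    done
}

# Usage:
#   missing=$(missing_daemons "$(jps)")
#   [ -z "$missing" ] && echo "all services running" || echo "not running: $missing"
```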
Web Access URLs for Services
http://srv1.tecadmin.net:50030/ for the JobTracker
http://srv1.tecadmin.net:50070/ for the NameNode
http://srv1.tecadmin.net:50060/ for the TaskTracker
Step 8: Stop Hadoop Services
If you no longer need Hadoop, stop all services with the following command.
$ bin/stop-all.sh
Dharamart.blogspot.in