Running Cloudera in Distributed Mode

This section contains instructions for installing Cloudera's Distribution for Hadoop (CDH3) on Ubuntu. It is a CDH quickstart tutorial for setting up CDH3 on Debian-based systems: a short walkthrough giving all the commands, with descriptions, needed to install Cloudera in distributed mode (a multi-node cluster).

Prerequisite: before setting up Cloudera in distributed mode you must first set up Cloudera in pseudo-distributed mode, and you need at least two machines, one master and one slave (you can create more than one virtual machine on a single physical machine).

Deploy Cloudera (CDH3) on the cluster:

COMMAND / DESCRIPTION

for x in /etc/init.d/hadoop-* ; do sudo $x stop ; done
Before starting Cloudera in distributed mode, first stop the Hadoop daemons on every node.

update-alternatives --display hadoop-0.20-conf
List the alternative Hadoop configurations on your system.

cp -r /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.cluster
Copy the default configuration into your custom directory.

update-alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.cluster 50
Activate the new configuration on your system.

update-alternatives --display hadoop-0.20-conf
Check that the new configuration is active.

or

update-alternatives --set hadoop-0.20-conf /etc/hadoop-0.20/conf.cluster
Manually set the configuration.

vi /etc/hosts
Add the IP address and hostname of every node, one per line, e.g.:
192.168.0.1 master
192.168.0.2 slave
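A quick sanity check that name resolution works (assuming the /etc/hosts entries above and that both machines are reachable on your network):

ping -c 1 master
ping -c 1 slave
# each should resolve to the IP you entered and get a reply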
sudo apt-get install openssh-server openssh-client
Install the SSH server and client.

ssh-keygen -t rsa -P ""
Generate an RSA key pair for passwordless SSH.

ssh-copy-id -i $HOME/.ssh/id_rsa.pub slave
Copy the public key to the slave to set up passwordless SSH.
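To verify that passwordless SSH works, run a command on the slave over SSH; it should complete without prompting for a password:

ssh slave hostname
# should print "slave" with no password prompt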
Now go to your custom configuration directory (conf.cluster) and edit the configuration files there.
vi masters
Erase the old contents and type: master
The masters file lists the host where the secondary NameNode daemon will run (here, the master).
vi slaves
Erase the old contents and type: slave
The slaves file lists the hosts, one per line, where the Hadoop slave daemons (DataNodes and TaskTrackers) will run.
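If your cluster had more slaves (for example three machines with the hypothetical hostnames slave1, slave2 and slave3, each also listed in /etc/hosts), the slaves file would simply list them one per line:

slave1
slave2
slave3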
vi core-site.xml
Edit the configuration file core-site.xml and add the following property inside its <configuration> element:

<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
</property>
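For reference, the property on its own is not a valid Hadoop configuration file; a complete conf.cluster/core-site.xml would look roughly like this:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
  </property>
</configuration>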
vi mapred-site.xml
Edit the configuration file mapred-site.xml and add the following property inside its <configuration> element:

<property>
  <name>mapred.job.tracker</name>
  <value>master:54311</value>
</property>
vi hdfs-site.xml
Edit the configuration file hdfs-site.xml and add the following property inside its <configuration> element:

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

Set the value to the number of slaves (DataNodes) in your cluster; here it is 1 because there is a single slave.

Now copy the /etc/hadoop-0.20/conf.cluster directory to every node in your cluster, as sketched below.
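One way to do this is with scp over the passwordless SSH set up earlier (a minimal sketch; it assumes your user may sudo on the slave, and stages the files in /tmp because scp cannot write to /etc directly):

scp -r /etc/hadoop-0.20/conf.cluster slave:/tmp/conf.cluster
ssh -t slave 'sudo mv /tmp/conf.cluster /etc/hadoop-0.20/conf.cluster'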
update-alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.cluster 50
Set the alternatives rules on every node to activate your configuration.
for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done
for x in /etc/init.d/hadoop-* ; do sudo $x stop ; done
Restart the daemons on all nodes in your cluster using the service scripts, so that the new configuration files are read, and then stop them again.
su -s /bin/bash - hdfs -c 'hadoop namenode -format'
Format the NameNode manually (before starting the NameNode for the first time).
Run the following commands on the correct server, according to each machine's role.
On the master, start the daemons:
/etc/init.d/hadoop-0.20-namenode start
/etc/init.d/hadoop-0.20-secondarynamenode start
/etc/init.d/hadoop-0.20-jobtracker start
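You can confirm the master daemons are up with jps from the JDK (it lists Java processes; sudo shows processes of all users, including the Hadoop service accounts):

sudo jps
# expect NameNode, SecondaryNameNode and JobTracker in the output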
On the slave, start the daemons:
/etc/init.d/hadoop-0.20-datanode start
/etc/init.d/hadoop-0.20-tasktracker start
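Likewise on the slave:

sudo jps
# expect DataNode and TaskTracker in the output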

Congratulations! Your Cloudera CDH3 setup in distributed mode is complete.
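As a final smoke test (a sketch; depending on your HDFS permissions you may need to run these as the hdfs user, e.g. via sudo -u hdfs), check that the slave registered with the NameNode and that HDFS accepts writes:

hadoop dfsadmin -report
# should report 1 live datanode
hadoop fs -mkdir /tmp/smoketest
hadoop fs -ls /tmp
# should list the directory just created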