Hadoop pseudo-distributed mode simulates a Hadoop distributed environment on a single machine. The components to install are:
- HDFS: comprising the NameNode and DataNode
- YARN: the framework that runs MapReduce containers, comprising the ResourceManager and NodeManager
Preparation
$ sudo apt-get install ssh 【openssh is already installed and ssh works, so this can be skipped】
$ sudo apt-get install rsync 【this also appears to be already installed】
Passwordless ssh login
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
Check that it works
$ ssh localhost //if successful, you can log in directly without a password
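The steps above can be wrapped into one idempotent script. This is only a sketch: the paths are the standard OpenSSH defaults, and `BatchMode=yes` is added here so the final check fails fast instead of prompting for a password.

```shell
# Generate a key only if one does not already exist, then authorize it.
if [ ! -f ~/.ssh/id_rsa ]; then
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
fi
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys

# BatchMode=yes refuses to prompt, so this fails immediately if
# passwordless login is not actually working.
ssh -o BatchMode=yes localhost true && echo "passwordless ssh OK"
```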
Starting HDFS
Configuring HDFS
【etc/hadoop/core-site.xml】
<configuration>
<!-- Configure the HDFS NameNode address -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://191.8.2.45:9000</value>
</property>
<!-- By default Hadoop stores its dfs data (both namenode and datanode) under /tmp/hadoop-<username>; in this example the default namenode directory is /tmp/hadoop-wei/dfs/name -->
<!-- We can instead point it at /home/wei/hadoop/hadoop-2.9.0/tmp, in which case the namenode path becomes /home/wei/hadoop/hadoop-2.9.0/tmp/dfs/name -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/wei/hadoop/hadoop-2.9.0/tmp</value>
</property>
</configuration>
【etc/hadoop/hdfs-site.xml】
<configuration>
<!-- Configure the replication factor: the default is 3; since this is a single machine, set it to 1. -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
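Once both files are edited, a quick sanity check confirms the daemons will resolve the intended values. `hdfs getconf` is a standard Hadoop 2.x command; the expected values shown are the ones configured in this walkthrough.

```shell
# Print the effective configuration as the Hadoop tools will resolve it.
hdfs getconf -confKey fs.defaultFS       # expect hdfs://191.8.2.45:9000
hdfs getconf -confKey dfs.replication    # expect 1
```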
Format the namenode
$ hdfs namenode -format
18/05/17 16:56:36 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = gsta005/191.8.2.45
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.9.0
STARTUP_MSG: classpath = /home/wei/hadoop/hadoop-2.9.0/etc/hadoop:... ... :/home/wei/hadoop/hadoop-2.9.0/contrib/capacity-scheduler/*.jar
STARTUP_MSG: build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r 756ebc8394e473ac25feac05fa493f6d612e6c50; compiled by 'arsuresh' on 2017-11-13T23:15Z
STARTUP_MSG: java = 1.8.0_66
************************************************************/
... ...
18/05/17 16:56:37 INFO common.Storage: Storage directory /home/wei/hadoop/hadoop-2.9.0/tmp/dfs/name has been successfully formatted.
18/05/17 16:56:37 INFO namenode.FSImageFormatProtobuf: Saving image file /home/wei/hadoop/hadoop-2.9.0/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
18/05/17 16:56:37 INFO namenode.FSImageFormatProtobuf: Image file /home/wei/hadoop/hadoop-2.9.0/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 0 seconds.
18/05/17 16:56:37 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
18/05/17 16:56:37 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at gsta005/191.8.2.45
************************************************************/
Start the NameNode and DataNode daemons
$ start-dfs.sh
Starting namenodes on [gsta005]
gsta005: starting namenode, logging to /home/gsta/wei/hadoop/hadoop-2.9.0/logs/hadoop-gsta-namenode-gsta005.out
localhost: starting datanode, logging to /home/gsta/wei/hadoop/hadoop-2.9.0/logs/hadoop-gsta-datanode-gsta005.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/gsta/wei/hadoop/hadoop-2.9.0/logs/hadoop-gsta-secondarynamenode-gsta005.out
If this fails with "Error: JAVA_HOME is not set and could not be found.", set JAVA_HOME in etc/hadoop/hadoop-env.sh:
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/home/wei/jdk1.8.0_66
Verify in a browser: http://191.8.2.45:50070/
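With the daemons up, a small smoke test confirms HDFS is actually writable. The /user/wei directory is chosen to match the account used in this walkthrough; adjust to your own username.

```shell
# Create a home directory in HDFS, upload a file, and read it back.
hdfs dfs -mkdir -p /user/wei
echo "hello hdfs" > /tmp/hello.txt
hdfs dfs -put -f /tmp/hello.txt /user/wei/
hdfs dfs -cat /user/wei/hello.txt    # should print: hello hdfs
```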
Stop the NameNode and DataNode daemons: $ stop-dfs.sh
Starting YARN
Configuring YARN
【etc/hadoop/mapred-site.xml】 This file is created with: cp mapred-site.xml.template mapred-site.xml
<configuration>
<!-- Configure the framework MapReduce runs on -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
【etc/hadoop/yarn-site.xml】
<configuration>
<!-- Configure the auxiliary service the NodeManager uses to run MapReduce tasks -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Start YARN
$ start-yarn.sh
Verify
$ jps
15526 DataNode
16023 ResourceManager
15383 NameNode
16552 Jps
15802 SecondaryNameNode
16172 NodeManager
Visit the YARN web page: http://191.8.2.45:8088
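To verify that YARN can actually schedule work, one can submit the example job bundled with the distribution. The jar path below assumes the 2.9.0 layout used throughout this walkthrough, with the install at /home/wei/hadoop/hadoop-2.9.0.

```shell
# Submit the example pi estimator: 2 map tasks, 10 samples each.
cd /home/wei/hadoop/hadoop-2.9.0
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar pi 2 10
# The job should appear on http://191.8.2.45:8088 while it runs,
# and finish with a line like "Estimated value of Pi is ..."
```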