YARN Deployment and Usage
YARN pseudo-distributed deployment, master/slave architecture
-
Switch to the hadoop user
[root@JD ~]# su - hadoop
Last login: Sun Dec 1 15:09:50 CST 2019 on pts/0
-
Configure mapred-site.xml
Enter the Hadoop config directory:
[hadoop@JD ~]$ cd app/hadoop/etc/hadoop
Create mapred-site.xml from the template:
[hadoop@JD hadoop]$ cp mapred-site.xml.template mapred-site.xml
Edit the file:
[hadoop@JD hadoop]$ vi mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
-
Configure yarn-site.xml
Open ports 38088 and 50070 in the firewall for the web UIs. Move the YARN web UI off the default port (here to 38088) to make it a less obvious target for crypto-mining bots.
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>JD:38088</value>
    </property>
</configuration>
-
Start YARN
Start command:
[hadoop@JD hadoop]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/yarn-hadoop-resourcemanager-JD.out
JD: starting nodemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/yarn-hadoop-nodemanager-JD.out
Check the processes:
[hadoop@JD hadoop]$ jps
27616 NodeManager
28004 Jps
27512 ResourceManager
Check the port:
[hadoop@JD hadoop]$ netstat -nlp |grep 27512
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp6       0      0 192.168.0.3:38088       :::*       LISTEN      27512/java
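After start-yarn.sh, it is handy to verify that all expected daemons actually came up. The helper below is a hypothetical sketch (not part of Hadoop): it reads `jps`-style output on stdin and reports any expected daemon that is missing.

```shell
# Hypothetical helper: reads `jps` output on stdin and prints a line
# for every expected daemon name that does not appear in it.
check_daemons() {
  local out d rc=0
  out=$(cat)
  for d in "$@"; do
    if ! printf '%s\n' "$out" | grep -qw "$d"; then
      echo "missing: $d"
      rc=1
    fi
  done
  return $rc
}

# usage on a live box:
#   jps | check_daemons ResourceManager NodeManager
```

The non-zero return code makes the function usable in scripts or cron-based health checks.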
-
Web UI address
JD:38088
Running the MapReduce example program on YARN
-
Locate the MapReduce examples jar
[hadoop@JD hadoop]$ cd ../../
[hadoop@JD hadoop]$ find ./ -name '*example*.jar'
./share/hadoop/mapreduce1/hadoop-examples-2.6.0-mr1-cdh5.16.2.jar
./share/hadoop/mapreduce2/sources/hadoop-mapreduce-examples-2.6.0-cdh5.16.2-test-sources.jar
./share/hadoop/mapreduce2/sources/hadoop-mapreduce-examples-2.6.0-cdh5.16.2-sources.jar
./share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar
-
Run the wordcount program
Running it without arguments shows that it needs an input and an output:
[hadoop@JD hadoop]$ hadoop jar ./share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar wordcount
Usage: wordcount <in> [<in>...] <out>
Create an input file:
[hadoop@JD hadoop]$ vi 1.log
aaa bbb ccc aaa ccc ddd eee
Upload it to HDFS. First start HDFS:
[hadoop@JD hadoop]$ start-dfs.sh
19/12/01 17:13:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [JD]
JD: starting namenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-hadoop-namenode-JD.out
JD: starting datanode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-hadoop-datanode-JD.out
Starting secondary namenodes [JD]
JD: starting secondarynamenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-hadoop-secondarynamenode-JD.out
Create the target directory (with parents):
[hadoop@JD hadoop]$ hadoop fs -mkdir -p /wordcount/input
Upload the local file:
[hadoop@JD hadoop]$ hadoop fs -put 1.log /wordcount/input
[hadoop@JD hadoop]$ hadoop fs -ls /wordcount/input
Found 1 items
-rw-r--r--   1 hadoop supergroup         29 2019-12-01 17:15 /wordcount/input/1.log
Run the job (make sure the output directory does NOT already exist, or the job fails):
[hadoop@JD hadoop]$ hadoop jar ./share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar wordcount /wordcount/input /wordcount/output
List the output directory:
[hadoop@JD hadoop]$ hadoop fs -ls /wordcount/output
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2019-12-01 17:24 /wordcount/output/_SUCCESS      # marker file, empty
-rw-r--r--   1 hadoop supergroup         30 2019-12-01 17:24 /wordcount/output/part-r-00000  # result file
View the result:
[hadoop@JD hadoop]$ hadoop fs -cat /wordcount/output/part-r-00000
aaa 2
bbb 1
ccc 2
ddd 1
eee 1
HDFS: storage (the job's result is written back to HDFS)
MR jar: the computation logic
YARN: resource + job scheduling
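For intuition, the computation that the wordcount jar performs (map each word out, group identical keys, count per key) can be mimicked locally with a plain shell pipeline. This is only an illustration of the logic, not Hadoop:

```shell
# Local illustration of the wordcount computation (not Hadoop):
printf 'aaa bbb ccc aaa ccc ddd eee\n' |
  tr -s ' ' '\n' |  # "map": emit one word per line
  sort |            # "shuffle": bring identical words together
  uniq -c           # "reduce": count each word
```

The counts match the part-r-00000 output above: aaa 2, bbb 1, ccc 2, ddd 1, eee 1.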
-
The YARN web UI
State while a job is running
State after a job completes
Big data components
Storage: HDFS (distributed file system), Hive, HBase, Kudu, Cassandra
Compute: MR, Hive SQL, Spark, Flink
Resource + job scheduling: YARN (the only player in this category)
Plus Flume, Kafka, and others
Changing the hostname
-
CentOS 7.x
View the current hostname:
[root@JD ~]# hostnamectl
   Static hostname: JD
         Icon name: computer-vm
           Chassis: vm
        Machine ID: 983e7d6ed0624a2499003862230af382
           Boot ID: cb9adb2e30cb470b96b891712496807a
    Virtualization: kvm
  Operating System: CentOS Linux 7 (Core)
       CPE OS Name: cpe:/o:centos:centos:7
            Kernel: Linux 3.10.0-327.el7.x86_64
      Architecture: x86-64
Check the command's help:
[root@JD ~]# hostnamectl --help
hostnamectl [OPTIONS...] COMMAND ...

Query or change system hostname.

  -h --help              Show this help
     --version           Show package version
     --no-ask-password   Do not prompt for password
  -H --host=[USER@]HOST  Operate on remote host
  -M --machine=CONTAINER Operate on local container
     --transient         Only set transient hostname
     --static            Only set static hostname
     --pretty            Only set pretty hostname

Commands:
  status                 Show current hostname settings
  set-hostname NAME      Set system hostname
  set-icon-name NAME     Set icon name for host
  set-chassis NAME       Set chassis type for host
  set-deployment NAME    Set deployment environment for host
  set-location NAME      Set location for host
Set the hostname:
[root@JD ~]# hostnamectl set-hostname xxx
After the change, update the hosts file:
[root@JD ~]# vi /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.0.3 xxx
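The hosts-file edit can also be scripted. The sketch below builds a file from the stock CentOS localhost entries and appends the mapping only if it is not already present; the hostname xxx and IP 192.168.0.3 are the placeholders used above. It works on a temp file so it is safe to try; on a real box you would target /etc/hosts itself as root.

```shell
# Sketch: append a hostname mapping to a hosts file if missing.
# NEW_HOST / IP are the placeholders from the text above.
NEW_HOST=xxx
IP=192.168.0.3
HOSTS_FILE=$(mktemp)
# start from the stock CentOS localhost entries (as in the listing above)
cat > "$HOSTS_FILE" <<'EOF'
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
EOF
grep -qw "$NEW_HOST" "$HOSTS_FILE" || echo "$IP $NEW_HOST" >> "$HOSTS_FILE"
tail -1 "$HOSTS_FILE"
```

The `grep -qw` guard makes the append idempotent, so re-running the script does not duplicate the entry.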
Using jps in practice
Where it is located
[hadoop@JD hadoop]$ which jps
/usr/java/jdk1.8.0_45/bin/jps
[hadoop@JD hadoop]$ jps --help
illegal argument: --help
usage: jps [-help]
jps [-q] [-mlvV] [<hostid>]
Definitions:
<hostid>: <hostname>[:<port>]
The jps -l command (prints the fully qualified main class name)
[hadoop@JD hadoop]$ jps -l
27616 org.apache.hadoop.yarn.server.nodemanager.NodeManager
30048 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
29733 org.apache.hadoop.hdfs.server.namenode.NameNode
27512 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
2555 sun.tools.jps.Jps
29884 org.apache.hadoop.hdfs.server.datanode.DataNode
Where the corresponding process identification files live
Switch to root and go to /tmp:
[root@JD ~]# cd /tmp
The directories are named hsperfdata_<username>:
[root@JD tmp]# ll
total 20
drwxr-xr-x 2 hadoop hadoop 66 Dec 1 17:47 hsperfdata_hadoop
drwxr-xr-x 2 root root 17 Dec 1 17:47 hsperfdata_root
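To make the layout concrete, it can be reproduced in a scratch directory. On a real box each JVM creates these files automatically, one file per Java pid, so the pid 27616 below is just the NodeManager pid from the earlier jps output reused for the demo:

```shell
# Demo of the /tmp/hsperfdata_<user> layout in a scratch directory;
# the JVM normally creates one such file per running Java process.
scratch=$(mktemp -d)
mkdir "$scratch/hsperfdata_hadoop"
touch "$scratch/hsperfdata_hadoop/27616"   # file name == the Java pid
ls "$scratch"/hsperfdata_hadoop
```

This is exactly what jps reads to enumerate Java processes: the directory name gives the owner, the file name gives the pid.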
Purpose
Lets jps map a pid to its process name.
About "process information unavailable":
jps run as a process's owning user shows only that user's Java processes; run as root it lists every Java pid, but other users' entries show as "process information unavailable".
-
jps as root
[root@JD tmp]# jps
27616 -- process information unavailable
30048 -- process information unavailable
3842 jar
29733 -- process information unavailable
27512 -- process information unavailable
29884 -- process information unavailable
3117 Jps
-
jps as the hadoop user
[hadoop@JD hadoop]$ jps
27616 NodeManager
30048 SecondaryNameNode
29733 NameNode
27512 ResourceManager
3179 Jps
29884 DataNode
Telling real processes from stale ones
jps may show a process that ps -ef |grep xxx cannot find; when they disagree, trust ps.
How to tell:
[root@JD hsperfdata_hadoop]# ps -ef|grep 31488
root 5291 3912 0 21:32 pts/1 00:00:00 grep --color=auto 31488
[root@JD hsperfdata_hadoop]#
The process does not exist:
[root@JD hsperfdata_hadoop]# ps -ef|grep 31488 | grep -v grep | wc -l
0
The process exists:
[root@JD hsperfdata_hadoop]# ps -ef|grep 2594 | grep -v grep | wc -l
1
[root@JD hsperfdata_hadoop]#
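The check above can be wrapped into a small helper. As a sketch, `ps -p` is used instead of grepping `ps -ef`, since a bare grep can match unrelated lines (its own grep process, or the pid appearing as a substring of another number):

```shell
# Helper: report whether a pid is really alive. `ps -p` avoids the
# false matches that a bare `ps -ef | grep <pid>` can produce.
is_alive() {
  if ps -p "$1" > /dev/null 2>&1; then
    echo "pid $1 is running"
  else
    echo "pid $1 is not running"
  fi
}

is_alive $$   # the current shell is always running
```

The same function drops straight into monitoring scripts wherever the document recommends double-checking jps with ps.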
Conclusion
The hsperfdata files do not affect starting or stopping a process, but they do affect what jps reports; in production, prefer ps -ef |grep <process name>.
Do the process files under /tmp/hsperfdata_xxx affect process startup?
Deleting a process file under /tmp/hsperfdata_xxx does not affect starting or stopping that process, but it does affect jps: once the file is deleted or moved, jps no longer shows the process.
The Linux OOM-killer mechanism
When a process's memory use climbs too high, the kernel protects the machine from hanging by killing the process that uses the most memory. Check memory usage with top. If Linux killed the process, the application's own log has no record of it; you must look at the Linux system log instead.
So when a process dies: go to its log location -> if there is an error, analyze it; if there is none, suspect the OOM killer -> check the system log: cat /var/log/messages | grep oom
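What that grep finds looks roughly like the sample below. The two log lines are representative of a CentOS 7 kernel OOM kill (not from a real incident), written to a temp file here so the command can be tried without an actual crash:

```shell
# Sample (representative, not real) of what `grep oom /var/log/messages`
# turns up after the kernel kills a process.
log=$(mktemp)
cat > "$log" <<'EOF'
Dec  1 18:02:11 JD kernel: java invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Dec  1 18:02:11 JD kernel: Out of memory: Kill process 29733 (java) score 850 or sacrifice child
EOF
grep oom "$log"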
Linux機制 /tmp默認存儲週期 1個月 會自動清空不在規則以內的文件
啓動hdfs或yarn,如果不在配置文件配置的話,生成的pid文件會在/tmp目錄下,如果linux一個月自動清理pid文件,會造成程序有問題,所以我們需要在hdfs和yarn的配置文件中配置讓,pid文件不要生成在/tmp目錄下
-
hdfs的配置
配置hadoop-env.shexport HADOOP_PID_DIR=/home/hadoop/tmp
-
yarn配置
配置yarn-env.shexport YARN_PID_DIR=/home/hadoop/tmp
-
重新啓動,查看pid文件路徑
[hadoop@JD tmp]$ pwd /home/hadoop/tmp 生成在配置的路徑下 [hadoop@JD tmp]$ ll total 20 -rw-rw-r-- 1 hadoop hadoop 5 Dec 1 18:28 hadoop-hadoop-datanode.pid -rw-rw-r-- 1 hadoop hadoop 5 Dec 1 18:28 hadoop-hadoop-namenode.pid -rw-rw-r-- 1 hadoop hadoop 5 Dec 1 18:29 hadoop-hadoop-secondarynamenode.pid -rw-rw-r-- 1 hadoop hadoop 5 Dec 1 18:29 yarn-hadoop-nodemanager.pid -rw-rw-r-- 1 hadoop hadoop 5 Dec 1 18:29 yarn-hadoop-resourcemanager.pid