Hadoop History Server

Through the history server you can view the records of MapReduce jobs that have finished running, such as how many maps and reduces a job used, the job submission time, the job launch time, the job completion time, and so on.

By default the Hadoop history server is not started; it can be started with the following command:

$ sbin/mr-jobhistory-daemon.sh start historyserver

The history server's web UI can then be opened on port 19888 of the corresponding machine.
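
To check whether the daemon is actually running, or to stop it again later, the same script and the standard jps tool can be used, for example:

$ jps | grep JobHistoryServer
$ sbin/mr-jobhistory-daemon.sh stop historyserver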

The history server can also be started on its own on a separate machine; this is mainly controlled by the following configuration parameters:
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>0.0.0.0:10020</value>
</property>
Parameter explanation: the address of the MapReduce JobHistory Server.

<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>0.0.0.0:19888</value>
</property>
Parameter explanation: the address of the MapReduce JobHistory Server web UI.

These parameters are configured in mapred-site.xml. After setting them, restart the Hadoop jobhistory daemon; Hadoop job history can then be viewed on the machine specified by mapreduce.jobhistory.webapp.address.
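
In addition to the web UI, the JobHistory server exposes a REST API on the same webapp address, which can be handy for scripting. A minimal example (historyserver-host is a placeholder for the host configured in mapreduce.jobhistory.webapp.address):

$ curl 'http://historyserver-host:19888/ws/v1/history/mapreduce/jobs?limit=5'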

The history data is stored in HDFS. The following settings control which HDFS directories the job history records are written to:

<property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>${yarn.app.mapreduce.am.staging-dir}/history/done</value>
</property>

<property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate</value>
</property>

<property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/tmp/hadoop-yarn/staging</value>
</property>

The values above are all defaults; they are modified in mapred-site.xml.

mapreduce.jobhistory.done-dir: the directory where records of completed Hadoop jobs are stored;

mapreduce.jobhistory.intermediate-done-dir: the directory where records of currently running Hadoop jobs are stored.
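
With the default values above, the intermediate directory can be listed the same way as the done directory; it usually contains one sub-directory per user that has submitted jobs:

$ hadoop fs -ls /tmp/hadoop-yarn/staging/history/done_intermediate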


Viewing completed Hadoop jobs:

[sparkadmin@spark5 ~]$ hadoop fs -ls /tmp/hadoop-yarn/staging/history/done/
Found 3 items
drwxrwxrwx   - sparkadmin supergroup          0 2015-12-01 00:02 /tmp/hadoop-yarn/staging/history/done/2015
drwxrwx---   - sparkadmin supergroup          0 2016-12-01 00:07 /tmp/hadoop-yarn/staging/history/done/2016
drwxrwx---   - sparkadmin supergroup          0 2017-01-01 00:07 /tmp/hadoop-yarn/staging/history/done/2017


[sparkadmin@spark5 ~]$ hadoop fs -ls /tmp/hadoop-yarn/staging/history/done/2016
Found 12 items
drwxrwx---   - sparkadmin supergroup          0 2016-01-31 00:02 /tmp/hadoop-yarn/staging/history/done/2016/01
drwxrwx---   - sparkadmin supergroup          0 2016-02-29 00:02 /tmp/hadoop-yarn/staging/history/done/2016/02
drwxrwx---   - sparkadmin supergroup          0 2016-03-31 00:03 /tmp/hadoop-yarn/staging/history/done/2016/03
drwxrwx---   - sparkadmin supergroup          0 2016-04-30 00:02 /tmp/hadoop-yarn/staging/history/done/2016/04
drwxrwx---   - sparkadmin supergroup          0 2016-05-31 00:02 /tmp/hadoop-yarn/staging/history/done/2016/05
drwxrwx---   - sparkadmin supergroup          0 2016-06-30 00:02 /tmp/hadoop-yarn/staging/history/done/2016/06
drwxrwx---   - sparkadmin supergroup          0 2016-07-31 00:00 /tmp/hadoop-yarn/staging/history/done/2016/07
drwxrwx---   - sparkadmin supergroup          0 2016-08-31 00:00 /tmp/hadoop-yarn/staging/history/done/2016/08
drwxrwx---   - sparkadmin supergroup          0 2016-09-30 00:00 /tmp/hadoop-yarn/staging/history/done/2016/09
drwxrwx---   - sparkadmin supergroup          0 2016-10-31 00:06 /tmp/hadoop-yarn/staging/history/done/2016/10
drwxrwx---   - sparkadmin supergroup          0 2016-11-30 00:07 /tmp/hadoop-yarn/staging/history/done/2016/11
drwxrwx---   - sparkadmin supergroup          0 2016-12-31 00:07 /tmp/hadoop-yarn/staging/history/done/2016/12


[sparkadmin@spark5 ~]$ hadoop fs -ls /tmp/hadoop-yarn/staging/history/done/2016/12
17/01/09 10:05:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 31 items
drwxrwx---   - sparkadmin supergroup          0 2016-12-09 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/01
drwxrwx---   - sparkadmin supergroup          0 2016-12-10 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/02
drwxrwx---   - sparkadmin supergroup          0 2016-12-10 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/03
drwxrwx---   - sparkadmin supergroup          0 2016-12-11 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/04
drwxrwx---   - sparkadmin supergroup          0 2016-12-13 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/05
drwxrwx---   - sparkadmin supergroup          0 2016-12-13 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/06
drwxrwx---   - sparkadmin supergroup          0 2016-12-15 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/07
drwxrwx---   - sparkadmin supergroup          0 2016-12-16 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/08
drwxrwx---   - sparkadmin supergroup          0 2016-12-17 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/09
drwxrwx---   - sparkadmin supergroup          0 2016-12-18 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/10
drwxrwx---   - sparkadmin supergroup          0 2016-12-19 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/11
drwxrwx---   - sparkadmin supergroup          0 2016-12-20 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/12
drwxrwx---   - sparkadmin supergroup          0 2016-12-21 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/13
drwxrwx---   - sparkadmin supergroup          0 2016-12-22 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/14
drwxrwx---   - sparkadmin supergroup          0 2016-12-23 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/15
drwxrwx---   - sparkadmin supergroup          0 2016-12-24 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/16
drwxrwx---   - sparkadmin supergroup          0 2016-12-25 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/17
drwxrwx---   - sparkadmin supergroup          0 2016-12-26 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/18
drwxrwx---   - sparkadmin supergroup          0 2016-12-27 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/19
drwxrwx---   - sparkadmin supergroup          0 2016-12-28 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/20
drwxrwx---   - sparkadmin supergroup          0 2016-12-29 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/21
drwxrwx---   - sparkadmin supergroup          0 2016-12-30 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/22
drwxrwx---   - sparkadmin supergroup          0 2016-12-31 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/23
drwxrwx---   - sparkadmin supergroup          0 2017-01-01 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/24
drwxrwx---   - sparkadmin supergroup          0 2017-01-02 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/25
drwxrwx---   - sparkadmin supergroup          0 2017-01-03 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/26
drwxrwx---   - sparkadmin supergroup          0 2017-01-04 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/27
drwxrwx---   - sparkadmin supergroup          0 2017-01-05 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/28
drwxrwx---   - sparkadmin supergroup          0 2017-01-06 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/29
drwxrwx---   - sparkadmin supergroup          0 2017-01-07 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/30
drwxrwx---   - sparkadmin supergroup          0 2017-01-08 10:17 /tmp/hadoop-yarn/staging/history/done/2016/12/31


[sparkadmin@spark5 ~]$ hadoop fs -ls /tmp/hadoop-yarn/staging/history/done/2017/01/09
Found 2 items
drwxrwx---   - sparkadmin supergroup          0 2017-01-09 01:04 /tmp/hadoop-yarn/staging/history/done/2017/01/09/000041
drwxrwx---   - sparkadmin supergroup          0 2017-01-09 08:10 /tmp/hadoop-yarn/staging/history/done/2017/01/09/000042


[sparkadmin@spark5 ~]$ hadoop fs -ls /tmp/hadoop-yarn/staging/history/done/2017/01/09/000041
-rwxrwx---   3 sparkadmin supergroup      21171 2017-01-09 00:06 /tmp/hadoop-yarn/staging/history/done/2017/01/09/000041/job_1480040162446_41871-1483891560801-sparkadmin-QueryResult.jar-1483891570304-1-0-SUCCEEDED-default-1483891564771.jhist
-rwxrwx---   3 sparkadmin supergroup     119729 2017-01-09 00:06 /tmp/hadoop-yarn/staging/history/done/2017/01/09/000041/job_1480040162446_41871_conf.xml
-rwxrwx---   3 sparkadmin supergroup      21074 2017-01-09 00:06 /tmp/hadoop-yarn/staging/history/done/2017/01/09/000041/job_1480040162446_41872-1483891560520-sparkadmin-QueryResult.jar-1483891569932-1-0-SUCCEEDED-default-1483891564971.jhist
-rwxrwx---   3 sparkadmin supergroup     119533 2017-01-09 00:06 /tmp/hadoop-yarn/staging/history/done/2017/01/09/000041/job_1480040162446_41872_conf.xml

 
Because there are a large number of history job records, they are stored in year/month/day directories, which makes them easier to manage and look up.
Each Hadoop job's history is stored in two files, with the suffixes *.jhist and *.xml.

The *.jhist file stores the detailed information about the specific Hadoop job; the *.xml file records the complete parameter configuration the job ran with.
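
Since both files live in HDFS, a quick way to peek at their contents is with ordinary hadoop fs commands, for example against one of the .jhist files listed above:

$ hadoop fs -cat /tmp/hadoop-yarn/staging/history/done/2017/01/09/000041/job_1480040162446_41871-*.jhist | head -n 5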


The *.jhist file records events such as the Hadoop job's initialization; everything in the file is JSON-format data, and the type field indicates what each JSON record represents. For example:

{
   "type": "JOB_INITED",
   "event": {
      "org.apache.hadoop.mapreduce.jobhistory.JobInited": {
         "jobid": "job_1388830974669_1215999",
         "launchTime": 1392477383583,
         "totalMaps": 1,
         "totalReduces": 1,
         "jobStatus": "INITED",
         "uberized": false
      }
   }
}
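
If you want a human-readable summary of a single job instead of the raw JSON, the mapred CLI can render a .jhist file directly (the path below is a placeholder for one of the files listed earlier; depending on the Hadoop version the command may also accept a job ID):

$ mapred job -history /tmp/hadoop-yarn/staging/history/done/2017/01/09/000041/<job>.jhist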


If the data provided in the Hadoop history server web UI is not sufficient, we can analyze the directory configured by mapreduce.jobhistory.done-dir ourselves and extract the information we are interested in.

For example, how many maps ran on a given day, how long the longest job took, how many MapReduce jobs each user ran, and how many MapReduce jobs ran in total. This kind of information is very useful for monitoring a Hadoop cluster, and can be used, for instance, to decide how much of the cluster's resources to allocate to a particular user.
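
As a minimal sketch of this kind of analysis, the .jhist file names themselves already encode the job id, timestamps, user, job name, map/reduce counts, final status and queue, separated by '-'. Assuming the job names contain no '-' characters, the following counts jobs per user and status for one day:

$ hadoop fs -ls -R /tmp/hadoop-yarn/staging/history/done/2017/01/09 | grep '\.jhist$' | \
    awk -F/ '{print $NF}' | awk -F- '{print $3, $8}' | sort | uniq -c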

The Hadoop history server web UI displays at most 20000 historical job records. This limit can be changed with the following parameter; after modifying it, restart the Hadoop jobhistory daemon.
<property>
    <name>mapreduce.jobhistory.joblist.cache.size</name>
    <value>20000</value>
</property>

