how-to-use-graphite-and-grafana-to-monitor-spark



Motivation: I recently noticed people in the Spark community using Graphite and Grafana to monitor Spark:
spark-developers-list: Monitoring Spark with Graphite and Grafana
hammerlab: Monitoring Spark with Graphite and Grafana
github: grafana-spark-dashboards

Since my work involves Spark monitoring, I tested this setup and recorded the results below.

First, a look at what monitoring with Graphite + Grafana looks like:

  • graphite-web:

I am not yet familiar with customizing metric graphs, so the output of my own test installation is still rough. Here are two screenshots from the article Graphite 簡介 (Introduction to Graphite) instead.
Figure: Graphite browser web interface
Figure: Graphite CLI web interface

  • Grafana monitoring Spark (it looks somewhat better than Ganglia):
    For the same reason, here are several figures from the hammerlab blog post Monitoring Spark with Graphite and Grafana.
    Figure: Task Completion Rate
    Figure: HDFS I/O
    Figure: JVM statistics exported by Spark

1 Introduction to Graphite

1) What is it? History

Graphite is an enterprise-scale monitoring tool that runs well on
cheap hardware. It was originally designed and written by Chris Davis
at Orbitz in 2006 as side project that ultimately grew to be a
foundational monitoring tool. In 2008, Orbitz allowed Graphite to be
released under the open source Apache 2.0 license. Since then Chris
has continued to work on Graphite and has deployed it at other
companies including Sears, where it serves as a pillar of the
e-commerce monitoring system. Today many large companies use it.

In short, Graphite is an enterprise-grade monitoring tool, written in Python, that runs on cheap hardware. It started as Chris Davis's side project at Orbitz in 2006 and was released under the open source Apache 2.0 license in 2008.

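Aside: Orbitz

Orbitz Worldwide (Nasdaq: OWW) is a leading global online travel company that uses innovative technology to help leisure and business travelers research, plan, and book a full range of travel products.
Orbitz was founded in 1999, when the US travel industry was growing rapidly. Its airline investors included American Airlines, Delta Air Lines, Northwest Airlines, and United Airlines. The Orbitz website launched in 2001. In November 2004 it was acquired by Cendant and became Cendant's domestic online travel services division, and it went public in 2007.
20150213: Expedia, the online travel site that is the major shareholder of eLong (NASDAQ: LONG), will acquire Orbitz, another US online travel site, for $12 per share (US$1.6 billion), in a challenge to Priceline, the largest US online travel service.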

2) Features: what can it do?

What Graphite is and is not

Graphite does two things:
- Store numeric time-series data
- Render graphs of this data on demand

What Graphite does not do is collect data for you, however there are
some tools out there that know how to send data to graphite. Even
though it often requires a little code, sending data to Graphite is
very simple.

In other words, Graphite is essentially a graphing tool: it stores time-series data and renders it as graphs on demand, but it does not collect the data itself. Other tools have to send data to it.
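To make "sending data to Graphite" concrete, here is a minimal Python sketch of Carbon's plaintext protocol, where each data point is one line of the form `<metric path> <value> <timestamp>`. The host `monitor1` and port `2003` match the test setup described later in this post. The metric name `test.demo.value` is made up for illustration, and the helper names are my own, not part of any Graphite library.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: path, value, epoch seconds, newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="monitor1", port=2003, timestamp=None):
    """Open a TCP connection to carbon and send a single data point."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


if __name__ == "__main__":
    # Only formats the line; call send_metric(...) once carbon is reachable.
    # "test.demo.value" is a hypothetical metric name.
    print(format_metric("test.demo.value", 42, timestamp=1426815734), end="")
```

Keeping the formatting separate from the socket code makes it easy to check the line that would be sent before pointing the script at a real carbon instance.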

3) Architecture
Figure: Graphite architecture diagram 1

Figure: Graphite architecture diagram 2

4) References

English
graphite wikidot/website
graphite high level diagram (the article includes an architecture diagram)
Graphite wiki how-to

graphite docs Overview
graphite docs FAQ (an architecture diagram appears at the end)
Questions forum on Launchpad

graphite website installation
graphite docs installation

10 Things I Learned Deploying Graphite (Kevin McCarthy): https://kevinmccarthy.org/blog/2013/07/18/10-things-i-learned-deploying-graphite/

Chinese
Graphite 簡介 (Introduction to Graphite; partly a translation of [graphite Overview])
Graphite 安裝和常見問題 (Graphite installation and common problems; partly a translation of [graphite docs installation])

Carbon FAQ – Graphite 中文版 (Chinese edition)
Graphite Url Api 教程 (tutorial)
Graphite dashboard 使用指南 (user guide)
Graphite CLI 教程 (tutorial)


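2 Why look at Graphite?

What are the options for monitoring Spark?

  • 1) ganglia

Components:
gmond //client side, written in C
gmetad //server side, written in C
rrd //storage
gweb //written in PHP, template framework
httpd/nginx + phpd //serves the web UI

memcached //optimizes gmetad write I/O

Pros: simple to install and deploy; supported by Spark's MetricsSystem; shows both real-time and historical values of common metrics
Cons: no alerting; graph control is not flexible enough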

  • 2) nagios
    Components and pros/cons: to be added

  • 3) ambari (ganglia + nagios + puppet + hadoop)

Components:
ambari-agent
ambari-server
ambari-web

Pros and cons: to be added

  • 4) graphite + grafana
    Architecture:

Graphite consists of 3 software components:
- carbon - a Twisted daemon that listens for time-series data
- whisper - a simple database library for storing time-series data (similar in design to RRD)
- graphite webapp - A Django webapp that renders graphs on-demand using Cairo

Aside: Twisted

Twisted is an event-driven networking engine written in Python and
licensed under the open source
The best introductory Twisted tutorial I have seen!

Grafana

The interface bundled with graphite-web is not very attractive, so Grafana is used here.

Grafana is a general purpose dashboard and graph composer. It's
focused on providing rich ways to visualize time series metrics,
mainly through graphs but supports other ways to visualize data through
a pluggable panel architecture. It currently has rich support for
Graphite, InfluxDB and OpenTSDB. But supports other data sources via
plugins.

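Software to install for testing Graphite + Grafana monitoring of Spark:

python+easy_install+pip
Django
carbon //server side, receives metrics, written in Python
whisper //server side, stores metrics, written in Python
graphite-web //web front end
httpd/nginx
grafana //web front end

memcached //I/O optimization

hammerlab/grafana-spark-dashboards

Pros: to be added
Combined with Grafana, the charts really are attractive (see the figures above)
Cons:
Many technologies are involved
In testing, deployment was complicated and usage was not simple enough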

Related technology:
Tools That Work With Graphite

How to develop graphite-web:
Working on Graphite-web

Companies officially listed as users:
Who is using Graphite?


3 How to monitor Spark with Graphite

Test topology

3 nodes, hostnames: spark1/spark2/spark3
Hadoop/Spark cluster deployed (details omitted)

1 node, hostname: monitor1
Deployed: python + easy_install + pip + graphite (carbon + whisper + graphite-web) + httpd + grafana + hammerlab/grafana-spark-dashboards

1 Installation

References
graphite website installation
graphite docs installation

Graphite 安裝和常見問題 (partly a translation of [graphite docs installation])

See how-to-install-graphite-on-centos6.6-x86_64
See how-to-install-grafana-on-centos6.6-x86_64

2 Configure Spark to send metrics to Graphite

vi conf/metrics.properties

# org.apache.spark.metrics.sink.GraphiteSink
#   Name:     Default:      Description:
#   host      NONE          Hostname of Graphite server
#   port      NONE          Port of Graphite server
#   period    10            Poll period
#   unit      seconds       Units of poll period
#   prefix    EMPTY STRING  Prefix to prepend to metric name
#   protocol  tcp           Protocol ("tcp" or "udp") to use

*.source.jvm.class=org.apache.spark.metrics.source.JvmSource
#master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
#worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
#driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
#executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource

#Enable GraphiteSink
*.sink.Graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.Graphite.host=monitor1
*.sink.Graphite.port=2003
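Before distributing the file, it can help to confirm that the sink entries are complete. The following Python sketch parses text in the `metrics.properties` format and collects the settings of one sink, filling in the defaults listed in the comment table above (period 10, unit seconds, empty prefix, protocol tcp). The function names are my own and the parser only handles plain `key=value` lines, so treat it as an illustration rather than a full Java properties parser.

```python
def parse_properties(text):
    """Parse simple 'key=value' lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def graphite_sink_settings(props, instance="*", sink="Graphite"):
    """Return (settings, missing) for one sink, applying documented defaults."""
    defaults = {"period": "10", "unit": "seconds", "prefix": "", "protocol": "tcp"}
    prefix = f"{instance}.sink.{sink}."
    found = {k[len(prefix):]: v for k, v in props.items() if k.startswith(prefix)}
    settings = {**defaults, **found}
    # class, host and port have no default, so report them if absent.
    missing = [name for name in ("class", "host", "port") if name not in settings]
    return settings, missing


# Same entries as in the configuration shown in this post.
SAMPLE = """
*.source.jvm.class=org.apache.spark.metrics.source.JvmSource
#Enable GraphiteSink
*.sink.Graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.Graphite.host=monitor1
*.sink.Graphite.port=2003
"""

if __name__ == "__main__":
    settings, missing = graphite_sink_settings(parse_properties(SAMPLE))
    print(settings)
    print("missing:", missing)
```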

In Spark standalone mode, metrics.properties must be distributed to the other nodes:

scp metrics.properties spark2:~/app/spark/conf/
scp metrics.properties spark3:~/app/spark/conf/

3 Viewing Spark metric graphs in graphite-web

Web access: http://monitor1:61080
Update 20150320:
A look at graphite-web's metric customization features:
Figure: graphite-web organizes metrics in a tree by default; select a metric to display its graph

Figure: graphite-web also supports custom dashboards: select the metrics to show, combine them, and the dashboard displays whatever metric graphs you defined
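The same graphs can also be fetched without the UI through graphite-web's render URL API (`/render` with `target`, `from`, and `format` parameters). Below is a minimal Python sketch that only builds such a URL. The base address is the one used in this post. The metric path in the example is an assumption about how a Spark JVM metric might be named, so browse the metric tree to find the real names in your installation.

```python
from urllib.parse import urlencode


def render_url(base, targets, since="-1h", fmt="json"):
    """Build a graphite-web /render URL for one or more target expressions."""
    params = [("target", t) for t in targets]
    params += [("from", since), ("format", fmt)]
    return f"{base.rstrip('/')}/render?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical metric path; replace it with one from your own metric tree.
    metric = "application_1426815734479_0001.driver.jvm.heap.used"
    print(render_url("http://monitor1:61080", [metric]))
```

Using `format=json` returns the raw data points, while `format=png` returns the rendered image, which is convenient for a quick check from a script.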


4 Viewing Spark metric graphs in Grafana

Browser access: http://monitor1:61081
Update 20150320:
The home page looks like this:
Figure: grafana-web home page

Figure: creating a custom graph on a custom dashboard; any metric can be selected for display

Figure: a dashboard I created, showing load, mem, and jvm_heap metrics

Grafana FAQ:
Problem 1: when adding a graph in the Grafana web UI, no metric can be selected, so the graph cannot be customized
The grafana-web log shows no related error

Cause:
See Troubleshooting
On the home page, the Grafana graph panel reports "connected"
Open Firefox -> Tools -> Web Developer -> Toggle Tools (Option+Command+I)
The console reports: Cross-Origin Request Blocked
Fix 1:
CORS on Apache

To add the CORS authorization to the header using Apache, simply add the following line inside either the <Directory>, <Location>, <Files> or <VirtualHost> sections of your server config (usually located in a *.conf file, such as httpd.conf or apache.conf), or within a .htaccess file:

Header set Access-Control-Allow-Origin "*"

vi /etc/httpd/conf.d/graphite-vhost.conf
In the relevant section (presumably <VirtualHost>, given the file name), add

Header set Access-Control-Allow-Origin "*"

apachectl -t
service httpd reload

Result: metrics can now be added in the Grafana web UI
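To confirm the header is actually being served, without opening the browser console, a small check like the following Python sketch may help. It requests a URL with an `Origin` header and inspects `Access-Control-Allow-Origin` in the response. The two URLs in the usage comment are the graphite-web and Grafana addresses from this post, and the function names are my own. This only covers the simple case of a `*` or exact-origin header, not preflight requests.

```python
from urllib.request import Request, urlopen


def cors_allows(headers, origin):
    """Return True if the response headers permit cross-origin reads from origin."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    allowed = lowered.get("access-control-allow-origin")
    return allowed == "*" or allowed == origin


def fetch_headers(url, origin):
    """GET the URL with an Origin header and return the response headers as a dict."""
    req = Request(url, headers={"Origin": origin})
    with urlopen(req, timeout=5) as resp:
        return dict(resp.headers.items())


# Usage against the setup in this post (requires the servers to be running):
#   headers = fetch_headers("http://monitor1:61080/", "http://monitor1:61081")
#   print(cors_allows(headers, "http://monitor1:61081"))
```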


5 Testing grafana-spark-dashboards

github grafana-spark-dashboards
Note
grafana-spark-dashboards currently provides dashboards only for Spark on YARN; monitoring standalone mode does not appear to be supported yet

1) install
git clone https://github.com/hammerlab/grafana-spark-dashboards.git

cd grafana-spark-dashboards
cp spark.js /data/grafana/app/dashboards/

cd /data/grafana/app/dashboards/
cp spark.js spark.js.org
vi spark.js
a. Find fetchYarnApps and set the URL; spark.js will try to hit

http://spark1:8088/ws/v1/cluster/apps, which should be your YARN RM's
JSON API (try this with a curl first to be sure).
jQuery.ajax('http://spark1:8088/ws/v1/cluster/apps', {

2) Test grafana-spark-dashboards monitoring a Spark app running on YARN
Test URL: http://monitor1:61081/#/dashboard/file/spark.js
Problem 1: http://monitor1:61081/#/dashboard/file/spark.js fails to load. The web UI reports "Error Could not load dashboards/spark.js. Please make sure it exists", and the log shows:

[Thu Mar 12 09:23:42 2015] [error] [client 192.168.99.1] File does not
exist: /var/www/html/grafana/app/dashboards/spark, referer:
http://monitor1:61081/

Fix attempt 1:

mkdir /var/www/html/grafana/app/dashboards/spark
chown -R apache /var/www/html/grafana/app/dashboards/spark

Result: the problem persists; the log shows:

[Thu Mar 12 09:27:29 2015] [error] [client 192.168.99.1] File does not
exist: /var/www/html/grafana/app/dashboards/spark/js, referer:
http://monitor1:61081/

Fix attempt 2:

mkdir /var/www/html/grafana/app/dashboards/spark/js
chown -R apache /var/www/html/grafana/app/dashboards/spark/js

Result: the web UI is usable and the Apache log shows no errors

Update 20150320
To monitor a Spark application with grafana-spark-dashboards, the URL http://monitor1:61081/#/dashboard/file/spark.js currently has to be given a parameter

test steps:
(1)step1: submit a spark-streaming example NetworkWordCount to yarn using spark-submit

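# yarn-client mode
# terminal 1: feed text to the streaming example
nc -lk 9999

# terminal 2
export SPARK_HOME=/data01/app/spark/spark-1.2.1-SNAPSHOT-bin-2.3.0-cdh5.1.3
# or, if the current directory is the Spark build to use (this overrides the line above):
export SPARK_HOME=$(pwd)
SPARK_APP_JAR=$SPARK_HOME/examples/target/spark-examples_2.10-1.3.0-SNAPSHOT.jar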

./bin/spark-submit \
--class org.apache.spark.examples.streaming.NetworkWordCount \
--master yarn-client \
--driver-memory 300M \
--executor-memory 300M \
--num-executors 2 \
--executor-cores 1 \
--files conf/metrics.properties \
$SPARK_APP_JAR \
spark1 9999

(2)step2: get YARNAppID from yarn WebUI
Example: YARNAppID=application_1426815734479_0001
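As an alternative to reading the ID off the web UI, it can be pulled from the ResourceManager's JSON API at /ws/v1/cluster/apps, the same endpoint spark.js queries. The Python sketch below filters an already-decoded response for RUNNING applications. The sample payload is hand-written in the shape of the ResourceManager response (an `apps` object holding an `app` list with `id`, `name`, and `state` fields); it is not real output, and the application names in it are made up.

```python
import json


def running_app_ids(payload, name=None):
    """Return IDs of RUNNING applications from a /ws/v1/cluster/apps response."""
    # "apps" can be null when the cluster has no applications.
    apps = ((payload.get("apps") or {}).get("app")) or []
    return [
        app["id"]
        for app in apps
        if app.get("state") == "RUNNING" and (name is None or app.get("name") == name)
    ]


# Hand-written sample for illustration; not captured from a real cluster.
SAMPLE = json.loads("""
{"apps": {"app": [
  {"id": "application_1426815734479_0001", "name": "NetworkWordCount", "state": "RUNNING"},
  {"id": "application_1426815734479_0000", "name": "old-job", "state": "FINISHED"}
]}}
""")

if __name__ == "__main__":
    print(running_app_ids(SAMPLE))
```

Against a live cluster, the payload would come from decoding the body of http://spark1:8088/ws/v1/cluster/apps with `json.loads`.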

(3)step3: access grafana-spark-dashboard to monitor specific YARNAppID
http://monitor1:62081/#/dashboard/script/spark.js?app=application_1426815734479_0001
Problem: the page reports an error; for details see:
spark-developers-list: Monitoring Spark with Graphite and Grafana
The problem has been resolved; see:
Need some FQA while using grafana-spark-dashboards

Here is what grafana-spark-dashboards looks like:
Figure: grafana-spark-dashboards in action


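Progress
20150313: while viewing Spark metrics in grafana-web through grafana-spark-dashboards I ran into some problems: I am not yet familiar with customizing graphs in the Grafana web UI, and there are issues using grafana-spark-dashboards

20150320: custom dashboards can now be built in grafana-web, but testing the GitHub community project grafana-spark-dashboards runs into a CORS (cross-origin resource sharing) problem

20150327 update: the problems encountered while using grafana-spark-dashboards were resolved on 20150323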
