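I. Use case
We use servers in several IDC data centers as the outer proxy layer, but users frequently report that the site is slow. By the time we contact them, the slowness has gone. It is therefore necessary to monitor network quality from each public-facing server to each region, to rule out the servers' own network as the cause. Smokeping can be used for this, or you can build your own tool based on the same principle.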
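The principle behind Smokeping is to send a burst of probes to each target at a fixed interval and to record the median round-trip time together with the number of lost probes. Below is a minimal sketch of that summarizing step only. It assumes the input is the list of per-probe times that `fping -C` prints after the colon, where `-` marks a lost probe; `summarize_rtts` is a hypothetical helper and not part of Smokeping.

```shell
# summarize_rtts "RTT RTT - RTT ...": hypothetical helper, not part of Smokeping.
# Prints how many probes were sent, how many were lost, and the median RTT.
summarize_rtts() {
    printf '%s\n' "$1" | awk '{
        n = 0; lost = 0
        for (i = 1; i <= NF; i++) {
            if ($i == "-") { lost++; continue }
            # insertion sort keeps rtt[1..n] in ascending order
            v = $i + 0
            j = n
            while (j >= 1 && rtt[j] > v) { rtt[j + 1] = rtt[j]; j-- }
            rtt[j + 1] = v
            n++
        }
        if (n == 0) { printf "sent=%d lost=%d median=NA\n", NF, lost; exit }
        if (n % 2) { median = rtt[(n + 1) / 2] }
        else       { median = (rtt[n / 2] + rtt[n / 2 + 1]) / 2 }
        printf "sent=%d lost=%d median=%.2f\n", NF, lost, median
    }'
}

# Example: summarize_rtts "10.0 - 30.0 20.0"
# prints:  sent=4 lost=1 median=20.00
```

A home-grown monitor would run something like this once per interval and per target, then store the result in a time series.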
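II. Installing and using Smokeping
1. Install the dependency packages
Smokeping draws its graphs with RRDTool and probes with tools such as fping, curl and dig. If it is deployed as standalone instances, every instance needs its own web server; in Master/Slave mode, only the Master needs one.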
yum -y install perl perl-Net-Telnet perl-Net-DNS perl-LDAP perl-libwww-perl \
    perl-RadiusPerl perl-IO-Socket-SSL perl-Socket6 perl-CGI-SpeedyCGI perl-FCGI \
    perl-Time-HiRes perl-ExtUtils-MakeMaker perl-RRD-Simple rrdtool rrdtool-perl \
    curl fping echoping gcc make wget libxml2-devel libpng-devel glib pango \
    pango-devel freetype freetype-devel fontconfig cairo cairo-devel libart_lgpl \
    libart_lgpl-devel mod_fastcgi tcping traceroute
yum -y install httpd httpd-devel
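After the packages are installed, it can be worth confirming that the probe tools are really on the PATH before going further. A small sketch, where `missing_tools` is a hypothetical helper name:

```shell
# missing_tools CMD...: hypothetical helper; prints every named command
# that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example: missing_tools fping curl dig rrdtool
```

Empty output means all of the listed commands were found.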
2. Install and configure Smokeping
wget http://oss.oetiker.ch/smokeping/pub/smokeping-2.6.11.tar.gz
tar zxvf smokeping-2.6.11.tar.gz
cd smokeping-2.6.11
./setup/build-perl-modules.sh /opt/app/smokeping/thirdparty
./configure --prefix=/opt/app/smokeping
/usr/bin/gmake install
useradd smokeping -s /sbin/nologin
mkdir -p /opt/data/smokeping/{data,cache} /opt/logs/smokeping /opt/run/smokeping
chown -R smokeping:smokeping /opt/app/smokeping/ /opt/data/smokeping/ /opt/run/smokeping/ /opt/logs/smokeping/
Configure httpd. Only a standalone smokeping instance or the master needs httpd.
/etc/httpd/conf.d/smokeping.conf
NameVirtualHost *:80
<VirtualHost *:80>
    <Location />
        Order Allow,Deny
        Deny from all
    </Location>
</VirtualHost>
<VirtualHost *:80>
    ServerName networklatency.xxxxx.com
    ErrorLog /var/log/httpd/smokeping_error.log
    CustomLog /var/log/httpd/smokeping_access.log common
    Alias /cache "/opt/data/smokeping/cache/"
    Alias /cropper "/opt/app/smokeping/htdocs/cropper/"
    Alias /smokeping "/opt/app/smokeping/htdocs/smokeping.fcgi"
    <Directory "/opt/app/smokeping">
        AllowOverride None
        Options All
        AddHandler cgi-script .fcgi .cgi
        AllowOverride AuthConfig
        Order allow,deny
        Allow from all
        #AuthName "smokeping"
        #AuthType Basic
        #AuthUserFile "/opt/app/smokeping/htdocs/htpasswd"
        #Require valid-user
        DirectoryIndex smokeping.fcgi
    </Directory>
</VirtualHost>
Edit /etc/httpd/conf/httpd.conf
User smokeping
Group smokeping
Because smokeping may be exposed on the public network, pay attention to security: the first VirtualHost in smokeping.conf refuses requests that address the server directly by IP, so the site can only be reached through its domain name.
Next, configure smokeping. Note that only a standalone instance or the master needs a config file. A slave needs none, because it periodically fetches its configuration from the master.
The etc/examples directory contains several sample configurations that you can adapt to your needs.
*** General ***
owner = admin
contact = [email protected]
mailhost = localhost
sendmail = /usr/sbin/sendmail
# NOTE: do not put the Image Cache below cgi-bin
# since all files under cgi-bin will be executed ... this is not
# good for images.
imgcache = /opt/data/smokeping/cache
imgurl = cache
datadir = /opt/data/smokeping/data
piddir = /opt/run/smokeping
cgiurl = http://xxxx.com/smokeping
smokemail = /opt/app/smokeping/etc/smokemail.dist
tmail = /opt/app/smokeping/etc/tmail.dist
# specify this to get syslog logging
syslogfacility = local0
# each probe is now run in its own process
# disable this to revert to the old behaviour
concurrentprobes = yes
Adjust imgcache, datadir, piddir and cgiurl to your environment. These directories must be owned by the user that smokeping and httpd run as, for example the smokeping user.
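Wrong ownership on these directories is an easy mistake to make, so a quick check before starting the daemon can save some debugging. A sketch, assuming the owner can be read from the third column of `ls -ld`; `owner_of` and `check_owner` are hypothetical helper names:

```shell
# owner_of PATH: prints the owning user of PATH (third field of `ls -ld`).
owner_of() {
    ls -ld "$1" | awk '{ print $3 }'
}

# check_owner USER DIR...: prints every DIR that is not owned by USER
# and returns 1 if at least one such directory was found.
check_owner() {
    want=$1
    shift
    rc=0
    for dir in "$@"; do
        if [ "$(owner_of "$dir")" != "$want" ]; then
            echo "$dir is not owned by $want"
            rc=1
        fi
    done
    return $rc
}

# Example:
# check_owner smokeping /opt/data/smokeping/cache /opt/data/smokeping/data /opt/run/smokeping
```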
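+ detail
width = 600
height = 200
unison_tolerance = 2
"Last 1 Hour" 1h
"Last 2 Hour" 2h
"Last 3 Hour" 3h
"Last 6 Hours" 6h
"Last 12 Hours" 12h
"Last 1 Day" 1d
"Last 7 Days" 7d
"Last 15 Days" 15d
"Last 30 Days" 30d
Here you can customize the time ranges that are displayed, for example 1h, 3h or 1d.
*** Slaves ***
secrets=/opt/app/smokeping/etc/smokeping_secrets.dist
+slave1
display_name=slave1
location=HK
color=382a34

*** Targets ***
slaves = slave1
The Slaves section defines the available slaves. A leading + adds a slave, display_name is the name shown in the UI, location is where the slave sits (for example Hong Kong or Guangdong), and color is the color used for this slave when several slaves are drawn in one graph. The color code must use lowercase letters; you can pick one at http://www.colorpicker.com/
The Targets section defines the actual hosts to probe. Different probes can be assigned to different targets.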
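Since an uppercase color code is easy to paste in by accident, a tiny check can catch it before the config is loaded. This is only a sketch; `valid_color` is a hypothetical helper, and it assumes the six-digit form used in the example above.

```shell
# valid_color CODE: hypothetical helper; succeeds only for six lowercase hex digits.
valid_color() {
    h='[0123456789abcdef]'
    case $1 in
        $h$h$h$h$h$h) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: valid_color 382a34 && echo "ok"
```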
+ Clients
menu = Network monitoring to customer network regions
#host = /Yunying/plat/plat223.255.151.87 /Yunying/plat/plat223.255.151.86 /Yunying/plat/plat103.250.15.6
title = Network monitoring list for customer network regions

++ dianxin
menu = China Telecom network monitoring
title = China Telecom network monitoring list

+++ dianxin-hlj
menu = Heilongjiang Telecom
title = Heilongjiang Telecom
host = 219.150.32.132

+++ dianxin-gd
menu = Guangdong Telecom
title = Guangdong Telecom
host = 113.111.211.22

+++ dianxin-gs
menu = Gansu Telecom
title = Gansu Telecom
alerts = someloss
#slaves = boomer slave2
host = 202.100.64.68

+++ dianxin-sh
menu = Shanghai Telecom
title = Shanghai Telecom
alerts = someloss
#slaves = boomer slave2
host = 202.96.209.5

#+++ dianxin-multi
#menu = Combined China Telecom monitoring list
#title = Combined China Telecom monitoring list
#alerts = someloss
#slaves = boomer slave2
#

++ liantong
menu = China Unicom network monitoring
title = China Unicom network monitoring list

+++ liantong-hlj
menu = Heilongjiang Unicom
title = Heilongjiang Unicom
host = 202.97.224.68

+++ liantong-gd
menu = Guangdong Unicom
#slaves = boomer slave2
host = 221.4.66.66

+++ liantong-gs
menu = Gansu Unicom
title = Gansu Unicom
alerts = someloss
#slaves = boomer slave2
host = 221.7.34.10

+++ liantong-sh
menu = Shanghai Unicom
title = Shanghai Unicom
alerts = someloss
#slaves = boomer slave2
host = 210.22.70.3

#+++ liantong-multi
#menu = Combined China Unicom monitoring list
#title = Combined China Unicom monitoring list
#alerts = someloss
#slaves = boomer slave2

++ yidong
menu = China Mobile network monitoring
title = China Mobile network monitoring list

+++ yidong-hlj
menu = Heilongjiang Mobile
title = Heilongjiang Mobile
host = 211.137.241.34

+++ yidong-gd
menu = Guangdong Mobile
#slaves = boomer slave2
host = 211.137.241.34

+++ yidong-gs
menu = Gansu Mobile
title = Gansu Mobile
alerts = someloss
#slaves = boomer slave2
host = 218.203.160.194

+++ yidong-sh
menu = Shanghai Mobile
title = Shanghai Mobile
alerts = someloss
#slaves = boomer slave2
host = 117.131.0.22

#+++ yidong-multi
#menu = Combined China Mobile monitoring list
#title = Combined China Mobile monitoring list
#alerts = someloss
#slaves = boomer slave2

++ jiaoyu
menu = Education network monitoring
title = Education network monitoring list

+++ jiaoyu-qh
menu = Tsinghua University
title = Tsinghua University
host = 166.111.8.28

+++ jiaoyu-sh
menu = Shanghai Jiao Tong University
title = Shanghai Jiao Tong University
alerts = someloss
#slaves = boomer slave2
host = 202.112.26.34

+++ jiaoyu-wh
menu = Wuhan University of Science and Technology
title = Wuhan University of Science and Technology
alerts = someloss
#slaves = boomer slave2
host = 202.114.240.6

+++ jiaoyu-hn
menu = South China Agricultural University
title = South China Agricultural University
alerts = someloss
#slaves = boomer slave2
host = 202.116.160.33

#+++ jiaoyu-multi
#menu = Combined education network monitoring list
#title = Combined education network monitoring list
#alerts = someloss
#slaves = boomer slave2
#host = /Clients/jiaoyu/jiaoyu-qh /Clients/jiaoyu/jiaoyu-sh /Clients/jiaoyu/jiaoyu-wh /Clients/jiaoyu/jiaoyu-hn
smokeping_secrets.dist is the secrets file the master uses when talking to its slaves. Its format is:
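The target stanzas above are very repetitive, so they can also be generated from a list instead of being typed by hand. A sketch of such a generator follows; `emit_target` is a hypothetical helper, and it only emits the menu, title and host keys used in this post.

```shell
# emit_target DEPTH NAME LABEL HOST: hypothetical generator for one target stanza.
# DEPTH is the number of leading "+" signs; LABEL is used for both menu and title.
emit_target() {
    depth=$1 name=$2 label=$3 host=$4
    plus=''
    i=0
    while [ "$i" -lt "$depth" ]; do
        plus="$plus+"
        i=$((i + 1))
    done
    printf '%s %s\nmenu = %s\ntitle = %s\nhost = %s\n' \
        "$plus" "$name" "$label" "$label" "$host"
}

# Example: emit_target 3 dianxin-gd "Guangdong Telecom" 113.111.211.22
```

The output can be appended to the Targets section of the config, one call per host.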
slave1:xxxxxx
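The slave name in this file must match the name the slave starts with, which defaults to its hostname.
On the slave side, a secret.txt file is used to talk to the master. Its format is: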
xxxxxx
It contains only the secret.
Both files must have mode 600 and be owned by the user that starts smokeping.
Add the master init script
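One way to avoid a window in which the secret is world-readable is to create the file under a restrictive umask. A minimal sketch, where `write_secret` is a hypothetical helper; changing the owner to the smokeping user still has to be done separately with chown.

```shell
# write_secret FILE CONTENT: hypothetical helper; writes CONTENT to FILE
# and makes sure the file ends up with mode 600.
write_secret() {
    ( umask 077 && printf '%s\n' "$2" > "$1" ) && chmod 600 "$1"
}

# Examples (paths taken from this post):
# write_secret /opt/app/smokeping/etc/smokeping_secrets.dist "slave1:xxxxxx"   # on the master
# write_secret /opt/app/smokeping/etc/secret.txt "xxxxxx"                      # on the slave
```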
#! /bin/sh
#
# smokeping-master    Start/Stop smokeping-master
#
# chkconfig: 345 99 99
# description: smokeping master
# processname: smokeping

if [ -f /etc/rc.d/init.d/functions ]; then
    . /etc/rc.d/init.d/functions
fi

name="smokeping-master"
smokeping_bin="/opt/app/smokeping/bin/smokeping"
cache_dir="/opt/data/smokeping/cache"
data_dir="/opt/data/smokeping/data"
smokeping_log="/opt/logs/smokeping/smokeping_master.log"
smokeping_secrets="/opt/app/smokeping/etc/smokeping_secrets.dist"
pid_dir="/opt/run/smokeping"
user="smokeping"

find_smokeping_process () {
    PID=`ps -ef | grep -v grep | grep $smokeping_bin | grep $smokeping_log | awk '{ print $2 }'`
}

start () {
    log_dir=`dirname ${smokeping_log}`
    if [ ! -d $log_dir ]; then
        echo -e "\e[35mLog dir ${log_dir} doesn't exist. Creating\e[0m"
        mkdir -p $log_dir
    fi
    if [ ! -d $cache_dir ]; then
        echo -e "\e[35mCache dir ${cache_dir} doesn't exist.Creating\e[0m"
        mkdir -p $cache_dir
    fi
    if [ ! -d $pid_dir ]; then
        echo -e "\e[35mPid dir ${pid_dir} doesn't exist.Creating\e[0m"
        mkdir -p $pid_dir
    fi
    if [ ! -d $data_dir ]; then
        echo -e "\e[35mData dir ${data_dir} doesn't exist.Creating\e[0m"
        mkdir -p $data_dir
    fi
    chown -R $user $log_dir $cache_dir $pid_dir $smokeping_secrets
    chmod 600 $smokeping_secrets

    find_smokeping_process
    if [ "$PID" != "" ]; then
        echo -e "\e[35m$name is already running!\e[0m"
    else
        daemon --user $user ${smokeping_bin} --logfile=${smokeping_log} > /dev/null 2>&1
        find_smokeping_process
        if [ "$PID" != "" ]; then
            echo -e "\e[35mStarting $name SUCCESS\e[0m"
        else
            echo -e "\e[35mStarting $name Failed!!!\e[0m"
        fi
    fi
}

stop () {
    find_smokeping_process
    if [ "$PID" != "" ]; then
        echo -e "\e[35mStopping $name\e[0m"
        kill $PID
    else
        echo -e "\e[35m$name is not running yet\e[0m"
    fi
}

case $1 in
start)
    start
    ;;
stop)
    stop
    exit 0
    ;;
reload)
    stop
    sleep 2
    start
    ;;
restart)
    stop
    sleep 2
    start
    ;;
status)
    find_smokeping_process
    if [ "$PID" != "" ]; then
        echo -e "\e[35m$name is running: $PID\e[0m"
        exit 0
    else
        echo -e "\e[35m$name is not running\e[0m"
        exit 1
    fi
    ;;
*)
    echo -e "\e[35mUsage: $0 {start|stop|restart|reload|status}\e[0m"
    RETVAL=1
esac
exit 0
Add the slave init script
#! /bin/sh
#
# smokeping-slave    Start/Stop smokeping-slave
#
# chkconfig: 345 99 99
# description: smokeping slave
# processname: smokeping

if [ -f /etc/rc.d/init.d/functions ]; then
    . /etc/rc.d/init.d/functions
fi

name="smokeping-slave"
smokeping_bin="/opt/app/smokeping/bin/smokeping"
cache_dir="/opt/data/smokeping/cache"
smokeping_log="/opt/logs/smokeping/smokeping_slave.log"
master_url="http://networklatency.caipiao88.com/smokeping"
shared_secret="/opt/app/smokeping/etc/secret.txt"
pid_dir="/opt/run/smokeping"
user="smokeping"

find_smokeping_process () {
    PID=`ps -ef | grep -v grep | grep $smokeping_bin | grep $smokeping_log | awk '{ print $2 }'`
}

start () {
    log_dir=`dirname ${smokeping_log}`
    if [ ! -d $log_dir ]; then
        echo -e "\e[35mLog dir ${log_dir} doesn't exist. Creating\e[0m"
        mkdir -p $log_dir
    fi
    if [ ! -d $cache_dir ]; then
        echo -e "\e[35mCache dir ${cache_dir} doesn't exist.Creating\e[0m"
        mkdir -p $cache_dir
    fi
    if [ ! -d $pid_dir ]; then
        echo -e "\e[35mPid dir ${pid_dir} doesn't exist.Creating\e[0m"
        mkdir -p $pid_dir
    fi
    chown -R $user $log_dir $cache_dir $pid_dir $shared_secret

    find_smokeping_process
    if [ "$PID" != "" ]; then
        echo -e "\e[35m$name is already running!\e[0m"
    else
        daemon --user $user ${smokeping_bin} --master-url=${master_url} --cache-dir=${cache_dir} --shared-secret=${shared_secret} --pid-dir=${pid_dir} --logfile=${smokeping_log} > /dev/null 2>&1
        find_smokeping_process
        if [ "$PID" != "" ]; then
            echo -e "\e[35mStarting $name SUCCESS\e[0m"
        else
            echo -e "\e[35mStarting $name Failed!!!\e[0m"
        fi
    fi
}

stop () {
    find_smokeping_process
    if [ "$PID" != "" ]; then
        echo -e "\e[35mStopping $name\e[0m"
        kill $PID
    else
        echo -e "\e[35m$name is not running yet\e[0m"
    fi
}

case $1 in
start)
    start
    ;;
stop)
    stop
    exit 0
    ;;
reload)
    stop
    sleep 2
    start
    ;;
restart)
    stop
    sleep 2
    start
    ;;
status)
    find_smokeping_process
    if [ "$PID" != "" ]; then
        echo -e "\e[35m$name is running: $PID\e[0m"
        exit 0
    else
        echo -e "\e[35m$name is not running\e[0m"
        exit 1
    fi
    ;;
*)
    echo -e "\e[35mUsage: $0 {start|stop|restart|reload|status}\e[0m"
    RETVAL=1
esac
exit 0
Start the Master
service smokeping-master start
Start the Slave
service smokeping-slave start
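III. Deploying Smokeping in production
In production you can use the Master-Slave scheme: the Master sits behind the firewall, and the Slaves are the servers that actually run the probes.
   [slave 1]     [slave 2]      [slave 3]
       |             |              |
       +-------+     |     +--------+
               |     |     |
               v     v     v
           +---------------+
           |    master     |
           +---------------+
After a Slave has collected its data, it uploads it through the Master's CGI interface, and the Master aggregates and displays it. The Master can therefore be installed by following the steps above. Because there may be many Slaves to deploy, it is best to roll them out in bulk with Ansible or SaltStack.
You can build a custom rpm package to simplify deployment.
For how to build the rpm package, see http://john88wang.blog.51cto.com/2165294/1787783
The Master-Slave scheme has one drawback. With only three or four slaves it is not a big problem, but when one Master collects data from many slaves, the smokeping pages become very slow to open. I wanted to monitor the network quality of every public-facing production server, and with one Master and all the others as slaves the smokeping pages would not open at all. So I gave up on Master-Slave and deployed a standalone smokeping instance on every public-facing server instead. Also, the outer proxies in production all run nginx, so there is no need to deploy Apache just for smokeping.
Nginx cannot run Perl CGI programs by itself; it needs the help of spawn-fcgi.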
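For a handful of hosts, a plain ssh loop can stand in for Ansible or SaltStack. The sketch below is a dry run: it only prints the commands it would execute so they can be reviewed first. `print_deploy_cmds`, the hosts file, the rpm file name and the use of root logins are all assumptions for illustration.

```shell
# print_deploy_cmds HOSTS_FILE RPM: hypothetical dry-run helper.
# HOSTS_FILE lists one host per line; blank lines and lines starting with # are skipped.
# Prints one scp and one ssh command per host instead of running them.
print_deploy_cmds() {
    hosts_file=$1 rpm=$2
    while read -r host; do
        case $host in
            ''|'#'*) continue ;;
        esac
        echo "scp $rpm root@$host:/tmp/"
        echo "ssh root@$host rpm -Uvh /tmp/$(basename "$rpm")"
    done < "$hosts_file"
}

# Example: print_deploy_cmds hosts.txt ./smokeping.rpm
```

Once the printed commands look right, the output can be piped to `sh` to actually run them.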
yum -y install spawn-fcgi
/etc/init.d/smokeping-fcgi
#!/bin/sh
#
# chkconfig: - 86 14
# description: smokeping-fcgi

exec=/usr/bin/spawn-fcgi
fcgi_port=9007
pid_file=/opt/run/smokeping/smokeping-fcgi.pid
fcgi_user=nobody
fcgi_app=/opt/app/smokeping/bin/smokeping_cgi

find_smokepingfcgi_pid() {
    pid=$(ps -ef | grep smokeping_cgi | grep -v grep | grep $fcgi_user | awk '{print $2}')
}

start() {
    echo -e "\e[035mStarting smokeping-fcgi\e[0m"
    $exec -a 127.0.0.1 -p $fcgi_port -P $pid_file -u $fcgi_user -f $fcgi_app > /dev/null 2>&1
    find_smokepingfcgi_pid
    if [ "$pid" != "" ]; then
        echo -e "\e[035mStarting OK\e[0m"
    else
        echo -e "\e[035mStarting Failed\e[0m"
    fi
}

stop() {
    echo -e "\e[035mShutting down smokeping-fcgi\e[0m"
    find_smokepingfcgi_pid
    if [ "$pid" != "" ]; then
        kill -9 $pid
        rm -f $pid_file
    else
        echo -e "\e[035msmokeping-fcgi is not running yet\e[0m"
    fi
}

status() {
    find_smokepingfcgi_pid
    if [ "$pid" != "" ]; then
        echo -e "\e[035m smokeping-fcgi is running: pid:$pid\e[0m"
    else
        echo -e "\e[035m smokeping-fcgi is not running\e[0m"
    fi
}

restart() {
    stop
    start
}

case "$1" in
start|stop|restart|status)
    $1
    ;;
*)
    # only the actions implemented above are advertised
    echo "Usage: $0 {start|stop|status|restart}"
    exit 2
    ;;
esac
/etc/init.d/smokeping-master
#! /bin/sh
#
# smokeping-master    Start/Stop smokeping-master
#
# chkconfig: 345 99 99
# description: smokeping master
# processname: smokeping

if [ -f /etc/rc.d/init.d/functions ]; then
    . /etc/rc.d/init.d/functions
fi

name="smokeping-master"
smokeping_app="/opt/app/smokeping"
smokeping_bin="/opt/app/smokeping/bin/smokeping"
cache_dir="/opt/data/smokeping/cache"
data_dir="/opt/data/smokeping/data"
smokeping_log="/opt/logs/smokeping/smokeping_master.log"
smokeping_secrets="/opt/app/smokeping/etc/smokeping_secrets.dist"
pid_dir="/opt/run/smokeping"
user="nobody"

find_smokeping_process () {
    PID=`ps -ef | grep -v grep | grep $smokeping_bin | grep $smokeping_log | awk '{ print $2 }'`
}

start () {
    log_dir=`dirname ${smokeping_log}`
    if [ ! -d $log_dir ]; then
        echo -e "\e[35mLog dir ${log_dir} doesn't exist. Creating\e[0m"
        mkdir -p $log_dir
    fi
    if [ ! -d $cache_dir ]; then
        echo -e "\e[35mCache dir ${cache_dir} doesn't exist.Creating\e[0m"
        mkdir -p $cache_dir
    fi
    if [ ! -d $pid_dir ]; then
        echo -e "\e[35mPid dir ${pid_dir} doesn't exist.Creating\e[0m"
        mkdir -p $pid_dir
    fi
    if [ ! -d $data_dir ]; then
        echo -e "\e[35mData dir ${data_dir} doesn't exist.Creating\e[0m"
        mkdir -p $data_dir
    fi
    ln -sf $cache_dir $smokeping_app
    ln -sf $data_dir $smokeping_app
    chown -R $user $log_dir $cache_dir $pid_dir $smokeping_secrets
    chmod 600 $smokeping_secrets

    find_smokeping_process
    if [ "$PID" != "" ]; then
        echo -e "\e[35m$name is already running!\e[0m"
    else
        daemon --user $user ${smokeping_bin} --logfile=${smokeping_log} > /dev/null
        find_smokeping_process
        if [ "$PID" != "" ]; then
            echo -e "\e[35mStarting $name SUCCESS\e[0m"
            service smokeping-fcgi start
        else
            echo -e "\e[35mStarting $name Failed!!!\e[0m"
        fi
    fi
}

stop () {
    find_smokeping_process
    if [ "$PID" != "" ]; then
        echo -e "\e[35mStopping $name\e[0m"
        kill $PID
    else
        echo -e "\e[35m$name is not running yet\e[0m"
    fi
}

case $1 in
start)
    start
    ;;
stop)
    stop
    exit 0
    ;;
reload)
    stop
    sleep 2
    start
    ;;
restart)
    stop
    sleep 2
    start
    ;;
status)
    find_smokeping_process
    if [ "$PID" != "" ]; then
        echo -e "\e[35m$name is running: $PID\e[0m"
        exit 0
    else
        echo -e "\e[35m$name is not running\e[0m"
        exit 1
    fi
    ;;
*)
    echo -e "\e[35mUsage: $0 {start|stop|restart|reload|status}\e[0m"
    RETVAL=1
esac
exit 0
Changes to the smokeping config file
imgcache = /opt/app/smokeping/cache
imgurl = cache
datadir = /opt/app/smokeping/data
piddir = /opt/run/smokeping
cgiurl = http://networklatency.xxxx.com/smokeping.fcgi
Nginx configuration: smokeping.conf
server {
    listen 80;
    server_name networklatency.xxxx.com;
    root /opt/app/smokeping/htdocs;

    location /cache {
        root /opt/app/smokeping;
    }

    location ~ .*\.fcgi$ {
        include fastcgi_params;
        fastcgi_pass 127.0.0.1:9007;
        fastcgi_index smokeping.fcgi;
        #fastcgi_param SCRIPT_FILENAME /opt/app/smokeping/htdocs/$fastcgi_script_name;
    }
}
Note that the user in the smokeping-master and smokeping-fcgi scripts must be the same user that nginx runs as.
What remains is to deploy the smokeping instances in bulk.
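To compare the users, the nginx side can be read straight from its configuration file. A small sketch, assuming the `user` directive sits on its own line in the main nginx.conf; `nginx_user` is a hypothetical helper name.

```shell
# nginx_user CONF: hypothetical helper; prints the user named by the first
# `user` directive in the given nginx configuration file.
nginx_user() {
    awk '$1 == "user" { gsub(/;/, "", $2); print $2; exit }' "$1"
}

# Example (the path is the usual default and may differ on your system):
# [ "$(nginx_user /etc/nginx/nginx.conf)" = "nobody" ] || echo "user mismatch"
```

The value it prints should match `user` in smokeping-master and `fcgi_user` in smokeping-fcgi.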
IV. Alternatives to Smokeping
References:
http://blog.coocla.org/smokeping-slave.html
http://blog.coocla.org/smokeping-with-nginx.html
http://oss.oetiker.ch/smokeping/pub/smokeping-2.6.11.tar.gz