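Setting up a monitoring system on Linux (Telegraf + InfluxDB + Grafana)
I. Preparing the installation files (they can be downloaded from the official sites in advance)
telegraf-1.12.4-1.x86_64.rpm
influxdb-1.7.8.x86_64.rpm (the single-node edition is free; clustering is a paid feature)
grafana-6.4.3-1.x86_64.rpm
kapacitor-1.5.3.x86_64.rpm (the alerting service of the TIGK stack)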
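II. Installation
1. Create a directory for the software
mkdir -p /home/ldw/monitor
Upload the downloaded installation files to the monitor directory on the server.
Change to the directory that contains monitor and grant permissions:
chmod -R 777 monitor
2. Install the packages
Installation commands (for distributed monitoring, telegraf must also be installed on each of the other clients):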
rpm -ivh telegraf-1.12.4-1.x86_64.rpm
rpm -ivh influxdb-1.7.8.x86_64.rpm
rpm -ivh grafana-6.4.3-1.x86_64.rpm
rpm -ivh kapacitor-1.5.3.x86_64.rpm
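Before running the rpm commands it can help to confirm that all four files were actually uploaded. The following is a minimal sketch, not part of the original procedure; the function name missing_packages and the PACKAGES variable are illustrative assumptions.

```shell
# Package files listed in section I, one per line.
PACKAGES="telegraf-1.12.4-1.x86_64.rpm
influxdb-1.7.8.x86_64.rpm
grafana-6.4.3-1.x86_64.rpm
kapacitor-1.5.3.x86_64.rpm"

# Print the name of every package file that is missing from directory $1.
# Returns 0 when all files are present, 1 otherwise.
missing_packages() {
  dir=$1
  status=0
  for pkg in $PACKAGES; do
    if [ ! -f "$dir/$pkg" ]; then
      printf '%s\n' "$pkg"
      status=1
    fi
  done
  return "$status"
}

# Example (run as root):
#   missing_packages /home/ldw/monitor && (cd /home/ldw/monitor && rpm -ivh $PACKAGES)
```

With this check, rpm is only invoked once every file is in place, so a partial upload does not leave the stack half installed.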
Installation output (run from the directory that contains the packages):
[root@node2 monitor]# rpm -ivh telegraf-1.12.4-1.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:telegraf-1.12.4-1 ################################# [100%]
Created symlink from /etc/systemd/system/multi-user.target.wants/telegraf.service to /usr/lib/systemd/system/telegraf.service.
[root@node2 monitor]# rpm -ivh influxdb-1.7.8.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:influxdb-1.7.8-1 ################################# [100%]
Created symlink from /etc/systemd/system/influxd.service to /usr/lib/systemd/system/influxdb.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/influxdb.service to /usr/lib/systemd/system/influxdb.service.
[root@node2 monitor]# rpm -ivh grafana-6.4.3-1.x86_64.rpm
warning: grafana-6.4.3-1.x86_64.rpm: Header V4 RSA/SHA1 Signature, key ID 24098cb6: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:grafana-6.4.3-1 ################################# [100%]
### NOT starting on installation, please execute the following statements to configure grafana to start automatically using systemd
sudo /bin/systemctl daemon-reload
sudo /bin/systemctl enable grafana-server.service
### You can start grafana-server by executing
sudo /bin/systemctl start grafana-server.service
POSTTRANS: Running script
[root@node2 monitor]# rpm -ivh kapacitor-1.5.3.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:kapacitor-1.5.3-1 ################################# [100%]
Configuration file locations after installation:
/etc/telegraf/telegraf.conf
/etc/influxdb/influxdb.conf
/etc/grafana/grafana.ini
/etc/kapacitor/kapacitor.conf
Log file locations after installation:
/var/log/telegraf/telegraf.log
/var/log/influxdb/influxdb.log
/var/log/grafana/grafana.log
Grafana plugin directory:
/var/lib/grafana/plugins
InfluxDB data storage locations:
/var/lib/influxdb/meta # metadata / raft database
/var/lib/influxdb/data # TSM files written by the TSM storage engine
/var/lib/influxdb/wal # WAL files written by the TSM storage engine
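III. Configuration
1. Telegraf configuration (/etc/telegraf/telegraf.conf)
[agent]
# Data collection interval
interval = "5s"
[[outputs.influxdb]]
# URL of the InfluxDB instance; change the IP to that of the server where InfluxDB is installed
urls = ["http://10.67.31.74:8086"]
# Name of the target database. The default, telegraf, is fine; a database with this name is created later, once InfluxDB is running.
database = "telegraf"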
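When the same two settings have to be written on several clients, generating the fragment from parameters avoids copy-and-paste mistakes. This is a small sketch under stated assumptions: the function name telegraf_output_conf is hypothetical, and the output is meant to be reviewed and merged into telegraf.conf by hand.

```shell
# Print a minimal [agent] + [[outputs.influxdb]] fragment.
# $1 = InfluxDB host or IP
# $2 = collection interval (defaults to 5s)
# $3 = database name (defaults to telegraf)
telegraf_output_conf() {
  influx_host=$1
  interval=${2:-5s}
  database=${3:-telegraf}
  cat <<EOF
[agent]
  interval = "$interval"

[[outputs.influxdb]]
  urls = ["http://$influx_host:8086"]
  database = "$database"
EOF
}

# Example: telegraf_output_conf 10.67.31.74
```

A changed configuration can be tried without touching the running service with telegraf --config /etc/telegraf/telegraf.conf --test, which collects once and prints the metrics to stdout.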
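2. InfluxDB configuration (/etc/influxdb/influxdb.conf, [http] section)
[http]
# Determines whether HTTP endpoint is enabled. This endpoint receives and stores the data sent by Telegraf and provides the API that Grafana queries.
enabled = true
# The bind address used by the HTTP service, i.e. the port on which the HTTP API listens.
bind-address = ":8086"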
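Once InfluxDB is running, the HTTP endpoint configured above can be checked from any Telegraf client. InfluxDB 1.x answers GET /ping with status 204 (No Content). The sketch below assumes curl is installed; the function names are illustrative.

```shell
# Build the /ping URL for an InfluxDB 1.x HTTP endpoint.
# $1 = host or IP, $2 = port (defaults to 8086)
influx_ping_url() {
  printf 'http://%s:%s/ping\n' "$1" "${2:-8086}"
}

# Succeed only when InfluxDB answers /ping with HTTP 204.
influx_is_up() {
  code=$(curl -s --max-time 5 -o /dev/null -w '%{http_code}' \
    "$(influx_ping_url "$@")") || return 1
  [ "$code" = "204" ]
}

# Example:
#   influx_is_up 10.67.31.74 && echo "InfluxDB HTTP API reachable"
```

If the check fails from a client but succeeds on the server itself, a firewall rule for port 8086 is a likely cause.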
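3. Grafana configuration (/etc/grafana/grafana.ini, [server] section)
# The public facing domain name used to access grafana from a browser
;domain = 10.67.31.74
# The full public facing url you use in browser, used for redirects and emails
;root_url = http://10.67.31.74:3000
A leading ";" marks a line as commented out; remove it for the setting to take effect.
The default login user name and password are both admin and do not need to be changed here.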
IV. Starting the services
Start commands:
systemctl start telegraf
systemctl start influxdb
systemctl start grafana-server
Check the status:
systemctl status telegraf
systemctl status influxdb
systemctl status grafana-server
Stop commands:
systemctl stop telegraf
systemctl stop influxdb
systemctl stop grafana-server
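The three commands in each group can be wrapped in one helper. The sketch below is an illustration, not the original procedure: the function name monitor_ctl is hypothetical, and the ordering (database first on start, last on stop) is an assumption chosen so that Telegraf has somewhere to write as soon as it comes up. The SYSTEMCTL variable exists only so the command can be swapped out for a dry run.

```shell
SYSTEMCTL=${SYSTEMCTL:-systemctl}

# Start order (assumption): database, then collector, then UI.
SERVICES="influxdb telegraf grafana-server"

# Apply a systemctl action (start, stop, status, enable, ...) to all services.
# "stop" walks the list in reverse so the database goes down last.
monitor_ctl() {
  action=$1
  list=$SERVICES
  if [ "$action" = "stop" ]; then
    list=""
    for svc in $SERVICES; do
      list="$svc $list"
    done
  fi
  for svc in $list; do
    $SYSTEMCTL "$action" "$svc" || return 1
  done
}

# Examples:
#   monitor_ctl start
#   monitor_ctl enable   # the Grafana RPM does not enable its service at install time
```

The enable action is worth running once, since the installation output above shows that grafana-server is not configured to start automatically.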
V. InfluxDB database setup
After InfluxDB has started, create the user and the database:
[root@node2 ~]# influx
Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
Connected to http://localhost:8086 version 1.0.2
InfluxDB shell version: 1.0.2
> create user "telegraf" with password 'telegraf'
> show users;
user admin
telegraf false
> create database telegraf
> show databases
name: databases
---------------
name
_internal
telegraf
# Select the database
> use telegraf
# List all measurements (the equivalent of tables) in this database
> show measurements
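The same setup can be scripted with the -execute option of the influx CLI instead of typing into the interactive shell. This is a sketch under assumptions: the function name influx_setup is hypothetical, and the GRANT statement is an addition that only matters if authentication is later enabled in influxdb.conf (it is disabled by default in InfluxDB 1.x).

```shell
INFLUX=${INFLUX:-influx}

# Create database $1 and user $2 with password $3, then give the user
# full access to that database. Stops at the first failing statement.
influx_setup() {
  db=$1
  user=$2
  pass=$3
  $INFLUX -execute "CREATE DATABASE \"$db\"" &&
  $INFLUX -execute "CREATE USER \"$user\" WITH PASSWORD '$pass'" &&
  $INFLUX -execute "GRANT ALL ON \"$db\" TO \"$user\""
}

# Example, matching the interactive session above:
#   influx_setup telegraf telegraf telegraf
```

The password is passed on the command line here, so it ends up in the shell history; for anything beyond a test setup, a different password and a safer way of supplying it are advisable.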
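VI. Using Grafana
Log in to Grafana:
http://10.67.31.74:3000
User name / password: admin/admin
After logging in, configure the data source.
Next, import a dashboard. A suitable dashboard file was downloaded in advance and is imported directly; server-single_rev3.json is used here.
Give the dashboard a name of your choice, select the InfluxDB data source, and click Import.
Once the import succeeds, the dashboard can be configured.
The server-single_rev3.json dashboard has specific configuration requirements, so Telegraf has to be reconfigured. The settings are listed below; edit the telegraf.conf file on the Linux server accordingly.
Reconfigured telegraf.conf: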
--------------------------------------------------------------------------------------------------------
[global_tags]
host = "$HOSTNAME"
## Note: every client must set its own hostname
[agent]
interval = "5m"
[[outputs.influxdb]]
urls = ["http://mydomain.invalid:8086"]
database = "servermonitor"
[[inputs.cpu]]
percpu = false
totalcpu = true
collect_cpu_time = true
fielddrop = ["time_guest","time_guest_nice","time_irq","time_nice","time_softirq","time_steal","usage_guest","usage_guest_nice","usage_irq","usage_nice","usage_softirq","usage_steal"]
interval = "2s"
[[inputs.disk]]
mount_points = ["/","/var","/data"]
fielddrop=["used","inodes_used"]
[[inputs.mem]]
fielddrop=["active","buffered","cached","free","inactive","used","used_percent"]
[[inputs.processes]]
[[inputs.swap]]
fielddrop=["free","total"]
[[inputs.system]]
fielddrop=["n_users","uptime_format"]
[[inputs.nstat]]
interval = "2s"
#proc_net_netstat = "" # this is of interest.
## Note: if you are unsure about this setting, leave it commented out. Setting it to an empty string prevents telegraf from starting.
fieldpass = ["IpExtOutOctets","IpExtInOctets"]
After editing telegraf.conf, restart telegraf.
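Because a bad value in telegraf.conf can keep the service from starting, the file can be checked first with telegraf's --test mode and the restart skipped if that check fails. A minimal sketch, with the function name restart_telegraf_if_valid and the TELEGRAF and SYSTEMCTL override variables as illustrative assumptions:

```shell
TELEGRAF=${TELEGRAF:-telegraf}
SYSTEMCTL=${SYSTEMCTL:-systemctl}

# Run one test collection with the given config file; restart the service
# only if telegraf accepts the file.
# $1 = config path (defaults to /etc/telegraf/telegraf.conf)
restart_telegraf_if_valid() {
  conf=${1:-/etc/telegraf/telegraf.conf}
  if $TELEGRAF --config "$conf" --test >/dev/null 2>&1; then
    $SYSTEMCTL restart telegraf
  else
    echo "telegraf rejected $conf; service left untouched" >&2
    return 1
  fi
}

# Example (run as root):
#   restart_telegraf_if_valid
```

To see why a file was rejected, run the same telegraf command by hand without the output redirection.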
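Restart telegraf, influxdb and grafana, either with the scripts or by hand. Log in to Grafana again and the dashboard shows data; keep the panels you want to monitor and delete the rest.
The advantage of this dashboard is that the hostname selector in the top-left corner lets you switch between servers at any time and view the metrics of each one.
Queries exported from the panels of the dashboard above: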
CPU:
SELECT mean("n_cpus") FROM "system" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(none)
SELECT mean("usage_system") FROM "cpu" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(none)
SELECT mean("usage_user") FROM "cpu" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(none)
SELECT mean("usage_iowait") FROM "cpu" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(none)
RAM:
SELECT mean("available") FROM "mem" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(null)
SELECT mean("total") FROM "mem" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(null)
Swap:
SELECT derivative(mean("in"), 1s) FROM "swap" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(null)
SELECT derivative(mean("out"), 1s) FROM "swap" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(null)
SELECT mean("used_percent") FROM "swap" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(null)
Disk:
SELECT mean("total") FROM "disk" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "path" fill(null)
SELECT mean("free") FROM "disk" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "path" fill(null)
SELECT mean("inodes_total") FROM "disk" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "path" fill(null)
SELECT mean("inodes_free") FROM "disk" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "path" fill(null)
Processes:
SELECT mean("total") FROM "processes" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval) fill(null)
SELECT mean("running") FROM "processes" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval) fill(null)
SELECT mean("blocked") FROM "processes" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval) fill(null)
SELECT mean("stopped") FROM "processes" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval) fill(null)
SELECT max("blocked") FROM "processes" WHERE $timeFilter GROUP BY time($interval), "host" fill(null)
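The queries above contain Grafana template variables ($host, $timeFilter, $interval), so they cannot be pasted into the influx shell as they are. The sketch below substitutes concrete values and sends the result to the InfluxDB 1.x /query endpoint with curl. The function names are hypothetical, and the substituted values are assumed to be simple (no "/" or "&" characters).

```shell
# Replace Grafana variables in a panel query with concrete values.
# $1 = query text, $2 = host name, $3 = look-back range (e.g. 1h),
# $4 = group-by interval (e.g. 1m)
expand_query() {
  printf '%s\n' "$1" | sed \
    -e 's/\$timeFilter/time > now() - '"$3"'/g' \
    -e 's/\$interval/'"$4"'/g' \
    -e 's/\$host/'"$2"'/g'
}

# Send an InfluxQL query to the HTTP API and print the JSON response.
# $1 = base URL, $2 = database, $3 = InfluxQL text
run_query() {
  curl -sG "$1/query" --data-urlencode "db=$2" --data-urlencode "q=$3"
}

# Example:
#   q=$(expand_query 'SELECT mean("total") FROM "mem" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(null)' node2 1h 1m)
#   run_query http://10.67.31.74:8086 servermonitor "$q"
```

An empty result for a query that Grafana also shows as empty usually points at the Telegraf side, for example a field removed by fielddrop or a host tag that does not match.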
VII. Scripts
One-click start and stop scripts for the monitoring stack are attached for reference.
/home/ldw/monitor/script
Script contents, for reference:
start.sh
ssh [email protected] 'systemctl start telegraf' &
ssh [email protected] 'systemctl start influxdb' &
ssh [email protected] 'systemctl start grafana-server' &
ssh [email protected] 'systemctl start telegraf' &
stop.sh
ssh [email protected] 'systemctl stop telegraf' &
ssh [email protected] 'systemctl stop influxdb' &
ssh [email protected] 'systemctl stop grafana-server' &
ssh [email protected] 'systemctl stop telegraf' &
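A variant of these scripts can take the hosts from variables and run the commands one after another, so that a failing ssh call is noticed instead of disappearing into the background. This is a sketch only: the function name remote_monitor and the host values root@monitor-server and root@client-1 are placeholders, not the hosts used in the original scripts.

```shell
SSH=${SSH:-ssh}

# Placeholder hosts (assumptions): one server running all three services,
# plus a space-separated list of clients that only run telegraf.
SERVER=${SERVER:-root@monitor-server}
CLIENTS=${CLIENTS:-root@client-1}

# Apply a systemctl action (start or stop) on the server and all clients.
# Returns non-zero as soon as one remote command fails.
remote_monitor() {
  action=$1
  for svc in influxdb grafana-server telegraf; do
    $SSH "$SERVER" "systemctl $action $svc" || return 1
  done
  for host in $CLIENTS; do
    $SSH "$host" "systemctl $action telegraf" || return 1
  done
}

# Examples:
#   remote_monitor start
#   CLIENTS="root@client-1 root@client-2" remote_monitor stop
```

Running sequentially is slower than the backgrounded version, but with only a handful of hosts the difference is small and the exit status becomes meaningful.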