Building a Lightweight Monitoring System on Linux (Telegraf + InfluxDB + Grafana)

 

I. Preparing the installation files (they can be downloaded in advance from the official websites)

 

telegraf-1.12.4-1.x86_64.rpm

influxdb-1.7.8.x86_64.rpm (the single-node version is free; clustering is a paid feature)

grafana-6.4.3-1.x86_64.rpm

kapacitor-1.5.3.x86_64.rpm (the alerting service of the TIGK stack)

 

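II. Installation

1. Create a directory for the packages

mkdir -p /home/ldw/monitor

Upload the downloaded packages to the monitor directory on the server.

Change to the directory that contains monitor and grant permissions:

chmod -R 777 monitor

2. Install the packages

Install commands (for distributed monitoring, Telegraf must also be installed on each client host):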

rpm -ivh telegraf-1.12.4-1.x86_64.rpm

rpm -ivh influxdb-1.7.8.x86_64.rpm

rpm -ivh grafana-6.4.3-1.x86_64.rpm

rpm -ivh kapacitor-1.5.3.x86_64.rpm

 

Installation output (run from the directory that contains the packages):

[root@node2 monitor]# rpm -ivh telegraf-1.12.4-1.x86_64.rpm

Preparing...                          ################################# [100%]

Updating / installing...

   1:telegraf-1.12.4-1                ################################# [100%]

Created symlink from /etc/systemd/system/multi-user.target.wants/telegraf.service to /usr/lib/systemd/system/telegraf.service.

 

[root@node2 monitor]# rpm -ivh influxdb-1.7.8.x86_64.rpm

Preparing...                          ################################# [100%]

Updating / installing...

   1:influxdb-1.7.8-1                 ################################# [100%]

Created symlink from /etc/systemd/system/influxd.service to /usr/lib/systemd/system/influxdb.service.

Created symlink from /etc/systemd/system/multi-user.target.wants/influxdb.service to /usr/lib/systemd/system/influxdb.service.

 

[root@node2 monitor]# rpm -ivh grafana-6.4.3-1.x86_64.rpm

warning: grafana-6.4.3-1.x86_64.rpm: Header V4 RSA/SHA1 Signature, key ID 24098cb6: NOKEY

Preparing...                          ################################# [100%]

Updating / installing...

   1:grafana-6.4.3-1                  ################################# [100%]

### NOT starting on installation, please execute the following statements to configure grafana to start automatically using systemd

 sudo /bin/systemctl daemon-reload

 sudo /bin/systemctl enable grafana-server.service

### You can start grafana-server by executing

 sudo /bin/systemctl start grafana-server.service

POSTTRANS: Running script

 

[root@node2 monitor]# rpm -ivh kapacitor-1.5.3.x86_64.rpm

Preparing...                          ################################# [100%]

Updating / installing...

   1:kapacitor-1.5.3-1                ################################# [100%]

 

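After installation, the configuration files are at the following locations:

/etc/telegraf/telegraf.conf

/etc/influxdb/influxdb.conf

/etc/grafana/grafana.ini

/etc/kapacitor/kapacitor.conf

After installation, the log files are at the following locations:

/var/log/telegraf/telegraf.log

/var/log/influxdb/influxdb.log

/var/log/grafana/grafana.log

Grafana plugin directory:

/var/lib/grafana/plugins

InfluxDB data file locations:

/var/lib/influxdb/meta  # where the metadata/raft database is stored

/var/lib/influxdb/data  # directory where the TSM storage engine stores TSM files

/var/lib/influxdb/wal   # directory where the TSM storage engine stores WAL files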

 

 

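III. Configuration

1. Telegraf configuration

Edit /etc/telegraf/telegraf.conf:

[agent]

# Data collection interval

interval = "5s"

[[outputs.influxdb]]

# URL of the InfluxDB instance; change the IP to that of the server where InfluxDB is installed

urls = ["http://10.67.31.74:8086"]

# Target database name. The default, telegraf, is fine; a database with this name is created later, once InfluxDB is running.

database = "telegraf"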
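With these settings, Telegraf sends its metrics to the InfluxDB 1.x HTTP `/write` endpoint in line protocol. The sketch below shows roughly what one such point and the target URL look like. The helper names (`to_line_protocol`, `write_url`) are hypothetical and are not part of Telegraf, and the escaping rules are simplified for illustration.

```python
from urllib.parse import urlencode


def _escape(text: str) -> str:
    """Backslash-escape commas, equals signs and spaces (tag keys/values, field keys)."""
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value) -> str:
    """Render a field value: booleans, integers (with 'i' suffix), floats, or quoted strings."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build one line: measurement,tag=value field=value timestamp."""
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    tag_part = "".join(f",{_escape(k)}={_escape(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{_escape(k)}={_format_field(v)}" for k, v in fields.items())
    return f"{name}{tag_part} {field_part} {timestamp_ns}"


def write_url(base_url, database):
    """URL of the write endpoint for the given database."""
    query = urlencode({"db": database})
    return f"{base_url}/write?{query}"
```

For example, `write_url("http://10.67.31.74:8086", "telegraf")` gives the endpoint that corresponds to the `urls` and `database` values configured above.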

 

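2. InfluxDB configuration

In the [http] section of /etc/influxdb/influxdb.conf:

# Determines whether HTTP endpoint is enabled. This endpoint receives and stores the data sent by Telegraf and provides the API that Grafana queries.

enabled = true

# The bind address used by the HTTP service. This is the port on which the HTTP API listens.

bind-address = ":8086"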
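Once the HTTP service is enabled, a quick way to confirm that it is reachable is the InfluxDB 1.x `/ping` endpoint, which answers with status 204 and no body. The following is a minimal sketch using only the standard library. The function names are hypothetical, and the `opener` parameter exists only so the check can be exercised without a network.

```python
import urllib.request


def ping_url(host: str, port: int = 8086) -> str:
    """URL of the InfluxDB 1.x health-check endpoint."""
    return f"http://{host}:{port}/ping"


def influxdb_is_up(host, port=8086, timeout=3.0, opener=urllib.request.urlopen) -> bool:
    """Return True if GET /ping answers with 204 No Content, False on any error."""
    try:
        with opener(ping_url(host, port), timeout=timeout) as response:
            return response.status == 204
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False
```

Calling `influxdb_is_up("10.67.31.74")` from the Telegraf host also verifies that port 8086 is not blocked by a firewall between the two machines.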

 

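3. Grafana configuration

In the [server] section of /etc/grafana/grafana.ini (a leading ";" marks a line as a comment, so remove it for the setting to take effect):

# The public facing domain name used to access grafana from a browser

domain = 10.67.31.74

# The full public facing url you use in browser, used for redirects and emails

root_url = http://10.67.31.74:3000

The default login username and password are both admin; they do not need to be changed in the configuration.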
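The stock grafana.ini ships most settings commented out with a leading `;`, so a value takes effect only once that prefix is removed. The sketch below illustrates the difference using Python's `configparser` as a stand-in. Grafana uses its own INI parser, so this only demonstrates the comment convention, and `SAMPLE` is made-up input rather than the real file.

```python
import configparser

# Made-up excerpt: the first setting is still commented out, the second is active.
SAMPLE = """\
[server]
;domain = localhost
root_url = http://10.67.31.74:3000
"""


def read_server_settings(ini_text: str) -> dict:
    """Return the active (uncommented) keys of the [server] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("server"):
        return {}
    return dict(parser["server"])
```

Here `read_server_settings(SAMPLE)` contains `root_url` but not `domain`, because the `;domain` line is treated as a comment.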

 

 

IV. Starting the services

Start commands:

systemctl start telegraf

systemctl start influxdb

systemctl start grafana-server

 

Check the status:

systemctl status telegraf

systemctl status influxdb

systemctl status grafana-server

 

Stop commands:

systemctl stop telegraf

systemctl stop influxdb

systemctl stop grafana-server

 

 

V. InfluxDB database setup

After InfluxDB has started, create the user and the database:

 

[root@node2 ~]# influx

   Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.

   Connected to http://localhost:8086 version 1.0.2

   InfluxDB shell version: 1.0.2

   > create user "telegraf" with password 'telegraf'

   > show users;

   user     admin

   telegraf false

   > create database telegraf

   > show databases

   name: databases

   ---------------

   name

   _internal

   telegraf

 

# Switch to the database

> use telegraf

# List all measurements (the equivalent of tables) in the database

> show measurements
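The same statements can also be sent without the interactive shell, through the InfluxDB 1.x HTTP `/query` endpoint. The sketch below builds the request URL and posts it. The function names are hypothetical, and authentication is assumed to be disabled, as in the default configuration.

```python
import urllib.request
from urllib.parse import urlencode


def query_url(base_url, influxql, database=None):
    """Build a /query URL; `database` is only needed for statements that read a database."""
    params = {"q": influxql}
    if database is not None:
        params["db"] = database
    return f"{base_url}/query?{urlencode(params)}"


def post(url, timeout=5.0):
    """Send the statement with POST (used here for statements such as CREATE DATABASE)."""
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `post(query_url("http://localhost:8086", "CREATE DATABASE telegraf"))` would create the database used by the Telegraf output above, which can be handy when provisioning several servers from a script.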

 

 

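VI. Using Grafana

Log in to Grafana:

http://10.67.31.74:3000

Username/password: admin/admin

After logging in, add InfluxDB as a data source.

A suitable dashboard file was downloaded beforehand and can be imported directly. The one used here is server-single_rev3.json.

Give the dashboard a name of your choice, select the InfluxDB data source, and click Import.

Once the import succeeds, the dashboard can be configured.

The server-single_rev3.json dashboard has specific configuration requirements: Telegraf must be reconfigured by editing telegraf.conf on the Linux host, as shown below.

Revised telegraf.conf: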

--------------------------------------------------------------------------------------------------------

[global_tags]

 

  host = "$HOSTNAME"

  ## Note: each client must be configured with its own hostname

[agent]

 

  interval = "5m"

 

[[outputs.influxdb]]

 

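  ## Replace with the address of your InfluxDB server, e.g. http://10.67.31.74:8086 as used earlier

  urls = ["http://mydomain.invalid:8086"]

  ## This database is named servermonitor rather than telegraf; the Grafana data source must point to the same database

  database = "servermonitor"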

 

[[inputs.cpu]]

 

  percpu = false

 

  totalcpu = true

 

  collect_cpu_time = true

 

  fielddrop = ["time_guest","time_guest_nice","time_irq","time_nice","time_softirq","time_steal","usage_guest","usage_guest_nice","usage_irq","usage_nice","usage_softirq","usage_steal"]

 

  interval = "2s"

 

[[inputs.disk]]

 

  mount_points = ["/","/var","/data"]

 

  fielddrop=["used","inodes_used"]

 

[[inputs.mem]]

 

  fielddrop=["active","buffered","cached","free","inactive","used","used_percent"]

 

[[inputs.processes]]

 

[[inputs.swap]]

 

  fielddrop=["free","total"]

 

[[inputs.system]]

 

  fielddrop=["n_users","uptime_format"]

 

[[inputs.nstat]]

 

  interval = "2s"

 

  #proc_net_netstat = "" # this is of interest.

    ## Note: if you are unsure about this option, leave it commented out; setting it to an empty string prevents telegraf from starting.

  fieldpass = ["IpExtOutOctets","IpExtInOctets"]
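The `fielddrop` and `fieldpass` options in the configuration above decide which fields of each metric reach InfluxDB: `fieldpass` keeps only the listed fields, while `fielddrop` removes the listed ones. The sketch below imitates that behaviour. It is an illustration rather than Telegraf's implementation, `filter_fields` is a hypothetical name, and Python's `fnmatchcase` is used as an approximation of Telegraf's glob matching.

```python
from fnmatch import fnmatchcase


def filter_fields(fields, fieldpass=(), fielddrop=()):
    """Return the fields that survive a fieldpass allow-list and a fielddrop deny-list."""
    kept = {}
    for name, value in fields.items():
        # An empty fieldpass means "allow everything".
        if fieldpass and not any(fnmatchcase(name, pattern) for pattern in fieldpass):
            continue
        if any(fnmatchcase(name, pattern) for pattern in fielddrop):
            continue
        kept[name] = value
    return kept
```

Applied to the `[[inputs.swap]]` block, `filter_fields({"free": 1, "total": 2, "used": 3, "used_percent": 4.0}, fielddrop=["free", "total"])` leaves only `used` and `used_percent`. If a dashboard panel later shows no data, a dropped field is one of the first things to check.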

 

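After editing telegraf.conf, restart Telegraf.

Restart telegraf, influxdb and grafana, either with a script or by hand. After logging in to Grafana again you should see the dashboard populated with data; keep the metrics you want to monitor and delete the others.

The advantage of this dashboard is that the hostname selector in the top-left corner lets you switch between servers at any time and view the metrics of each one.

Queries behind each panel of the dashboard above, as exported: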

CPU:

SELECT mean("n_cpus") FROM "system" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(none)

SELECT mean("usage_system") FROM "cpu" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(none)

SELECT mean("usage_user") FROM "cpu" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(none)

SELECT mean("usage_iowait") FROM "cpu" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(none)

 

RAM:

SELECT mean("available") FROM "mem" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(null)

SELECT mean("total") FROM "mem" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(null)

 

Swap:

SELECT derivative(mean("in"), 1s) FROM "swap" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(null)

SELECT derivative(mean("out"), 1s) FROM "swap" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(null)

SELECT mean("used_percent") FROM "swap" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "host" fill(null)
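The swap `in` and `out` fields are cumulative counters, which is why the first two queries wrap them in `derivative(..., 1s)`: the result is the change between consecutive values, normalised to a per-second rate. A small worked sketch of that arithmetic follows. The function name and the sample numbers are made up for illustration.

```python
def per_second_rate(points):
    """Given (unix_seconds, counter_value) pairs sorted by time, return (time, rate per second) pairs."""
    rates = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        # Each rate is stamped with the later of the two points.
        rates.append((t1, (v1 - v0) / (t1 - t0)))
    return rates
```

With samples `[(0, 100), (10, 150), (20, 150)]`, the counter grows by 50 over the first 10 seconds and not at all afterwards, so the rates are 5.0 and 0.0 per second.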

 

Disk:

SELECT mean("total") FROM "disk" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "path" fill(null)

SELECT mean("free") FROM "disk" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "path" fill(null)

SELECT mean("inodes_total") FROM "disk" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "path" fill(null)

SELECT mean("inodes_free") FROM "disk" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval), "path" fill(null)

 

Processes:

SELECT mean("total") FROM "processes" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval) fill(null)

SELECT mean("running") FROM "processes" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval) fill(null)

SELECT mean("blocked") FROM "processes" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval) fill(null)

SELECT mean("stopped") FROM "processes" WHERE ("host" =~ /^$host$/) AND $timeFilter GROUP BY time($interval) fill(null)

SELECT max("blocked") FROM "processes" WHERE $timeFilter GROUP BY time($interval), "host" fill(null)
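The `$host`, `$interval` and `$timeFilter` placeholders in these queries are Grafana variables that are filled in before the query is sent to InfluxDB. To run one of the queries by hand in the `influx` shell, the placeholders have to be replaced first. The sketch below does a plain textual substitution. It is a simplification of what Grafana does (multi-value variables and escaping are ignored), and `render_query` is a hypothetical helper.

```python
def render_query(template, host, interval, time_filter):
    """Substitute the three dashboard variables used by the queries above."""
    return (
        template.replace("$timeFilter", time_filter)
        .replace("$interval", interval)
        .replace("$host", host)
    )


TEMPLATE = (
    'SELECT mean("usage_user") FROM "cpu" WHERE ("host" =~ /^$host$/) '
    'AND $timeFilter GROUP BY time($interval), "host" fill(none)'
)
```

For instance, `render_query(TEMPLATE, "node2", "10s", "time >= now() - 1h")` yields a query that averages `usage_user` for host `node2` in 10-second buckets over the last hour.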

 

 

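VII. Scripts

The attachment contains one-click scripts for starting and stopping the monitoring stack, for reference.

/home/ldw/monitor/script

Script contents, for reference: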

start.sh

# Replace each USER@HOST with the SSH login of the machine that runs the service

ssh USER@HOST 'systemctl start telegraf' &

ssh USER@HOST 'systemctl start influxdb' &

ssh USER@HOST 'systemctl start grafana-server' &

ssh USER@HOST 'systemctl start telegraf' &

stop.sh

# Replace each USER@HOST with the SSH login of the machine that runs the service

ssh USER@HOST 'systemctl stop telegraf' &

ssh USER@HOST 'systemctl stop influxdb' &

ssh USER@HOST 'systemctl stop grafana-server' &

ssh USER@HOST 'systemctl stop telegraf' &
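The scripts above launch every `ssh` command in the background, so the services start in no particular order. As an alternative sketch, the same idea can be written so that the commands run one after another, for example starting InfluxDB before the Telegraf agents that write to it. The targets in the usage note (`root@server`, `root@client`) are placeholders and not hosts from this article, and the function names are hypothetical.

```python
import subprocess


def ssh_command(target, action, service):
    """Argument list for running `systemctl <action> <service>` on a remote host."""
    return ["ssh", target, f"systemctl {action} {service}"]


def plan(action, placement):
    """Build the commands for (target, service) pairs, preserving the given order."""
    return [ssh_command(target, action, service) for target, service in placement]


def run_all(commands):
    """Run the commands sequentially and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

A start sequence could then be `run_all(plan("start", [("root@server", "influxdb"), ("root@server", "grafana-server"), ("root@server", "telegraf"), ("root@client", "telegraf")]))`, with the real logins substituted for the placeholders.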
