(2) prometheus + grafana + alertmanager: Configuring MySQL Monitoring

 For installation, see https://blog.51cto.com/liuqs/2027365 . It is best to use the same component versions as in that post; otherwise the behavior may differ.

(1) prometheus + grafana + alertmanager: Configuring Host Monitoring

(2) prometheus + grafana + alertmanager: Configuring MySQL Monitoring

(3) prometheus + grafana + alertmanager: Configuring Redis Monitoring

(4) prometheus + grafana + alertmanager: Configuring Kafka Monitoring

(5) prometheus + grafana + alertmanager: Configuring ES Monitoring



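  1. Installing and configuring mysqld_exporter

    A. The mysqld service is installed on each of your own Linux servers

    1. Download mysqld_exporter onto each mysqld server (download: https://pan.baidu.com/s/1pW7RptzXa3LqFlO5zxJXPw ) and extract it under /data/monitor/

    2. Install the Go environment: yum install go -y

    3. Connect to the local MySQL as root and grant privileges to the monitoring user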

      mysql> GRANT REPLICATION CLIENT,PROCESS ON *.* TO 'mysql_monitor'@'localhost' identified by 'Jvsa09OodhvS0VKQ';

      mysql> FLUSH PRIVILEGES;

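    4. cd into /data/monitor/mysqld_exporter and create the .my.cnf file: vim .my.cnf . Note that the grant above is for 'mysql_monitor'@'localhost'; if the exporter connects to an IP address as in this example, the account's host part must also allow that client address.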

      [client]

      host=10.8.4.126

      port=3306

      user=mysql_monitor

      password=Jvsa09OodhvS0VKQ

    5. Start mysqld_exporter (the path must point at the .my.cnf created in step 4):  /data/monitor/mysqld_exporter/bin/mysqld_exporter -config.my-cnf="/data/monitor/mysqld_exporter/.my.cnf" &
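
A quick way to confirm the exporter can actually log in to MySQL is to read its /metrics page and look for the `mysql_up` gauge, which mysqld_exporter sets to 1 when its connection succeeds. The following is a minimal sketch, not part of the original setup. It assumes the exporter listens on its default port 9104 (no -web.listen-address was passed above); the function names are illustrative.

```python
import urllib.request


def mysql_up(metrics_text: str) -> bool:
    """Return True if the exposition text contains a `mysql_up` sample equal to 1."""
    for line in metrics_text.splitlines():
        parts = line.split()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        if parts[0] == "mysql_up":
            return float(parts[1]) == 1.0
    return False


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Download the exporter's metrics page as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example (assumes the default listen port 9104):
#   print(mysql_up(fetch_metrics("http://localhost:9104/metrics")))
```

If this returns False while the process is running, the credentials or the host in .my.cnf are the first things to check.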


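B. MySQL is a managed database from a cloud provider (we use UCloud UDB; everything below follows that setup, and other providers are much the same)

  1. Log in to the prometheus server (prometheus, grafana and alertmanager all run on the same server), download mysqld_exporter onto it (download: https://pan.baidu.com/s/1MNPbhoZEvVV4lf1bVXWJ1g ), and extract it under /data/monitor/

  2. If the Go environment is not installed: yum install go -y

  3. Connect to the MySQL instance as root and grant privileges to the monitoring user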

    mysql> GRANT REPLICATION CLIENT,PROCESS ON *.* TO 'mysql_monitor'@'%' identified by 'Jvsa09OodhvS0VKQ';

    mysql> FLUSH PRIVILEGES;

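  4. cd into /data/monitor/mysqld_exporter and create a directory named .my.cnf, then create one connection config file per database inside that directory. Below is one example; create the others the same way.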

    cat /data/monitor/mysqld_exporter/.my.cnf/.ba_master_10.8.4.126_3306_15049.cnf

    [client]

    host=10.8.4.126

    port=3306

    user=mysql_monitor

    password=Jvsa09OodhvS0VKQ

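  5. Then cd into /data/monitor/mysqld_exporter/scripts and create a start script for each mysqld_exporter. Below is the start script for one MySQL database; create the others the same way. Each script must listen on a different port and point at its matching config file under .my.cnf.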

    cat /data/monitor/mysqld_exporter/scripts/ba_master_10.8.4.126_3306_15049.sh

    nohup /data/monitor/mysqld_exporter/bin/mysqld_exporter -web.listen-address=':15049' -config.my-cnf=/data/monitor/mysqld_exporter/.my.cnf/.ba_master_10.8.4.126_3306_15049.cnf -collect.info_schema.tables=false >> /data/monitor/mysqld_exporter/log/15049_10.8.4.126_3306.log 2>&1 &

  6. Because /data/monitor/mysqld_exporter/scripts/ contains many mysqld_exporter start scripts, cd into /data/monitor/mysqld_exporter and run sh start.sh to start them all, then check that every port is listening.
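
With many databases, writing each connection file and start script by hand gets error-prone, since the listen port, the config file name and the log name all have to agree. The sketch below shows one way to render both files from a single record and to check a port afterwards. It is an illustration under assumptions: the `Instance` fields and function names are made up here, the file naming simply follows the example above, and the contents of the author's start.sh are not shown in the post.

```python
import socket
from dataclasses import dataclass

BASE = "/data/monitor/mysqld_exporter"


@dataclass
class Instance:
    name: str         # e.g. "ba_master" (illustrative field)
    host: str         # MySQL address
    port: int         # MySQL port
    listen_port: int  # port this exporter listens on; must be unique

    @property
    def stem(self) -> str:
        # Naming pattern taken from the example files above.
        return f"{self.name}_{self.host}_{self.port}_{self.listen_port}"


def render_cnf(inst: Instance, user: str, password: str) -> str:
    """Text of the per-database connection file."""
    return (
        "[client]\n"
        f"host={inst.host}\n"
        f"port={inst.port}\n"
        f"user={user}\n"
        f"password={password}\n"
    )


def render_start_script(inst: Instance) -> str:
    """Text of the per-database start script, mirroring the example above."""
    cnf = f"{BASE}/.my.cnf/.{inst.stem}.cnf"
    log = f"{BASE}/log/{inst.listen_port}_{inst.host}_{inst.port}.log"
    return (
        f"nohup {BASE}/bin/mysqld_exporter "
        f"-web.listen-address=':{inst.listen_port}' "
        f"-config.my-cnf={cnf} "
        "-collect.info_schema.tables=false "
        f">> {log} 2>&1 &\n"
    )


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0
```

For example, `render_start_script(Instance("ba_master", "10.8.4.126", 3306, 15049))` reproduces the script line shown in step 5, and `is_listening(15049)` can be called after start-up to confirm the port is open.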


2. Configuring prometheus

    A. Add the mysqld_exporter configuration to prometheus.yml: vim /data/monitor/prometheus/conf/prometheus.yml

        

global:
  # Interval at which the server scrapes targets
  scrape_interval:     1m
  # Interval at which alerting rules are evaluated
  evaluation_interval: 1m
  # Scrape timeout
  scrape_timeout: 20s
  # Add global labels
  #external_labels:
  #  monitor: "hk"

# Connection to alertmanager
alerting:
  alertmanagers:
    - static_configs:
      - targets: ["localhost:9093"]

# Alerting rules
rule_files:
  - /data/monitor/prometheus/conf/rule/*.yml

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
# Monitor the prometheus server itself
  - job_name: 'prometheus'
    scrape_interval: 15s
    static_configs:
      - targets: ['10.8.53.218:9090']

# Monitor the specified hosts
  - job_name: 'node_resources'
    scrape_interval: 1m
    file_sd_configs:
      - files:
        - /data/monitor/prometheus/conf/node_conf/node_host_info.json
    honor_labels: true

# MySQL exporters
  - job_name: 'mysql_global_status'
    scrape_interval: 60s
    file_sd_configs:
      - files:
        - /data/monitor/prometheus/conf/node_conf/node_mysql_info.json


    B. Write node_mysql_info.json: cat /data/monitor/prometheus/conf/node_conf/node_mysql_info.json

[
    {
        "labels": {
            "desc": "slave_customer_10.8.31.101:3306",
            "group": "ba",
            "mysql_addr": "10.8.31.101:3306",
            "role": "slave_customer"
        },
        "targets": [
            "localhost:15050"
        ]
    },
    {
        "labels": {
            "desc": "slave_bi_10.8.150.188:3306",
            "group": "ba",
            "mysql_addr": "10.8.150.188:3306",
            "role": "slave_bi"
        },
        "targets": [
            "localhost:15221"
        ]
    },
    {
        "labels": {
            "desc": "slave_10.8.139.209:3306",
            "group": "ba",
            "mysql_addr": "10.8.139.209:3306",
            "role": "slave"
        },
        "targets": [
            "localhost:15052"
        ]
    },
    {
        "labels": {
            "desc": "slave_catalog_10.8.11.246:3306",
            "group": "ba",
            "mysql_addr": "10.8.11.246:3306",
            "role": "slave_catalog"
        },
        "targets": [
            "localhost:15053"
        ]
    },
    {
        "labels": {
            "desc": "master_10.8.4.126:3306",
            "group": "ba",
            "mysql_addr": "10.8.4.126:3306",
            "role": "master"
        },
        "targets": [
            "localhost:15049"
        ]
    },
    {
        "labels": {
            "desc": "slave_dc_10.8.17.124:3306",
            "group": "ba",
            "mysql_addr": "10.8.17.124:3306",
            "role": "slave_dc"
        },
        "targets": [
            "localhost:15051"
        ]
    },
    {
        "labels": {
            "desc": "master_10.8.115.3:3306",
            "group": "openapi",
            "mysql_addr": "10.8.115.3:3306",
            "role": "master"
        },
        "targets": [
            "localhost:15060"
        ]
    }
]
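
Because every entry in this file follows the same pattern (the labels are derived from group, role and MySQL address, and the target is the local exporter port), the file can be generated instead of edited by hand. The sketch below is one possible way to do that, assuming the label scheme shown in the file above; the function names and the tuple layout are illustrative. It also rejects duplicate exporter ports, which are easy to introduce when copying entries.

```python
import json


def sd_entry(group: str, role: str, mysql_host: str, mysql_port: int, exporter_port: int) -> dict:
    """Build one file_sd entry using the label scheme from node_mysql_info.json."""
    addr = f"{mysql_host}:{mysql_port}"
    return {
        "labels": {
            "desc": f"{role}_{addr}",
            "group": group,
            "mysql_addr": addr,
            "role": role,
        },
        "targets": [f"localhost:{exporter_port}"],
    }


def render_sd_file(instances) -> str:
    """Render the whole file; raise if two entries share an exporter port."""
    entries = [sd_entry(*inst) for inst in instances]
    seen = set()
    for entry in entries:
        for target in entry["targets"]:
            if target in seen:
                raise ValueError(f"duplicate exporter target: {target}")
            seen.add(target)
    return json.dumps(entries, indent=4, sort_keys=True)


# Example input: (group, role, mysql_host, mysql_port, exporter_port)
EXAMPLE = [
    ("ba", "master", "10.8.4.126", 3306, 15049),
    ("openapi", "master", "10.8.115.3", 3306, 15060),
]
```

Calling `render_sd_file(EXAMPLE)` yields JSON in the same shape as the file above, which Prometheus picks up through file_sd_configs.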


    C. Restart prometheus: cd into /data/monitor/prometheus, then run sh reload.sh
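
After the reload it is worth confirming that every MySQL target is actually being scraped. Prometheus exposes this through its HTTP API at /api/v1/targets, where each active target carries a `health` field ("up", "down" or "unknown"). The sketch below lists the targets of one job that are not healthy. The base URL is an assumption (port 9090 on the local host, as in the 'prometheus' job above), and the function names are illustrative.

```python
import json
import urllib.request


def unhealthy_targets(payload: dict, job: str) -> list:
    """Return (instance, health, lastError) for targets of `job` whose health is not 'up'."""
    result = []
    for target in payload["data"]["activeTargets"]:
        labels = target.get("labels", {})
        if labels.get("job") == job and target.get("health") != "up":
            result.append(
                (labels.get("instance"), target.get("health"), target.get("lastError", ""))
            )
    return result


def fetch_targets(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch the target list from the Prometheus HTTP API."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)


# Example (assumed address):
#   payload = fetch_targets("http://localhost:9090")
#   print(unhealthy_targets(payload, "mysql_global_status"))
```

An empty list means all exporters in the job answered their last scrape.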


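Note: many metrics cannot be scraped this way, so we collect them separately with a script. I only have a Python script that pulls them from the UCloud API; if you need it, contact me on QQ: 761117826


3. Configuring grafana

    A. Download the MySQL monitoring dashboard template: https://pan.baidu.com/s/1xWWceAQ_A4kKEn06dUlRBA

    B. To import it, follow steps h through l under "2. Configuring grafana" in the host monitoring post ( https://blog.51cto.com/liuqs/2391282 )

4. Configuring alertmanager

    A. Define the rules in prometheus: cat /data/monitor/prometheus/conf/rule/mysql.yml . The file contents are below. Afterwards, restart prometheus: cd /data/monitor/prometheus && sh reload.sh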
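
The author's UCloud script is not shown, so the following is only a sketch of the output side of such a script: turning values fetched from a cloud API into the Prometheus text exposition format, using the same group, role and mysql_addr labels as the exporter targets. The metric name mysql_mem_used_rate is one of the script-provided metrics referenced by the alert rules in this post; the help text, the sample value and the function names are assumptions for illustration. How the text reaches Prometheus (for example node_exporter's textfile collector or a small HTTP endpoint) is left open here.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline as the exposition format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, samples) -> str:
    """Render one gauge family; `samples` is a list of (labels_dict, value)."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(
            f'{key}="{escape_label_value(str(val))}"' for key, val in sorted(labels.items())
        )
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical value, as if returned by the cloud provider's API.
EXAMPLE = render_gauge(
    "mysql_mem_used_rate",
    "Memory usage of the database instance in percent.",
    [({"group": "ba", "role": "master", "mysql_addr": "10.8.4.126:3306"}, 42.5)],
)
```

Keeping the label names identical to those in node_mysql_info.json lets the same alert annotations work for both exporter metrics and script metrics.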


groups:
  - name: mysql_alert
    rules:
### Slow queries ###
# Default slow-query alert policy
    - alert: mysql慢查詢5分鐘100條  # "MySQL slow queries: 100 in 5 minutes"
      expr: floor(delta(mysql_global_status_slow_queries{mysql_addr!~"10.8.6.44:3306|10.8.9.20:3306|10.8.12.212:3306"}[5m])) >= 100
      for: 3m
      labels:
        severity: warning
      annotations:
        description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }} queries], pending ('for') duration: 3 minutes."


### QPS ###
# Default QPS alert policy
    - alert: mysql_qps大於8000  # "MySQL QPS above 8000"
      expr: floor(sum(irate(mysql_global_status_commands_total{group!~"product|product_backend"}[5m])) by (group, role, mysql_addr)) > 8000
      for: 6m
      labels:
        severity: warning
      annotations:
        description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}], pending ('for') duration: 6 minutes."

# QPS alert policy for the product databases
    - alert: mysql_qps大於25000  # "MySQL QPS above 25000"
      expr: floor(sum(irate(mysql_global_status_commands_total{group=~"product|product_backend"}[5m])) by (group, role, mysql_addr)) > 25000
      for: 3m
      labels:
        severity: warning
      annotations:
        description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}], pending ('for') duration: 3 minutes."


### Memory ###
# Default memory alert policy
    - alert: mysql內存99%  # "MySQL memory at 99%"
      expr: mysql_mem_used_rate >= 99
      for: 6m
      labels:
        severity: warning
      annotations:
        description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}%], pending ('for') duration: 6 minutes."


### Disk ###
# Default disk alert policy
    - alert: mysql磁盤85%  # "MySQL disk at 85%"
      expr: mysql_disk_used_rate{mysql_addr!~"10.8.161.53:3306|10.8.115.31:3306"} >= 85
      for: 3m
      labels:
        severity: warning
      annotations:
        description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}%], pending ('for') duration: 3 minutes."

# 95% disk alert policy
    - alert: mysql磁盤95%  # "MySQL disk at 95%"
      expr: mysql_disk_used_rate{mysql_addr=~"10.8.161.53:3306|10.8.115.31:3306"} >= 95
      for: 3m
      labels:
        severity: warning
      annotations:
        description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}%], pending ('for') duration: 3 minutes."


#### IO limit alerts (disabled) ###
## IO limit alert policy for SSD disks
#    - alert: mysqlSSD盤IO上限預警
#      expr: (floor(mysql_ioops) >= mysql_disk_total_size * 50 * 0.9) and (mysql_ssd == 1) and on() hour() >= 0 < 16
#      for: 6m
#      labels:
#        severity: warning
#      annotations:
#        description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}], pending ('for') duration: 6 minutes."
#
## IO limit alert policy for ordinary disks
#    - alert: mysql普通盤IO上限預警
#      expr: (floor(mysql_ioops) >= mysql_disk_total_size * 10 * 0.9) and (mysql_ssd == 0) and on() hour() >= 0 < 16
#      for: 6m
#      labels:
#        severity: warning
#      annotations:
#        description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}], pending ('for') duration: 6 minutes."


### Connections ###
# Default connection-count alert policy
    - alert: mysql連接數80%  # "MySQL connections at 80%"
      expr: floor(mysql_global_status_threads_connected / mysql_global_variables_max_connections * 100) >= 80
      for: 3m
      labels:
        severity: warning
      annotations:
        description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}%], pending ('for') duration: 3 minutes."


### Running threads ###
# Default running-threads alert policy
    - alert: mysql運行進程數5分鐘增長>150  # "MySQL running threads grew by more than 150 in 5 minutes"
      expr: floor(delta(mysql_global_status_threads_running{mysql_addr!~"10.8.136.10:3306|10.10.129.116:3306|10.8.67.153:3306"}[5m])) >= 150
      for: 3m
      labels:
        severity: warning
      annotations:
        description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}], pending ('for') duration: 3 minutes."

# Running-threads alert policy with a 6-minute pending duration
    - alert: mysql運行進程數5分鐘增長>150
      expr: floor(delta(mysql_global_status_threads_running{mysql_addr=~"10.8.136.10:3306|10.10.129.116:3306|10.8.67.153:3306"}[5m])) >= 150
      for: 6m
      labels:
        severity: warning
      annotations:
        description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}], pending ('for') duration: 6 minutes."


### Replication failure ###
# Default replication status alert policy
    - alert: mysql主從同步異常  # "MySQL replication broken"
      expr: (mysql_slave_status_slave_io_running{role!="master"} == 0) or (mysql_slave_status_slave_sql_running{role!="master"} == 0)
      for: 1m
      labels:
        severity: warning
      annotations:
        description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], replication is broken, pending ('for') duration: 1 minute."


### Replication lag ###
# Default replication lag alert policy
    - alert: mysql主從同步延時>30s  # "MySQL replication lag above 30s"
      expr: floor(mysql_slave_status_seconds_behind_master{mysql_addr!~"10.8.137.173:3306|10.8.11.17:3306|10.8.2.17:3306|10.10.29.6:3306|10.8.61.153:3306"}) >= 30
      for: 3m
      labels:
        severity: warning
      annotations:
        description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}s], pending ('for') duration: 3 minutes."

# Alert policy for replicas that tolerate a larger lag
    - alert: mysql主從同步延時>300s  # "MySQL replication lag above 300s"
      expr: floor(mysql_slave_status_seconds_behind_master{mysql_addr=~"10.8.137.173:3306|10.8.11.17:3306|10.10.29.6:3306|10.8.61.153:3306"}) >= 300
      for: 12m
      labels:
        severity: warning
      annotations:
        description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}s], pending ('for') duration: 12 minutes."
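
Every rule in this file combines a threshold expression with a `for` duration, and the interaction between the two decides when a notification is actually sent. The sketch below is a simplified simulation, not Prometheus code. It assumes one evaluation per minute (evaluation_interval: 1m in prometheus.yml) and models the connection rule: the alert becomes pending on the first evaluation where the expression is true, and fires only once the expression has stayed true for the whole `for` duration. The function names are illustrative.

```python
import math


def connection_pct(threads_connected: float, max_connections: float) -> int:
    """Mirror of floor(threads_connected / max_connections * 100)."""
    return math.floor(threads_connected / max_connections * 100)


def alert_states(values, threshold: float, for_evals: int) -> list:
    """State after each evaluation: 'inactive', 'pending' or 'firing'.

    `for_evals` is the `for` duration divided by the evaluation interval
    (3 for `for: 3m` with a 1m interval).
    """
    states = []
    streak = 0  # consecutive evaluations in which the condition held
    for value in values:
        if value >= threshold:
            streak += 1
            # streak - 1 intervals have elapsed since the alert became pending.
            states.append("firing" if streak - 1 >= for_evals else "pending")
        else:
            streak = 0
            states.append("inactive")
    return states


# Connection percentage sampled once per minute, checked against ">= 80" with for: 3m.
history = [85, 85, 85, 85, 70, 90]
timeline = alert_states(history, threshold=80, for_evals=3)
```

Here `timeline` shows the alert firing only at the fourth consecutive high sample, and the dip to 70 resets the wait. That is why short spikes never reach alertmanager with these rules.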

    

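    B. Configure alertmanager: cat /data/prometheus/alertmanager/conf/alertmanager.yml . If the new targets go to the same recipients, simply append them to the existing route. If the recipients differ, define a new receiver, then add a route for those targets and bind it to the new receiver.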


global:
  resolve_timeout: 2m
  smtp_auth_password: q5AYahvxi3WLDap3 # password of the sending mailbox
  smtp_auth_username: [email protected] # sending mailbox
  smtp_from: [email protected] # sending mailbox
  smtp_require_tls: false
  smtp_smarthost: smtp.163.com:465 # SMTP server
  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/ # WeChat Work API URL


inhibit_rules:
- equal:
  - instance
  source_match:
    alertname: "主機CPU90%"  # "host CPU at 90%"
  target_match:
    alertname: "主機負載過高"  # "host load too high"
- equal:
  - instance
  source_match:
    alertname: "mysql運行進程數5分鐘增長>150"  # must match the alert name in mysql.yml exactly
  target_match:
    alertname: "mysql慢查詢5分鐘100條"
- equal:
  - instance
  source_match:
    severity: error
  target_match:
    severity: warning
- equal:
  - instance
  source_match:
    severity: fatal
  target_match:
    severity: error
- equal:
  - service_name
  source_match:
    severity: error
  target_match:
    severity: warning


receivers:
- email_configs: # receiver named "test"
  - html: '{{  template "email.default.html" . }}' # template used for the mail body
    send_resolved: true
    to: [email protected] # mailboxes that receive the alerts; separate multiple addresses with commas
  name: test # receiver name
  wechat_configs: # for receiving alerts in WeChat, see the WeChat Work introduction at the bottom
  - agent_id: 1000002 # application ID
    api_secret: hnyU1LTGnJUiBaCp47l3WVQLTEFF5RXyfNO751xlaHa # application secret
    corp_id: wwd397231fa801beaa # WeChat Work corp ID
    send_resolved: true
    to_user: LiuQingShan|liuqs # WeChat Work user IDs; separate multiple users with |

- email_configs: # default receiver
  - html: '{{  template "email.default.html" . }}'
    send_resolved: true
    to: [email protected]
  name: default_group
  wechat_configs:
  - agent_id: 1000002
    api_secret: hnyU1LTGnJUiBaCp47l3WVQLTEFF5RXyfNO751xlaHa
    corp_id: wwd397231fa801beaa
    send_resolved: true
    to_user: LiuQingShan


route: # alert routing rules
  group_by:
  - monitor
  group_interval: 2m
  group_wait: 30s
  receiver: default_group
  repeat_interval: 6h
  routes:
  - continue: true
    match_re:
      instance: 10.8.46.117:9100|10.8.80.126:9100|10.8.32.67:9100|10.8.9.35:9100|10.8.69.81:9100|localhost:15050|localhost:15221|localhost:15052|localhost:15053|localhost:15049|localhost:15051|localhost:15060  # targets handled by this route
    receiver: test # send to the "test" receiver


templates:
- /data/monitor/alertmanager/template/*.tmpl # path of the notification templates
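
The inhibit_rules section is the part of this file that is easiest to get subtly wrong, because an alertname in source_match or target_match has to equal the alert name in the rule file character for character. The sketch below is a simplified model of the matching logic, not Alertmanager's implementation. A target alert is muted when some other firing alert matches source_match and both carry the same values for every label listed under equal. The data layout and function name are illustrative.

```python
def is_inhibited(target: dict, firing_alerts: list, rules: list) -> bool:
    """Return True if `target` (a label dict) is muted by any rule."""
    for rule in rules:
        if not all(target.get(k) == v for k, v in rule["target_match"].items()):
            continue
        for source in firing_alerts:
            if source is target:
                continue  # an alert does not inhibit itself
            if not all(source.get(k) == v for k, v in rule["source_match"].items()):
                continue
            # Missing labels are treated like empty ones when comparing.
            if all(source.get(lbl, "") == target.get(lbl, "") for lbl in rule["equal"]):
                return True
    return False


RULES = [
    {
        "equal": ["instance"],
        "source_match": {"alertname": "mysql運行進程數5分鐘增長>150"},
        "target_match": {"alertname": "mysql慢查詢5分鐘100條"},
    }
]

threads = {"alertname": "mysql運行進程數5分鐘增長>150", "instance": "localhost:15049"}
slow = {"alertname": "mysql慢查詢5分鐘100條", "instance": "localhost:15049"}
slow_other = {"alertname": "mysql慢查詢5分鐘100條", "instance": "localhost:15050"}
```

With these inputs, `is_inhibited(slow, [threads, slow], RULES)` is true, while the slow-query alert on a different instance is still delivered. A one-character difference in the source alertname makes the rule silently match nothing.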


