Sending Prometheus Alert Emails with Alibaba Cloud DirectMail

Original

2020-06-26 13:10

Building on the article "Setting up a prometheus monitoring system with docker-compose", we add alerting to prometheus.

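In Prometheus, collecting and storing metrics is separate from alerting; alerting is provided by alertmanager. We define alerting rules in prometheus; when a rule is triggered, the resulting alert is passed on to alertmanager. Alertmanager then decides how to handle the alert and which channel, such as email or SMS, to use for the notification. The relationship between Prometheus and alertmanager is shown in the figure below.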

To use alertmanager with prometheus, we add an alertmanager container to docker-compose.yml:

version: '3'
services:
  centos1:
    image: centos
    container_name: centos1
    restart: always
    ports:
      - "9101:9100"
    volumes:
      - ~/code/docker/prometheus/node_exporter:/root
    command: /root/node_exporter

  centos2:
    image: centos
    container_name: centos2
    restart: always
    ports:
      - "9102:9100"
    volumes:
      - ~/code/docker/prometheus/node_exporter:/root
    command: /root/node_exporter

  prometheus:
    image: prom/prometheus
    container_name: prometheus
    restart: always
    ports:
      - "9090:9090"
    volumes:
      - ~/code/docker/prometheus/prometheus:/etc/prometheus
      - ~/code/docker/prometheus/prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'

  grafana:
    image: grafana/grafana
    container_name: grafana
    restart: always
    ports:
      - "3000:3000"
    volumes:
      - ~/code/docker/prometheus/grafana_data:/var/lib/grafana

  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    restart: always
    ports:
      - "9093:9093"
    volumes:
      - ~/code/docker/prometheus/alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'

In docker-compose.yml we set the alertmanager configuration file to /etc/alertmanager/alertmanager.yml. The contents of alertmanager.yml are explained further down.

First, let's look at the prometheus configuration file, prometheus.yml:

global:
  scrape_interval: 5s

rule_files:
  - "alert.rules"

alerting:
  alertmanagers:
    - static_configs:
      - targets:
        - 'alertmanager:9093'

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']
  - job_name: 'linux-exporter'
    metrics_path: /metrics
    static_configs:
      - targets: ['centos1:9100', 'centos2:9100']

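The alerting section tells prometheus where to find alertmanager. Alertmanager listens on port 9093; its container settings can be found in docker-compose.yml.

rule_files lists the files that hold the alerting rules. Here we store the rules in a file named alert.rules.

The contents of alert.rules are as follows: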

groups:
- name: example
  rules:
  # Alert for any instance that is unreachable for more than 1 minute.
  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      severity: page

    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for 1 minute"

This rule triggers an alert when an instance has been down for more than 1 minute.
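To make the effect of `for: 1m` more concrete, here is a minimal Python sketch of how a rule with a `for` clause moves from inactive to pending to firing. It is only an illustration of the idea, not Prometheus's own code; the function name `alert_states` and the sample format are assumptions made for this example.

```python
from typing import List, Tuple


def alert_states(samples: List[Tuple[float, int]], for_seconds: float = 60.0) -> List[str]:
    """Return the alert state after each rule evaluation.

    samples: (evaluation time in seconds, value of `up`) pairs in time order.
    for_seconds: how long `up == 0` must hold before the alert fires.
    """
    states = []
    active_since = None  # time at which `up == 0` first became true
    for ts, up in samples:
        if up != 0:
            # The condition is false again, so the alert resets.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        # The alert fires only after the condition has held for the whole period.
        if ts - active_since >= for_seconds:
            states.append("firing")
        else:
            states.append("pending")
    return states


# Hypothetical evaluations every 15 seconds; the instance goes down at t=15.
timeline = [(0, 1), (15, 0), (30, 0), (45, 0), (60, 0), (75, 0), (90, 1)]
print(alert_states(timeline))
```

In this simplified model the alert stays pending while the instance is down and only starts firing once a full minute has passed, which is why a short outage does not produce an email.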

Next, let's look at the alertmanager configuration file, alertmanager.yml:

global:
  resolve_timeout: 2m
  smtp_smarthost: 'smtpdm.aliyun.com:465'
  smtp_from: '[email protected]' 
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'xxxxxx'  # SMTP authorization code, not the mailbox password
  smtp_require_tls: false

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 4m      # how often a notification is repeated for an alert that is still firing
  receiver: 'mail'         # name of the receiver to notify; must match a name under receivers below

receivers:
- name: 'mail'             # alert receiver; matches the receiver set in route above
  email_configs:
  - to: '[email protected]'     # email address that receives the alerts
    headers: { Subject: "[WARN] alertmanager alert email"}

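The Alibaba Cloud SMTP settings used here must be set up beforehand in the Alibaba Cloud DirectMail console.
In the DirectMail console, follow the documentation to create a sender domain, create a sender address, and set the SMTP authorization code.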
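Before wiring these credentials into alertmanager, it can help to check them on their own. The following is a minimal sketch using Python's standard `smtplib`, assuming the host and port from alertmanager.yml (`smtpdm.aliyun.com:465`) and that port 465 expects TLS from the start of the connection. The function names and the placeholder addresses are hypothetical; replace them with the sender address, authorization code, and recipient you configured.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Build a small plain-text message for checking the SMTP settings."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "[TEST] alertmanager SMTP check"
    msg.set_content("If you can read this, the SMTP settings work.")
    return msg


def send_test_message(host: str, port: int, username: str,
                      password: str, recipient: str) -> None:
    """Log in and send one test message over an implicit-TLS connection."""
    msg = build_test_message(username, recipient)
    # SMTP_SSL wraps the connection in TLS immediately, which is what
    # port 465 normally expects (as opposed to SMTP + starttls()).
    with smtplib.SMTP_SSL(host, port, timeout=10) as server:
        server.login(username, password)
        server.send_message(msg)


# Example call (placeholders, not real credentials):
# send_test_message("smtpdm.aliyun.com", 465,
#                   "<sender address>", "<SMTP authorization code>",
#                   "<recipient address>")
```

If the login fails here, alertmanager will fail in the same way, so this isolates credential problems from alertmanager configuration problems.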

Start docker-compose:

docker-compose up

In another terminal, stop the centos1 container:

docker-compose stop centos1

Wait a little over a minute (the rule uses for: 1m), then open the prometheus alerts page (http://127.0.0.1:9090/alerts), where the alert is displayed.
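The same information can be read without the browser. The sketch below, offered as an illustration rather than a required step, queries the Prometheus HTTP API endpoint `/api/v1/alerts` and keeps only the alerts in the firing state. The function names are hypothetical, and the base URL assumes the port mapping from docker-compose.yml.

```python
import json
from urllib.request import urlopen


def firing_alerts(payload: str) -> list:
    """Return the label sets of all firing alerts in an /api/v1/alerts response."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError("unexpected response from Prometheus")
    return [a["labels"] for a in body["data"]["alerts"] if a.get("state") == "firing"]


def fetch_firing_alerts(base_url: str = "http://127.0.0.1:9090") -> list:
    """Fetch the current alerts from a running Prometheus and filter them."""
    with urlopen(base_url + "/api/v1/alerts", timeout=5) as resp:
        return firing_alerts(resp.read().decode("utf-8"))


# With the stack running and centos1 stopped for over a minute:
# print(fetch_firing_alerts())
```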

Then open the alertmanager page (http://127.0.0.1:9093/#/alerts), which shows that alertmanager has received the alert from prometheus.
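As a complement to the web page, here is a short sketch that lists the alerts alertmanager currently holds, assuming the running image exposes the v2 HTTP API at `/api/v2/alerts`. The function names are hypothetical, and the base URL assumes the 9093 port mapping from docker-compose.yml.

```python
import json
from urllib.request import urlopen


def summarize_alerts(payload: str) -> list:
    """Reduce an /api/v2/alerts response to (alertname, instance, state) tuples."""
    alerts = json.loads(payload)
    return [
        (
            a["labels"].get("alertname", ""),
            a["labels"].get("instance", ""),
            a["status"]["state"],
        )
        for a in alerts
    ]


def fetch_alert_summary(base_url: str = "http://127.0.0.1:9093") -> list:
    """Fetch the alerts from a running alertmanager and summarize them."""
    with urlopen(base_url + "/api/v2/alerts", timeout=5) as resp:
        return summarize_alerts(resp.read().decode("utf-8"))


# With the stack running and the InstanceDown alert firing:
# print(fetch_alert_summary())
```

An entry for InstanceDown here means prometheus delivered the alert; if no email arrives after that, the problem is more likely in the SMTP settings than in the rule.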

Log in to the mailbox that receives the alerts, and you will find the alert email sent by alertmanager.

References

https://github.com/prometheus/alertmanager
Prometheus 監控實戰 (Monitoring with Prometheus), by James Turnbull, translated by 史天 et al., China Machine Press (機械工業出版社)
https://blog.csdn.net/wshl1234567/article/details/100107167
https://github.com/danguita/prometheus-monitoring-stack/blob/master/config/alertmanager.template.yml
https://blog.csdn.net/aixiaoyang168/article/details/98474494
https://blog.csdn.net/lihao21/article/details/104349219
https://prometheus.io/docs/alerting/alertmanager/
