Prometheus in Practice

Original

苦逼的老王

2020-07-08 06:12

Prometheus is a monitoring system made up of several components.

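Prometheus: scrapes metrics from the *_exporter processes, evaluates the alerting rules, and notifies Alertmanager when an alert needs to be sent.
Alertmanager: handles alerting. It receives the alerts that Prometheus sends and delivers notifications by email or HTTP.
Exporters: collect metrics from the local machine or from a service such as MySQL. They are the producers of monitoring data, which Prometheus pulls. redis_exporter exposes Redis metrics; node_exporter exposes Linux host metrics: CPU, memory, disk, and network.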

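Workflow:

Deploy redis_exporter or node_exporter on each server that needs to be monitored.
Configure the address of redis_exporter or node_exporter in Prometheus so that it scrapes them on a schedule.
Prometheus sends alerts to Alertmanager according to the alerting rules.
Alertmanager sends the alert notification by email or HTTP.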

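Monitoring a Linux instance

Preparation

node_exporter has no Windows build.
Alertmanager and Prometheus are installed here as Windows builds. The Linux and Windows builds differ only in the start command; what matters is the configuration file.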

Step 1

Install node_exporter on the Linux server that needs to be monitored.

1. Download the monitoring tool node_exporter-1.0.1.linux-amd64.tar.gz
2. Extract it: tar -xvzf node_exporter-1.0.1.linux-amd64.tar.gz
3. Start it: nohup /usr/local/node_exporter/node_exporter &. If the shell prints "nohup: ignoring input and appending output to 'nohup.out'", press Enter.
4. Verify that it is running: curl http://IP:9100/metrics
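
The check in item 4 can also be scripted. The following is a minimal Python sketch, not part of the original setup: it parses text in the Prometheus exposition format (the format that /metrics returns) and lists the metric names it contains. The function name metric_names and the sample text are illustrative assumptions; in practice the text would be the output of the curl command above.

```python
def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line is "<name>{labels} <value>" or "<name> <value>".
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names


# Hypothetical excerpt of node_exporter output, used only for illustration.
SAMPLE = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_memory_MemTotal_bytes 8.201322496e+09
"""

if __name__ == "__main__":
    found = metric_names(SAMPLE)
    # True when the CPU counter used by the alerting rules is exposed.
    print("node_cpu_seconds_total" in found)
```

If node_cpu_seconds_total and node_memory_MemTotal_bytes are both present, the exporter is serving the metrics that the alerting rules later in this article depend on.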

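Step 2

Configure Alertmanager by editing its configuration file (alertmanager.yml in the default download):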

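# Set smtp_from and smtp_auth_username to email addresses
# smtp_from is the sender's address
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.qq.com:587'
  smtp_from: ''
  smtp_auth_username: 'email address of the sending account'
  smtp_auth_password: 'email authorization code'
  smtp_require_tls: false
  
# group_wait: how long to wait after an alert fires before sending (30s here), so that other alerts in the same group go out together
# group_interval: how long to wait before notifying about new alerts added to a group that has already been notified
# repeat_interval: interval before a still-firing alert is sent again; reduces how often the same email is sent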
route:
#  group_by: ['alertname']
  group_by: [alertname]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: '[email protected]' 
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
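
The inhibit_rules block at the end of this file is not explained by the comments, so here is a rough Python sketch of the matching it describes: while an alert with severity critical is firing, alerts with severity warning that carry the same alertname, dev, and instance labels are muted. The function is_inhibited and the sample alerts are hypothetical, and real Alertmanager matching has more options than this shows.

```python
EQUAL_LABELS = ["alertname", "dev", "instance"]  # taken from the `equal` list


def is_inhibited(target, firing_alerts):
    """Return True if `target` (a dict of labels) is muted by a firing alert."""
    if target.get("severity") != "warning":       # target_match
        return False
    for source in firing_alerts:
        if source.get("severity") != "critical":  # source_match
            continue
        # A missing label is treated the same as a label with an empty value.
        if all(source.get(name, "") == target.get(name, "") for name in EQUAL_LABELS):
            return True
    return False


# Hypothetical alerts for illustration.
critical = {"alertname": "DISK Alert", "instance": "host-a:9100", "severity": "critical"}
warning_same = {"alertname": "DISK Alert", "instance": "host-a:9100", "severity": "warning"}
warning_other = {"alertname": "DISK Alert", "instance": "host-b:9100", "severity": "warning"}

if __name__ == "__main__":
    print(is_inhibited(warning_same, [critical]))   # same host: muted
    print(is_inhibited(warning_other, [critical]))  # different host: still sent
```

Every rule defined later in this article is labeled warning, so this inhibit rule has no effect until a rule with severity critical is added.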

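Step 3

Install Prometheus. It can run on any server; the Windows build is used here. The Linux and Windows builds differ only in the start command; what matters is the configuration file.

Edit prometheus.yml: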

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Address of the alerting component
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 10.10.15.30:9093

# Rule files to load
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules/node.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']

# Set this to the address where node_exporter is running
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['172.18.0.255:20004']
     
# Set this to the address where Alertmanager is running
  - job_name: 'alertmanager'
    static_configs:
      - targets: ['localhost:9093']

In the Prometheus directory, create a rules folder and create node.yml inside it.

Setting up the alerting rules

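# for: when an expression first exceeds its threshold, the alert is held as pending; it is sent to Alertmanager only if the expression is still above the threshold once the `for` duration has passed
# severity: the severity level of the alert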
  
groups:
  - name: node_status
    rules:
      - alert: "CPU Alert"
        expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100) > 0.1
        for: 30s
        labels:
          severity: warning
        annotations:
          summary: " {{ $labels.instance }} CPU usage > {{ $value }}%"
          
      - alert: "MEMORY Alert"
        expr: ((1-(node_memory_MemAvailable_bytes{job="node-exporter"} / (node_memory_MemTotal_bytes{job="node-exporter"})))* 100)>1
        for: 30s
        labels:
          severity: warning
        annotations:
          summary: " {{ $labels.instance }} memory usage > {{ $value }}%"
          
      - alert: "DISK Alert"
        expr: ((1 - node_filesystem_avail_bytes{job="node-exporter",fstype=~"ext4|xfs"} / node_filesystem_size_bytes{job="node-exporter",fstype=~"ext4|xfs"}) * 100)>1
        for: 30s
        labels:
          severity: warning
        annotations:
          summary: " {{ $labels.instance }} disk usage > {{ $value }}%"
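
To make the `for` clause concrete, the sketch below replays a series of rule evaluations in Python and reports the state of the alert after each one. It is a simplified model, not Prometheus code: the function name alert_states and the sample timeline are assumptions, with evaluations 15 seconds apart to match evaluation_interval and a 30-second hold to match for: 30s.

```python
def alert_states(evaluations, hold_seconds):
    """Map (timestamp, condition_true) pairs to inactive/pending/firing."""
    states = []
    active_since = None  # time at which the condition first became true
    for timestamp, condition_true in evaluations:
        if not condition_true:
            active_since = None       # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        if timestamp - active_since >= hold_seconds:
            states.append("firing")   # held long enough: sent to Alertmanager
        else:
            states.append("pending")  # still waiting out the `for` duration
    return states


# Hypothetical timeline: one evaluation every 15 seconds.
TIMELINE = [(0, True), (15, True), (30, True), (45, False), (60, True)]

if __name__ == "__main__":
    print(alert_states(TIMELINE, hold_seconds=30))
```

With this timeline the alert is pending at 0s and 15s, fires at 30s, drops back to inactive at 45s, and starts a new pending period at 60s. A brief spike that clears before 30 seconds have passed never reaches Alertmanager.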

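Step 4

Startup

Start node_exporter: nohup ./node_exporter &
Double-click alertmanager.exe in the Alertmanager directory to start Alertmanager.
Double-click prometheus.exe in the Prometheus directory to start Prometheus.

To monitor Redis and alert on it as well, download redis_exporter, add a job with the redis_exporter address to prometheus.yml in the Prometheus directory, and add Redis alerting rules to node.yml in the Prometheus rules folder.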
