Prometheus, Alertmanager, and exporter configuration

Original

是夜静无声

2020-07-05 09:37

This chapter covers configuring and using Prometheus. For installation steps and resources, see the installation chapter.

1. Using exporters

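Most open-source community exporters can be run as soon as they are unpacked, and --help shows how to use them. A few exporters that need special handling are listed below.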

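(1) kafka-exporter
The 1.2 binary package downloaded from GitHub does not accept a ZooKeeper address. If your Kafka consumer group data is stored in ZooKeeper (older versions), that binary cannot be used directly, because it will not return any consumer group metrics. If everything is stored on the Kafka server, it works as is. The latest source code has no prebuilt binary package, so you need to pull the latest code and build it yourself, as follows: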

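Note: a Go development environment is required.
First change to the src directory of your Go workspace. Do not skip creating the directory path below; make builds relative to it.
# mkdir -p github.com/danielqsj
# cd github.com/danielqsj
# git clone https://github.com/danielqsj/kafka_exporter.git
# cd kafka_exporter
# make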

The kafka_exporter executable now appears in the same directory. Before starting it, check your Kafka version number, because the exporter needs it at startup.

# nohup ./kafka_exporter --kafka.server=127.0.0.1:9092 --log.level="info" --kafka.version="0.9.0.1" --use.consumelag.zookeeper --zookeeper.server=10.4.201.165:2181 &

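If you would rather not compile it, you can download the binary I have already built. See the installation chapter for the download address.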

(2) nginx-vts-exporter
The nginx exporter depends on nginx's vts module, so before using it you must compile the vts module into nginx. The build steps are:

# tar -zxvf nginx-1.12.2.tar.gz
# cd nginx-1.12.2
# mkdir -p module
# tar -zxvf nginx-module-vts-0.1.12.tar.gz -C ./module
# tar -zxvf nginx_upstream_check_module.tar.gz -C ./module
# ./configure --prefix=../nginx --with-http_ssl_module --with-http_stub_status_module --with-pcre --with-stream --add-module=./module/nginx_upstream_check_module --add-module=./module/nginx-module-vts-0.1.12

# make && make install

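If the build fails because the pcre library cannot be found, install pcre first and then repeat the steps above.
Then replace the existing nginx binary under sbin (back it up first).
If you happen to be running nginx 1.12.2, you can download the prebuilt nginx directly. See the installation chapter for the download address.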

Start nginx-vts-exporter:
# nohup ./nginx-vts-exporter -nginx.scrape_uri=http://127.0.0.1:8001/status/format/json &

(3) redis-exporter

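# nohup ./redis_exporter -redis.addr 10.4.201.205:6679 &
Note: monitoring a Redis cluster works a little differently. You only need to deploy redis_exporter on one node of the cluster; the metrics for the other nodes are collected through the Prometheus configuration (see the Prometheus configuration below for details).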

Tutorials for the other exporters are easy to find online, so I will not go through them one by one here.
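Whichever exporter you start, a quick sanity check is to fetch its metrics page and see which metric names it exposes. The following is a minimal Python sketch of that check, not part of any exporter: the helper names `metric_names` and `fetch_metric_names` are hypothetical, and it assumes the exporter serves the usual Prometheus text format on a `/metrics` path.

```python
from urllib.request import urlopen


def metric_names(exposition_text):
    """Collect metric names from Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The name ends at the first "{" (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names


def fetch_metric_names(url, timeout=5):
    """Download an exporter's metrics page and return its metric names."""
    with urlopen(url, timeout=timeout) as resp:
        return metric_names(resp.read().decode("utf-8"))
```

For example, `fetch_metric_names("http://192.168.177.148:9308/metrics")` would query the kafka_exporter address used in the Prometheus configuration in this article, assuming the exporter is reachable from where the script runs.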

(4) Prometheus-side configuration:

# vi prometheus.yml
global:
  scrape_interval:     15s
  evaluation_interval: 15s
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
       - 192.168.177.150:9093

# Alert rule file
rule_files:
  - "/home/prometheus-2.13.1/alert_rule.yml"

# Scrape jobs for each exporter
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']
      labels:
          instance: prometheus

  - job_name: kafka
    static_configs:
      - targets: ['192.168.177.148:9308']
      
  - job_name: redis 
    static_configs:
      - targets:
        - redis://192.168.50.148:6679
        - redis://192.168.50.149:6679
        - redis://192.168.50.150:6679
    metrics_path: /scrape
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.50.150:9121
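To make the relabeling in the redis job easier to follow, here is a minimal Python sketch of what its three rules do to each target. It is only an illustration, not Prometheus code: the function name `scrape_url` is hypothetical, and the addresses are the ones from the configuration above. Each Redis address is copied into the `target` URL parameter and the `instance` label, while the request itself is sent to the single redis_exporter on `/scrape`.

```python
from urllib.parse import urlencode

EXPORTER_ADDR = "192.168.50.150:9121"  # the one node running redis_exporter
METRICS_PATH = "/scrape"


def scrape_url(target):
    """Mimic the three relabel rules for one configured target (illustrative only)."""
    labels = {"__address__": target}
    # Rule 1: copy __address__ into the "target" URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Rule 2: keep the Redis address as the instance label.
    labels["instance"] = labels["__param_target"]
    # Rule 3: point the actual scrape at the exporter.
    labels["__address__"] = EXPORTER_ADDR
    query = urlencode({"target": labels["__param_target"]})
    url = f"http://{labels['__address__']}{METRICS_PATH}?{query}"
    return url, labels["instance"]
```

Calling `scrape_url("redis://192.168.50.149:6679")` returns the URL that would be requested from the exporter together with the `instance` label value, so every Redis node keeps its own identity in the collected series even though only one exporter is deployed.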

2. Alerting configuration

(1) Rule configuration in Prometheus (detailed explanations of the rule syntax are available online, so I will only give a small example here)

groups:
- name: example   # name of the rule group
  rules:

  # Alert for any instance that is unreachable for >1 minute.
  - alert: InstanceDown     # checks the job status; if metrics cannot be scraped for 1 minute, the alert is sent to Alertmanager
    expr: up == 0
    for: 1m    # how long the condition must hold
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."


  - alert: "it's has problem"  # name of the alert
    # Compares the current value of the expression with its value one week ago; if the difference is greater than 5 for 1m, the alert is sent to Alertmanager
    expr: 'test_tomcat{exported_instance="uat",exported_job="uat-app-status",host="test",instance="uat",job="uat-apps-status"} - test_tomcat{exported_instance="uat",exported_job="uat-app-status",host="test",instance="uat",job="uat-apps-status"} offset 1w > 5'
    for: 1m  # how long the condition must hold
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.type }} trending upward"
      description: "Host: {{ $labels.host }} tomcat_id: {{ $labels.id }} type: {{ $labels.type }} exceeds the value from one week ago by more than 5; current difference: {{ $value }}"    # custom alert text
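The `for` clause can be pictured as a small state machine: an alert whose expression is true becomes pending, and it only fires once the expression has stayed true for the whole `for` duration. The sketch below is a simplified Python model of that behaviour for the InstanceDown rule, assuming one evaluation every 15s as set by `evaluation_interval` in the Prometheus configuration. The function name `alert_states` and the sample input are hypothetical, and real Prometheus tracks more detail than this.

```python
def alert_states(up_samples, eval_interval_s=15, for_s=60):
    """Return inactive/pending/firing for each evaluation of `up == 0`."""
    states = []
    active_since = None  # time at which the expression first became true
    for i, up in enumerate(up_samples):
        now = i * eval_interval_s
        if up != 0:
            # Target is reachable again: the alert resets.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        # Fire only after the condition has held for the full duration.
        if now - active_since >= for_s:
            states.append("firing")
        else:
            states.append("pending")
    return states
```

Feeding it a list such as `[1, 0, 0, 0, 0, 0, 1]` shows a short outage moving from pending to firing and back to inactive, which is why a blip shorter than `for` never reaches Alertmanager.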

(2) Alertmanager configuration (I have not yet worked through inhibition, grouping, and silences; I will follow up once I have)

global:
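  smtp_smarthost: 'smtphz.qiye.163.com:25' # Note: with a 163 mailbox you have to call customer service to find out which 163 SMTP gateway serves your region. Mine is North China, so I use "smtphz". This is an easy trap to fall into with NetEase.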
  smtp_from: 'xxxxxxx.com.cn'
  smtp_auth_username: 'xxxxxxx.com.cn'
  smtp_auth_password: 'xxxxxx'
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  email_configs:
  - to: 'xxxxx.com.cn'
    html: '{{ template "email.test.html" . }}'
    headers: { Subject: "[WARN] Alert email test" }
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
templates:
- '/home/alertmanager-0.19.0/test.tmpl'
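The `inhibit_rules` entry in this configuration mutes warning alerts while a matching critical alert is firing. As a rough illustration, the following Python sketch models that matching. The function name `is_inhibited` and the sample alerts are hypothetical, and it assumes, as the Alertmanager documentation describes, that a label missing from both alerts counts as equal. Because the rule matches on the `severity` label, the alert rules must set exactly that label name for inhibition to take effect.

```python
INHIBIT_RULE = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["alertname", "dev", "instance"],
}


def matches(labels, matcher):
    """True if every label in `matcher` has the same value in `labels`."""
    return all(labels.get(k, "") == v for k, v in matcher.items())


def is_inhibited(target, firing_alerts, rule=INHIBIT_RULE):
    """True if some firing source alert mutes `target` under `rule`."""
    if not matches(target, rule["target_match"]):
        return False
    for source in firing_alerts:
        if not matches(source, rule["source_match"]):
            continue
        # A missing label is treated as an empty value on both sides.
        if all(source.get(l, "") == target.get(l, "") for l in rule["equal"]):
            return True
    return False
```

With a critical alert firing for instance `a`, a warning alert with the same `alertname` and `instance` is muted, while a warning for a different instance still goes out.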
