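1. Deployment preparation
Note: all pods run in the monitoring namespace.
This article is based on https://github.com/coreos/kube-prometheus
The officially maintained manifests caused problems in our existing deployment environment, so some modifications were made in the steps below; they do not affect the overall result.
The Alertmanager component uses the official YAML without any modification.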
2. Preparing the YAML for the Alertmanager services
2.1. Download the official YAML
mkdir kube-prometheus
cd kube-prometheus
git clone https://github.com/coreos/kube-prometheus
cd kube-prometheus/manifests
mkdir prometheus-alertmanager
mv alertmanager* prometheus-alertmanager
2.2. Create the Alertmanager service
cd prometheus-alertmanager
kubectl apply -f .
2.3. Check the Alertmanager status
[root@jenkins prometheus-alertmanager]# kubectl get pod -n monitoring -o wide | grep alertmanager
alertmanager-main-0 2/2 Running 0 36d 10.65.1.136 node02 <none> <none>
alertmanager-main-1 2/2 Running 0 26d 10.65.4.246 node03 <none> <none>
alertmanager-main-2 2/2 Running 0 36d 10.65.0.53 node01 <none> <none>
http://10.65.1.136:9093/#/alerts
http://10.65.4.246:9093/#/alerts
http://10.65.0.53:9093/#/alerts
Each of the URLs above opens the Alertmanager web UI of the corresponding pod.
[root@jenkins prometheus-alertmanager]# kubectl get service -n monitoring -o wide | grep alertmanager
alertmanager-main ClusterIP 10.64.215.237 <none> 9093/TCP 43d alertmanager=main,app=alertmanager
alertmanager-operated ClusterIP None <none> 9093/TCP,6783/TCP 36d app=alertmanager
http://10.64.215.237:9093/#/alerts
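The pod and service URLs above can also be checked from the command line. The following is a minimal sketch, assuming curl is installed on a host that can reach those IPs. The helper names am_healthy and probe_all are made up for illustration; /-/healthy is Alertmanager's built-in health endpoint.

```shell
# Hypothetical helper: succeed if the Alertmanager at base URL $1
# answers on its built-in health endpoint within 5 seconds.
am_healthy() {
  curl -fsS --max-time 5 "$1/-/healthy" >/dev/null
}

# Probe every base URL given as an argument and print one result line each.
probe_all() {
  for base in "$@"; do
    if am_healthy "$base"; then
      echo "$base healthy"
    else
      echo "$base unreachable"
    fi
  done
}
```

Example call with the addresses shown above:
probe_all http://10.65.1.136:9093 http://10.65.4.246:9093 http://10.65.0.53:9093 http://10.64.215.237:9093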
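3. Example: configuring the Alertmanager webhook address
Prometheus Alertmanager supports automatic discovery and reloading of its configuration,
so we only need to regenerate the configuration. First, delete the existing configuration: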
kubectl delete secret alertmanager-main -n monitoring
Write a webhook configuration file named alertmanager.yaml
For the alerting project, see https://github.com/qist/msg-sender
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'webhook'
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://msg-sender.monitoring:4000/sender/wechat'
Note: the url here must match the service address exposed by msg-sender.
kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml -n monitoring
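The delete-then-create sequence works, but there is a moment in which the secret does not exist. One possible alternative is to render the secret locally and pipe it into kubectl apply, which creates or updates it in one step. This is a sketch, not part of the original procedure: render_secret, update_secret and DRY_RUN_FLAG are illustrative names. It assumes kubectl 1.18 or later for --dry-run=client; older releases use the bare --dry-run flag instead. The file name matters, because --from-file uses it as the key inside the secret.

```shell
# Illustrative variable: override with DRY_RUN_FLAG=--dry-run on kubectl
# releases older than 1.18, which do not know --dry-run=client.
: "${DRY_RUN_FLAG:=--dry-run=client}"

# Print the secret manifest as YAML without sending it to the cluster.
render_secret() {
  kubectl create secret generic alertmanager-main \
    --from-file=alertmanager.yaml \
    -n monitoring "$DRY_RUN_FLAG" -o yaml
}

# Create the secret if it is missing, or update it in place if it exists.
update_secret() {
  render_secret | kubectl apply -f -
}
```

Running render_secret on its own first makes it possible to inspect the manifest before update_secret changes anything in the cluster.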
Confirm that the Alertmanager configuration has been updated correctly:
Config
global:
  resolve_timeout: 5m
  http_config: {}
  smtp_hello: localhost
  smtp_require_tls: true
  pagerduty_url: https://events.pagerduty.com/v2/enqueue
  hipchat_api_url: https://api.hipchat.com/
  opsgenie_api_url: https://api.opsgenie.com/
  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
route:
  receiver: webhook
  group_by:
  - alertname
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
receivers:
- name: webhook
  webhook_configs:
  - send_resolved: true
    http_config: {}
    url: http://msg-sender.monitoring:4000/sender/wechat
templates: []
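The configuration stored in the secret can also be compared with the local file from the command line. This is a minimal sketch, assuming kubectl access to the cluster and a base64 that accepts -d; stored_config and config_in_sync are hypothetical helper names. Note that it checks the secret only, not what the running Alertmanager has already reloaded.

```shell
# Fetch the alertmanager.yaml key from the secret and decode it.
# The backslash escapes the dot inside the key name for jsonpath.
stored_config() {
  kubectl get secret alertmanager-main -n monitoring \
    -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d
}

# Succeed only if the stored config is identical to the local file
# ($1, defaulting to alertmanager.yaml in the current directory).
config_in_sync() {
  stored_config | diff - "${1:-alertmanager.yaml}" >/dev/null
}
```

Example: config_in_sync alertmanager.yaml && echo "secret matches local file"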
Then check the msg-sender container log. It shows that the webhook alert from Alertmanager has been received
and that the WeChat send action has been simulated.
tail -n 10 msg-sender2019-06-19.log
INFO: 2019/06/19 09:29:02 http.go:238: {"errcode":0,"errmsg":"ok","invaliduser":""}
INFO: 2019/06/19 09:29:02 http.go:231: #sendWechat# client:1.8.17.209:41088, to:huangdaquan, requestType:application/x-www-form-urlencoded, content:2019-06-19 09:29:01 platform bulletin is not available!
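To trigger a notification on demand instead of waiting for a real alert, a test alert can be posted to Alertmanager directly. The sketch below assumes an Alertmanager version that serves the v2 API (POST /api/v2/alerts) and a host with curl that can reach the service. The helper names, the TestAlert default and the label values are invented for illustration. With group_wait: 30s as configured above, the webhook call should follow roughly 30 seconds after the alert is posted.

```shell
# Build a minimal alert list in the JSON shape accepted by /api/v2/alerts.
# $1 is the alertname label; "TestAlert" is a made-up default.
test_alert_json() {
  printf '[{"labels":{"alertname":"%s","severity":"warning"},"annotations":{"summary":"manual test alert"}}]\n' "${1:-TestAlert}"
}

# Post the test alert to the Alertmanager at base URL $1.
# $2 optionally overrides the alertname.
send_test_alert() {
  test_alert_json "${2:-}" | curl -fsS --max-time 5 -X POST \
    -H 'Content-Type: application/json' \
    --data-binary @- "$1/api/v2/alerts"
}
```

Example: send_test_alert http://10.64.215.237:9093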