Deploying Prometheus Operator with Helm and Adding Custom Monitoring

  • Installation

It is recommended to deploy Prometheus Operator in a dedicated namespace, conventionally monitoring.

kubectl create namespace monitoring

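With Helm v3, the manifests in the chart's crds/ directory are submitted to Kubernetes automatically at install time, so the chart's own CRD creation is switched off with prometheusOperator.createCustomResource=false.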

helm install prometheus stable/prometheus-operator \
  --namespace monitoring \
  --set prometheusOperator.createCustomResource=false \
  --set prometheusOperator.cleanupCustomResource=true

Check the Kubernetes resources.

kubectl --namespace monitoring get all
NAME                                                         READY   STATUS    RESTARTS   AGE
pod/alertmanager-prometheus-prometheus-oper-alertmanager-0   2/2     Running   0          4m20s
pod/prometheus-grafana-dc56bc899-vprqs                       2/2     Running   0          4m56s
pod/prometheus-kube-state-metrics-67b765f8b8-wblcd           1/1     Running   0          4m56s
pod/prometheus-prometheus-node-exporter-fxl6j                1/1     Running   0          4m56s
pod/prometheus-prometheus-node-exporter-r8vhc                1/1     Running   0          4m56s
pod/prometheus-prometheus-node-exporter-xcgkj                1/1     Running   0          4m56s
pod/prometheus-prometheus-oper-operator-58566dd678-5c2zm     2/2     Running   0          4m56s
pod/prometheus-prometheus-prometheus-oper-prometheus-0       3/3     Running   1          4m9s

NAME                                              TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-operated                     ClusterIP   None           <none>        9093/TCP,9094/TCP,9094/UDP   4m20s
service/prometheus-grafana                        ClusterIP   10.1.45.41     <none>        80/TCP                       4m56s
service/prometheus-kube-state-metrics             ClusterIP   10.1.35.41     <none>        8080/TCP                     4m56s
service/prometheus-operated                       ClusterIP   None           <none>        9090/TCP                     4m9s
service/prometheus-prometheus-node-exporter       ClusterIP   10.1.206.118   <none>        9100/TCP                     4m56s
service/prometheus-prometheus-oper-alertmanager   ClusterIP   10.1.248.72    <none>        9093/TCP                     4m56s
service/prometheus-prometheus-oper-operator       ClusterIP   10.1.170.8     <none>        8080/TCP,443/TCP             4m56s
service/prometheus-prometheus-oper-prometheus     ClusterIP   10.1.132.191   <none>        9090/TCP                     4m56s

NAME                                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-prometheus-node-exporter   3         3         3       3            3           <none>          4m56s

NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-grafana                    1/1     1            1           4m56s
deployment.apps/prometheus-kube-state-metrics         1/1     1            1           4m56s
deployment.apps/prometheus-prometheus-oper-operator   1/1     1            1           4m56s

NAME                                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-grafana-dc56bc899                     1         1         1       4m56s
replicaset.apps/prometheus-kube-state-metrics-67b765f8b8         1         1         1       4m56s
replicaset.apps/prometheus-prometheus-oper-operator-58566dd678   1         1         1       4m56s

NAME                                                                    READY   AGE
statefulset.apps/alertmanager-prometheus-prometheus-oper-alertmanager   1/1     4m20s
statefulset.apps/prometheus-prometheus-prometheus-oper-prometheus       1/1     4m9s

Check the CRDs that were created.

kubectl get crd | grep coreos
alertmanagers.monitoring.coreos.com     2020-04-01T01:42:58Z
podmonitors.monitoring.coreos.com       2020-04-01T01:42:58Z
prometheuses.monitoring.coreos.com      2020-04-01T01:42:58Z
prometheusrules.monitoring.coreos.com   2020-04-01T01:42:58Z
servicemonitors.monitoring.coreos.com   2020-04-01T01:42:59Z

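To reach Prometheus, Grafana, and Alertmanager from outside the cluster, we install the open-source gateway Ambassador and access them through it. Later we configure Prometheus to monitor the gateway itself.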

Add the Helm repository and choose the datawire/ambassador chart.

helm repo add datawire https://www.getambassador.io
helm search repo ambassador
NAME                            CHART VERSION   APP VERSION     DESCRIPTION
aliyuncs/ambassador             4.4.7           0.85.0          A Helm chart for Datawire Ambassador
datawire/ambassador             6.2.2           1.3.1           A Helm chart for Datawire Ambassador
datawire/ambassador-operator    0.1.0           1.0.0           A Helm chart for Kubernetes
stable/ambassador               5.3.0           0.86.1          A Helm chart for Datawire Ambassador

Install Ambassador Edge Stack.

kubectl create namespace ambassador

helm install ambassador datawire/ambassador \
  --namespace ambassador \
  --set authService.create=false \
  --set crds.create=false \
  --set licenseKey.licenseKey=false \
  --set rateLimit.create=false \
  --set service.type=NodePort
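
The same settings can also be kept in a values file instead of repeated --set flags. The following is a minimal sketch: the file name ambassador-values.yaml is an assumption, and the keys are simply the nested form of four of the --set paths above (licenseKey.licenseKey is left on the command line as written).

```shell
# Write the --set paths from the command above as nested YAML.
# The file name is arbitrary (an assumption for this sketch).
cat > ambassador-values.yaml <<'EOF'
authService:
  create: false
crds:
  create: false
rateLimit:
  create: false
service:
  type: NodePort
EOF

# Then install with the file instead of the individual flags:
# helm install ambassador datawire/ambassador \
#   --namespace ambassador \
#   --set licenseKey.licenseKey=false \
#   -f ambassador-values.yaml
```

Keeping the file under version control makes later helm upgrade runs repeatable.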

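If you applied for a license as described in the article "Ambassador Series 11: Installing Ambassador Edge Stack 1.1.0 with Helm", you can register it on the Helm command line. licenseKey.value is the license key received by email.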

helm install ambassador datawire/ambassador \
  --namespace ambassador \
  --set authService.create=false \
  --set crds.create=false \
  --set licenseKey.secretName=ambassador-edge-stack \
  --set licenseKey.value=eyJhbGciOiJQUzUxXXXXXXXXXXXXXXXXXXXX.eyJsaWNlbnNlX2tleV92ZXJzaW9uIjoidjIiLCJjdXN0b21lcl9pZCI6InR3aW5nYW9Ac2luYS5jbiIsImN1c3RvbWVyX2VtYWlsIjoidHdpbmdhb0BzaW5hLmNuIiwiZW5hYmxlZF9mZWF0dXJlcyI6WyIiLCJmaWx0ZXIiLCJyYXRlbGltaXQiLCJ0cmFmZmljIiwiZGV2cG9ydGFsIl0sImVuZm9yY2VkX2xpbWl0cyI6W3sibCI6ImRldnBvcnRhbC1zZXJ2aWNlcyIsInYiOjV9LHsibCI6InJhdGVsaW1pdC1zZXJ2aWNlIiwidiI6NX0seyJsIjoiYXV0aGZpbHRlci1zZXJ2aWNlIiwidiI6NX1dLCJtZXRhZGF0YSI6e30sImV4cCI6MTYxMjUzMDk4NCwiaWF0IjoxNTgwOTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.TKBPqYGitumz5iSbFQq8EN9KN_BAqCJs9x03K6W3WBJxUx4fp3Qc6whjc9lNZgNG6KfUh61DJ9dru8G-90SyjCxvz05QDvAUtL__7PYfTS-17Jq0ZJygOAC8hGtrOz8iCw--oFkAhpZ14mvc0-CpZEn0DgKAHel0WQY7nYGQ6aEh2GYQG80rf3KBSxZwbp-sawBANArwvCvWw1W_5tSpBy3FBG33J0IIb2rS9lAuFr0ZvVdocJr5vIKb1KQAH3Ww9sxLKfFdFOLN_5fUIsFiAOYiPuo0hpQp1BbIllxCYrKAMig3xKRIlJI7Z6C-YySSxBXXXXXXXXXXXXXXXXXXXX \
  --set rateLimit.create=false \
  --set service.type=NodePort

Check the Kubernetes resources.

kubectl get all -nambassador
NAME                                    READY   STATUS    RESTARTS   AGE
pod/ambassador-75b5688649-7pnl2         0/1     Running   1          83s
pod/ambassador-75b5688649-jpvpv         0/1     Running   1          83s
pod/ambassador-75b5688649-llkn2         0/1     Running   1          83s
pod/ambassador-redis-8556cbb4c6-ssbt6   1/1     Running   0          83s

NAME                       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
service/ambassador         NodePort    10.1.110.49    <none>        80:38024/TCP,443:32271/TCP   83s
service/ambassador-admin   ClusterIP   10.1.166.205   <none>        8877/TCP                     83s
service/ambassador-redis   ClusterIP   10.1.193.90    <none>        6379/TCP                     83s

NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/ambassador         0/3     3            0           83s
deployment.apps/ambassador-redis   1/1     1            1           83s

NAME                                          DESIRED   CURRENT   READY   AGE
replicaset.apps/ambassador-75b5688649         3         3         0       83s
replicaset.apps/ambassador-redis-8556cbb4c6   1         1         1       83s

Metrics are available on the admin Service port. These are in fact Envoy's metrics.

curl http://10.1.166.205:8877/metrics
# TYPE envoy_cluster_upstream_cx_connect_timeout counter
envoy_cluster_upstream_cx_connect_timeout{envoy_cluster_name="cluster_127_0_0_1_8877_ambassador"} 0
# TYPE envoy_cluster_upstream_flow_control_paused_reading_total counter
envoy_cluster_upstream_flow_control_paused_reading_total{envoy_cluster_name="cluster_127_0_0_1_8877_ambassador"} 0
# TYPE envoy_cluster_upstream_cx_close_notify counter
envoy_cluster_upstream_cx_close_notify{envoy_cluster_name="cluster_127_0_0_1_8877_ambassador"} 0
# TYPE envoy_cluster_lb_recalculate_zone_structures counter
envoy_cluster_lb_recalculate_zone_structures{envoy_cluster_name="cluster_127_0_0_1_8877_ambassador"} 0
# TYPE envoy_cluster_upstream_flow_control_resumed_reading_total counter
envoy_cluster_upstream_flow_control_resumed_reading_total{envoy_cluster_name="cluster_127_0_0_1_8877_ambassador"} 0
# TYPE envoy_cluster_upstream_rq_timeout counter
envoy_cluster_upstream_rq_timeout{envoy_cluster_name="cluster_127_0_0_1_8877_ambassador"} 0
# TYPE envoy_cluster_upstream_cx_connect_fail counter
envoy_cluster_upstream_cx_connect_fail{envoy_cluster_name="cluster_127_0_0_1_8877_ambassador"} 0
# TYPE envoy_cluster_upstream_rq_cancelled counter
envoy_cluster_upstream_rq_cancelled{envoy_cluster_name="cluster_127_0_0_1_8877_ambassador"} 0
# TYPE envoy_cluster_upstream_cx_rx_bytes_total counter
envoy_cluster_upstream_cx_rx_bytes_total{envoy_cluster_name="cluster_127_0_0_1_8877_ambassador"} 0
# TYPE envoy_cluster_upstream_cx_overflow counter
envoy_cluster_upstream_cx_overflow{envoy_cluster_name="cluster_127_0_0_1_8877_ambassador"} 0
# TYPE envoy_cluster_upstream_cx_destroy_remote counter
envoy_cluster_upstream_cx_destroy_remote{envoy_cluster_name="cluster_127_0_0_1_8877_ambassador"} 0
# TYPE envoy_cluster_upstream_cx_http2_total counter
envoy_cluster_upstream_cx_http2_total{envoy_cluster_name="cluster_127_0_0_1_8877_ambassador"} 0
......
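
The endpoint returns the Prometheus text exposition format, so ordinary text tools work on it. As a small sketch, the snippet below counts metric families per type from the "# TYPE" lines. The sample variable holds a few lines copied from the output above so the snippet runs without a cluster; in practice the curl output would be piped into the same awk filter.

```shell
# A few lines copied from the /metrics output above.
sample='# TYPE envoy_cluster_upstream_cx_connect_timeout counter
envoy_cluster_upstream_cx_connect_timeout{envoy_cluster_name="cluster_127_0_0_1_8877_ambassador"} 0
# TYPE envoy_cluster_upstream_rq_timeout counter
envoy_cluster_upstream_rq_timeout{envoy_cluster_name="cluster_127_0_0_1_8877_ambassador"} 0'

# Lines look like "# TYPE <name> <type>", so field 4 is the metric type.
type_counts=$(printf '%s\n' "$sample" |
  awk '$1 == "#" && $2 == "TYPE" { n[$4]++ } END { for (t in n) print t, n[t] }')
echo "$type_counts"

# Against the live endpoint (address taken from the output above):
# curl -s http://10.1.166.205:8877/metrics | awk '...same program...'
```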

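Configure Mappings for Prometheus, Grafana, and Alertmanager. I could not find how to make Ambassador rewrite URLs after a redirect, so the routes are matched by hostname instead. It then turned out that the browser includes the port number in the Host request header, so the port has to be added to each Mapping's host field, for example prom.twingao.com:32271. This is a workaround to be improved later; suggestions for a better approach are welcome.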

vi prometheus-mapping.yaml
---
apiVersion: getambassador.io/v2
kind: Mapping
metadata:
  name: prometheus-mapping
  namespace: ambassador
spec:
  host: prom.twingao.com:32271
  prefix: /
  service: prometheus-prometheus-oper-prometheus.monitoring:9090
---
apiVersion: getambassador.io/v2
kind: Mapping
metadata:
  name: grafana-mapping
  namespace: ambassador
spec:
  host: grafana.twingao.com:32271
  prefix: /
  service: prometheus-grafana.monitoring:80
---
apiVersion: getambassador.io/v2
kind: Mapping
metadata:
  name: alert-mapping
  namespace: ambassador
spec:
  host: alert.twingao.com:32271
  prefix: /
  service: prometheus-prometheus-oper-alertmanager.monitoring:9093

kubectl apply -f prometheus-mapping.yaml
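
One possible way around the port in the Host header, offered as an untested sketch rather than a confirmed fix: Ambassador Mappings accept host_regex: true, which makes host a regular expression, so the port can be matched optionally instead of being hard-coded. Whether this behaves as intended on the Ambassador version installed here (1.3.1) is an assumption, and the file name and Mapping name below are hypothetical.

```shell
cat > prometheus-mapping-regex.yaml <<'EOF'
---
apiVersion: getambassador.io/v2
kind: Mapping
metadata:
  name: prometheus-mapping-regex
  namespace: ambassador
spec:
  # With host_regex, host is a regular expression matched against the Host header;
  # "(:[0-9]+)?" accepts the header with or without a port.
  host: 'prom\.twingao\.com(:[0-9]+)?'
  host_regex: true
  prefix: /
  service: prometheus-prometheus-oper-prometheus.monitoring:9090
EOF

# kubectl apply -f prometheus-mapping-regex.yaml
```

If this is tried, it should replace the existing prometheus-mapping rather than sit next to it, since both would otherwise claim the same prefix.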

Add entries to the hosts file on the machine running the browser. On Windows the file is C:\Windows\System32\drivers\etc\hosts.

# Prometheus Start
192.168.1.55 prom.twingao.com
192.168.1.55 grafana.twingao.com
192.168.1.55 alert.twingao.com
# Prometheus End

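Open Prometheus and switch to the Targets page at https://prom.twingao.com:32271/targets. I could not determine why monitoring/prometheus-prometheus-oper-kube-etcd and monitoring/prometheus-prometheus-oper-kube-proxy are failing, but the page shows that they are scraped at the node address plus the Service port, which is not reachable.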

monitoring/prometheus-prometheus-oper-kubelet is scraped through the https-metrics port (10250). monitoring/prometheus-prometheus-oper-kube-controller-manager and monitoring/prometheus-prometheus-oper-kube-scheduler are already configured correctly.

Open Grafana. The default password is prom-operator, which can be looked up as follows:

helm show values stable/prometheus-operator | grep adminPassword
  adminPassword: prom-operator
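
If the password was overridden at install time, the chart default no longer applies. The Grafana subchart normally stores the credentials in a Secret named after the release; the Secret name prometheus-grafana and the key admin-password below are assumptions based on the resource names above, so check them with kubectl get secret first. The decoding step is shown on a sample value so that it can be run without a cluster.

```shell
# Assumed Secret name and key; verify with: kubectl -n monitoring get secret
grafana_admin_password() {
  kubectl -n monitoring get secret prometheus-grafana \
    -o jsonpath='{.data.admin-password}' | base64 --decode
}

# Secret values are base64-encoded. Sample: the encoded chart default.
encoded='cHJvbS1vcGVyYXRvcg=='
decoded=$(printf '%s' "$encoded" | base64 --decode)
echo "$decoded"   # prints: prom-operator
```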

Grafana ships with a number of built-in dashboards.

Open Alertmanager at https://alert.twingao.com:32271/#/alerts

  • Monitoring Ambassador

We scrape metrics from service/ambassador-admin, so first look at its ports.

kubectl get service/ambassador-admin -oyaml -nambassador
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2020-04-01T07:15:16Z"
  labels:
    app.kubernetes.io/instance: ambassador
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ambassador
    app.kubernetes.io/part-of: ambassador
    helm.sh/chart: ambassador-6.2.2
    product: aes
    service: ambassador-admin
  name: ambassador-admin
  namespace: ambassador
  resourceVersion: "43258"
  selfLink: /api/v1/namespaces/ambassador/services/ambassador-admin
  uid: 955b23af-c023-4196-a8f7-224194bac419
spec:
  clusterIP: 10.1.166.205
  ports:
  - name: ambassador-admin
    port: 8877
    protocol: TCP
    targetPort: admin
  selector:
    app.kubernetes.io/instance: ambassador
    app.kubernetes.io/name: ambassador
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

Inspect the Prometheus custom resource. It defines serviceMonitorSelector.matchLabels with release: prometheus, which is how Prometheus selects ServiceMonitors.

kubectl get prometheuses.monitoring.coreos.com/prometheus-prometheus-oper-prometheus -nmonitoring -oyaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  creationTimestamp: "2020-04-01T01:43:35Z"
  generation: 1
  labels:
    app: prometheus-operator-prometheus
    chart: prometheus-operator-8.5.0
    heritage: Helm
    release: prometheus
  name: prometheus-prometheus-oper-prometheus
  namespace: monitoring
  resourceVersion: "14492"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/prometheuses/prometheus-prometheus-oper-prometheus
  uid: 2b97c71a-5ec6-41bf-a42a-565136821ae5
spec:
  alerting:
    alertmanagers:
    - name: prometheus-prometheus-oper-alertmanager
      namespace: monitoring
      pathPrefix: /
      port: web
  baseImage: quay.io/prometheus/prometheus
  enableAdminAPI: false
  externalUrl: http://prometheus-prometheus-oper-prometheus.monitoring:9090
  listenLocal: false
  logFormat: logfmt
  logLevel: info
  paused: false
  podMonitorNamespaceSelector: {}
  podMonitorSelector:
    matchLabels:
      release: prometheus
  portName: web
  replicas: 1
  retention: 10d
  routePrefix: /
  ruleNamespaceSelector: {}
  ruleSelector:
    matchLabels:
      app: prometheus-operator
      release: prometheus
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-prometheus-oper-prometheus
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      release: prometheus
  version: v2.13.1

Create the ServiceMonitor. A few key points to note:

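  • The ServiceMonitor's name ends up in the generated Prometheus configuration as part of the job_name.
  • Because the Prometheus custom resource defines serviceMonitorSelector.matchLabels with release: prometheus, the ServiceMonitor must carry the label release: prometheus for Prometheus to select it.
  • The ServiceMonitor is created in the same namespace as Prometheus, here monitoring. Since serviceMonitorNamespaceSelector is {} in the Prometheus resource above, ServiceMonitors in other namespaces would be selected as well.
  • endpoints.port must match the ports.name of the Service port that serves metrics, here ambassador-admin as shown above.
  • namespaceSelector.matchNames must list the namespace of the monitored Service, here ambassador.
  • selector.matchLabels must match labels on the monitored Service that uniquely identify it.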

Create the ServiceMonitor for the ambassador-admin Service.

vi prometheus-serviceMonitorAmbassador.yaml
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ambassador-monitor
  labels:
    release: prometheus
  namespace: monitoring
spec:
  endpoints:
  - port: ambassador-admin
  namespaceSelector:
    matchNames:
    - ambassador
  selector:
    matchLabels:
      service: ambassador-admin

kubectl apply -f prometheus-serviceMonitorAmbassador.yaml

The new target appears on the Prometheus Targets page.

Query the metric rate(envoy_http_rq_total[1m]) in Prometheus.
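
As a rough illustration of what this expression reports: rate() gives the per-second increase of a counter, averaged over the window. The sketch below does the simplified arithmetic on two made-up samples taken 60 seconds apart. Real rate() also handles counter resets and extrapolates to the window boundaries, which is ignored here.

```shell
# Two hypothetical samples of envoy_http_rq_total: "<unix_timestamp> <counter_value>"
samples='1585725000 1200
1585725060 1500'

# (last value - first value) / (last timestamp - first timestamp)
per_second=$(printf '%s\n' "$samples" |
  awk 'NR == 1 { t0 = $1; v0 = $2 } { t1 = $1; v1 = $2 } END { print (v1 - v0) / (t1 - t0) }')
echo "$per_second"   # 300 requests over 60 seconds
```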
