The official Prometheus Operator architecture diagram:
The Operator is the core component. Acting as a controller, it creates the five CRD resource types Prometheus, PodMonitor, ServiceMonitor, Alertmanager, and PrometheusRule, then continuously watches these resource objects and reconciles their state.
Among them, a Prometheus resource object represents a Prometheus server, while PodMonitor and ServiceMonitor are abstractions over the various exporters, i.e. the tools that expose metrics endpoints; Prometheus pulls data from the metrics endpoints described by PodMonitors and ServiceMonitors. Likewise, an Alertmanager resource object is the abstraction of an Alertmanager instance, and a PrometheusRule holds the alerting rules consumed by Prometheus instances.
With this model, deciding what to monitor in the cluster becomes a matter of manipulating Kubernetes resource objects directly, which is far more convenient. In the diagram above, Service and ServiceMonitor are both Kubernetes resources: a ServiceMonitor matches a class of Services via a labelSelector, and Prometheus in turn matches multiple ServiceMonitors via a labelSelector.
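As a minimal sketch of this label chain (all names here are hypothetical), a ServiceMonitor selects Services by their labels, and a Prometheus selects ServiceMonitors by theirs:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app          # hypothetical
  namespace: monitoring
  labels:
    team: frontend           # matched by the Prometheus below
spec:
  selector:
    matchLabels:
      app: example-app       # matches Services carrying this label
  endpoints:
  - port: web                # must equal ports[].name in the Service
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  serviceMonitorSelector:
    matchLabels:
      team: frontend         # selects the ServiceMonitor above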
Download kube-prometheus from https://github.com/coreos/kube-prometheus, preferably the latest release. This article uses https://github.com/coreos/kube-prometheus/archive/v0.3.0.tar.gz.
mkdir prometheus
cd prometheus/
Upload kube-prometheus-0.3.0.tar.gz and start the installation.
tar zxvf kube-prometheus-0.3.0.tar.gz
cd kube-prometheus-0.3.0/manifests/setup/
kubectl apply -f .
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created
Check the CRDs that were created.
kubectl get crd | grep coreos
alertmanagers.monitoring.coreos.com     2020-03-26T12:27:08Z
podmonitors.monitoring.coreos.com       2020-03-26T12:27:08Z
prometheuses.monitoring.coreos.com      2020-03-26T12:27:09Z
prometheusrules.monitoring.coreos.com   2020-03-26T12:27:10Z
servicemonitors.monitoring.coreos.com   2020-03-26T12:27:11Z
By default, prometheus-serviceMonitorKubelet.yaml pulls metrics from the kubelet's https-metrics port (10250). For security, the kubelet disables anonymous authentication and enables webhook authorization, so accessing the https-metrics port requires authentication and authorization. The http-metrics port (10255) is read-only and requires neither, but it is closed by default; it can be opened with the configuration below. Note that every cluster host running the kubelet must be configured this way.
vi /var/lib/kubelet/config.yaml
readOnlyPort: 10255
systemctl restart kubelet
Then change https-metrics to http-metrics in prometheus-serviceMonitorKubelet.yaml.
cd ..
vi prometheus-serviceMonitorKubelet.yaml
port: https-metrics
scheme: https
# change to
port: http-metrics
scheme: http
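To verify that the read-only port is reachable (assuming the kubelet restart above succeeded), a quick check from any configured node:
curl -s http://localhost:10255/metrics | head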
To reach prometheus, grafana, and alertmanager from outside the Kubernetes cluster, you can configure NodePort services or an Ingress; for simplicity we switch the Services to NodePort directly here (a sketch of the resulting Service follows the three edits below).
vi prometheus-service.yaml
type: NodePort
nodePort: 39090
vi grafana-service.yaml
type: NodePort
nodePort: 33000
vi alertmanager-service.yaml
type: NodePort
nodePort: 39093
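As a sketch, after the edit the relevant part of prometheus-service.yaml should look roughly like this (field details per kube-prometheus v0.3.0; verify against your copy); grafana-service.yaml and alertmanager-service.yaml are edited analogously:
apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 39090
  selector:
    app: prometheus
    prometheus: k8s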
prometheus-serviceMonitorKubeScheduler.yaml defines the ServiceMonitor for monitoring kube-scheduler. That ServiceMonitor is associated with the kube-scheduler Service through the label k8s-app=kube-scheduler, but Kubernetes does not create this Service by default. Look at the corresponding kube-scheduler-k8s-master Pod first, then define the kube-scheduler Service directly in the manifests directory so it is deployed together with everything else below; a quick curl check follows the manifest. Port 10251 is where the kube-scheduler exposes its metrics.
kube-scheduler: the scheduler is responsible for finding the most suitable Node in the cluster for newly created Pods and binding the Pods to that Node.
kubectl get pod kube-scheduler-k8s-master -nkube-system --show-labels
NAME                        READY   STATUS    RESTARTS   AGE    LABELS
kube-scheduler-k8s-master   1/1     Running   4          103d   component=kube-scheduler,tier=control-plane
vi prometheus-kubeSchedulerService.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  selector:
    component: kube-scheduler
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP
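As a quick sanity check (assuming you are on the master node, where the scheduler serves plain HTTP on 10251 as noted above):
curl -s http://127.0.0.1:10251/metrics | head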
Likewise, prometheus-serviceMonitorKubeControllerManager.yaml defines the ServiceMonitor for monitoring kube-controller-manager. That ServiceMonitor is associated with the kube-controller-manager Service through the label k8s-app: kube-controller-manager, but Kubernetes does not create this Service by default either. Look at the corresponding kube-controller-manager-k8s-master Pod first, then define the kube-controller-manager Service in the manifests directory. Port 10252 is where the kube-controller-manager exposes its metrics.
kube-controller-manager: the controller manager is in charge of Nodes, Pod replicas, service Endpoints, Namespaces, ServiceAccounts, and ResourceQuotas within the cluster. When a Node goes down unexpectedly, the Controller Manager promptly detects it and runs automated remediation, keeping the cluster in its desired working state.
kubectl get pod kube-controller-manager-k8s-master -nkube-system --show-labels
NAME                                 READY   STATUS    RESTARTS   AGE    LABELS
kube-controller-manager-k8s-master   1/1     Running   4          103d   component=kube-controller-manager,tier=control-plane
vi prometheus-kubeControllerManagerService.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  selector:
    component: kube-controller-manager
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP
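It is also worth confirming that the selectors actually match the static Pods: after the kubectl apply step below, both Services should list the master's Pod IP in their endpoints.
kubectl -n kube-system get endpoints kube-scheduler kube-controller-manager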
Continue the installation.
kubectl apply -f .
alertmanager.monitoring.coreos.com/main created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager created
secret/grafana-datasources created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-pods created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-statefulset created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
role.rbac.authorization.k8s.io/kube-state-metrics created
rolebinding.rbac.authorization.k8s.io/kube-state-metrics created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-operator created
prometheus.monitoring.coreos.com/k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-rules created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
service/kube-controller-manager created
service/kube-scheduler created
Check the Kubernetes resources. Note that prometheus-k8s and alertmanager-main are managed by StatefulSet controllers.
kubectl get all -nmonitoring
NAME                                      READY   STATUS    RESTARTS   AGE
pod/alertmanager-main-0                   2/2     Running   4          12h
pod/alertmanager-main-1                   2/2     Running   0          12h
pod/alertmanager-main-2                   2/2     Running   6          12h
pod/grafana-58dc7468d7-d8bmt              1/1     Running   0          12h
pod/kube-state-metrics-78b46c84d8-wsvrb   3/3     Running   0          12h
pod/node-exporter-6m4kd                   2/2     Running   0          12h
pod/node-exporter-bhxw2                   2/2     Running   6          12h
pod/node-exporter-tkvq5                   2/2     Running   0          12h
pod/prometheus-adapter-5cd5798d96-5ffb5   1/1     Running   0          12h
pod/prometheus-k8s-0                      3/3     Running   10         12h
pod/prometheus-k8s-1                      3/3     Running   1          12h
pod/prometheus-operator-99dccdc56-89l7n   1/1     Running   0          12h

NAME                            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-main       NodePort    10.1.96.0      <none>        9093:39093/TCP               12h
service/alertmanager-operated   ClusterIP   None           <none>        9093/TCP,9094/TCP,9094/UDP   12h
service/grafana                 NodePort    10.1.165.84    <none>        3000:33000/TCP               12h
service/kube-state-metrics      ClusterIP   None           <none>        8443/TCP,9443/TCP            12h
service/node-exporter           ClusterIP   None           <none>        9100/TCP                     12h
service/prometheus-adapter      ClusterIP   10.1.114.161   <none>        443/TCP                      12h
service/prometheus-k8s          NodePort    10.1.162.187   <none>        9090:39090/TCP               12h
service/prometheus-operated     ClusterIP   None           <none>        9090/TCP                     12h
service/prometheus-operator     ClusterIP   None           <none>        8080/TCP                     12h

NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/node-exporter   3         3         3       3            3           kubernetes.io/os=linux   12h

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/grafana               1/1     1            1           12h
deployment.apps/kube-state-metrics    1/1     1            1           12h
deployment.apps/prometheus-adapter    1/1     1            1           12h
deployment.apps/prometheus-operator   1/1     1            1           12h

NAME                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/grafana-58dc7468d7              1         1         1       12h
replicaset.apps/kube-state-metrics-78b46c84d8   1         1         1       12h
replicaset.apps/prometheus-adapter-5cd5798d96   1         1         1       12h
replicaset.apps/prometheus-operator-99dccdc56   1         1         1       12h

NAME                                 READY   AGE
statefulset.apps/alertmanager-main   3/3     12h
statefulset.apps/prometheus-k8s      2/2     12h
Prometheus address: http://192.168.1.55:39090/.
On the Prometheus targets page, all targets show as being scraped normally.
On the Prometheus alerts page, a number of alerting rules come preconfigured.
Grafana address: http://192.168.1.55:33000/.
Grafana ships with many built-in dashboards.
Below is a Node-related dashboard; the prometheus data source was added automatically, and the metrics of each Node can be inspected.
Below is a Pod-related dashboard, where metrics can be inspected per namespace and per Pod.
Alertmanager address: http://192.168.1.55:39093/.
Taking Nginx Ingress as an example, let's monitor it with the Prometheus Operator. First deploy it with Helm, choosing the nginx/nginx-ingress chart.
helm repo add nginx https://helm.nginx.com/stable
helm search repo nginx
NAME                               CHART VERSION   APP VERSION   DESCRIPTION
bitnami/nginx                      5.1.1           1.16.1        Chart for the nginx server
bitnami/nginx-ingress-controller   5.2.2           0.26.2        Chart for the nginx Ingress controller
nginx/nginx-ingress                0.4.3           1.6.3         NGINX Ingress Controller
stable/nginx-ingress               1.27.0          0.26.1        An nginx Ingress controller that uses ConfigMap...
stable/nginx-ldapauth-proxy        0.1.3           1.13.5        nginx proxy with ldapauth
stable/nginx-lego                  0.3.1                         Chart for nginx-ingress-controller and kube-lego
stable/gcloud-endpoints            0.1.2           1             DEPRECATED Develop, deploy, protect and monitor...
First render the chart templates to analyze them. To allow access from outside the cluster, "type: LoadBalancer" should be changed to "type: NodePort", with nodePort pinned to fixed values. The Pod exposes a prometheus port ("containerPort: 9113"); below we start by monitoring it with a PodMonitor.
helm template gateway nginx/nginx-ingress
......
# Source: nginx-ingress/templates/controller-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: gateway-nginx-ingress
  namespace: default
  labels:
    app.kubernetes.io/name: gateway-nginx-ingress
    helm.sh/chart: nginx-ingress-0.4.3
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/instance: gateway
spec:
  externalTrafficPolicy: Local
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
    name: http
  - port: 443
    targetPort: 443
    protocol: TCP
    name: https
  selector:
    app: gateway-nginx-ingress
---
# Source: nginx-ingress/templates/controller-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gateway-nginx-ingress
  namespace: default
  labels:
    app.kubernetes.io/name: gateway-nginx-ingress
    helm.sh/chart: nginx-ingress-0.4.3
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/instance: gateway
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gateway-nginx-ingress
  template:
    metadata:
      labels:
        app: gateway-nginx-ingress
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9113"
    spec:
      serviceAccountName: gateway-nginx-ingress
      hostNetwork: false
      containers:
      - image: "nginx/nginx-ingress:1.6.3"
        name: gateway-nginx-ingress
        imagePullPolicy: "IfNotPresent"
        ports:
        - name: http
          containerPort: 80
        - name: https
          containerPort: 443
        - name: prometheus
          containerPort: 9113
......
Based on the analysis above, install nginx-ingress with the following override parameters (an equivalent values-file form follows the command).
helm install gateway nginx/nginx-ingress \
--set controller.service.type=NodePort \
--set controller.service.httpPort.nodePort=30080 \
--set controller.service.httpsPort.nodePort=30443
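Equivalently, the same overrides can be kept in a values file; the file name values-gateway.yaml is arbitrary, and the key paths simply mirror the --set flags above:
vi values-gateway.yaml
controller:
  service:
    type: NodePort
    httpPort:
      nodePort: 30080
    httpsPort:
      nodePort: 30443

helm install gateway nginx/nginx-ingress -f values-gateway.yaml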
Check the Kubernetes resources.
kubectl get all
NAME                                         READY   STATUS    RESTARTS   AGE
pod/gateway-nginx-ingress-55886df446-bwbts   1/1     Running   0          12m

NAME                            TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                      AGE
service/gateway-nginx-ingress   NodePort    10.1.10.126   <none>        80:30080/TCP,443:30443/TCP   12m
service/kubernetes              ClusterIP   10.1.0.1      <none>        443/TCP                      108d

NAME                                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/gateway-nginx-ingress   1/1     1            1           12m

NAME                                               DESIRED   CURRENT   READY   AGE
replicaset.apps/gateway-nginx-ingress-55886df446   1         1         1       12m
kubectl get pod -owide
NAME                                     READY   STATUS    RESTARTS   AGE   IP            NODE        NOMINATED NODE   READINESS GATES
gateway-nginx-ingress-55886df446-bwbts   1/1     Running   0          13m   10.244.2.46   k8s-node2   <none>           <none>
The metrics endpoint can be reached directly at the Pod address.
curl http://10.244.2.46:9113/metrics
# HELP nginx_ingress_controller_ingress_resources_total Number of handled ingress resources
# TYPE nginx_ingress_controller_ingress_resources_total gauge
nginx_ingress_controller_ingress_resources_total{type="master"} 0
nginx_ingress_controller_ingress_resources_total{type="minion"} 0
nginx_ingress_controller_ingress_resources_total{type="regular"} 0
# HELP nginx_ingress_controller_nginx_last_reload_milliseconds Duration in milliseconds of the last NGINX reload
# TYPE nginx_ingress_controller_nginx_last_reload_milliseconds gauge
nginx_ingress_controller_nginx_last_reload_milliseconds 195
# HELP nginx_ingress_controller_nginx_last_reload_status Status of the last NGINX reload
# TYPE nginx_ingress_controller_nginx_last_reload_status gauge
nginx_ingress_controller_nginx_last_reload_status 1
# HELP nginx_ingress_controller_nginx_reload_errors_total Number of unsuccessful NGINX reloads
# TYPE nginx_ingress_controller_nginx_reload_errors_total counter
nginx_ingress_controller_nginx_reload_errors_total 0
# HELP nginx_ingress_controller_nginx_reloads_total Number of successful NGINX reloads
# TYPE nginx_ingress_controller_nginx_reloads_total counter
nginx_ingress_controller_nginx_reloads_total 2
# HELP nginx_ingress_controller_virtualserver_resources_total Number of handled VirtualServer resources
# TYPE nginx_ingress_controller_virtualserver_resources_total gauge
nginx_ingress_controller_virtualserver_resources_total 0
# HELP nginx_ingress_controller_virtualserverroute_resources_total Number of handled VirtualServerRoute resources
# TYPE nginx_ingress_controller_virtualserverroute_resources_total gauge
nginx_ingress_controller_virtualserverroute_resources_total 0
# HELP nginx_ingress_nginx_connections_accepted Accepted client connections
# TYPE nginx_ingress_nginx_connections_accepted counter
nginx_ingress_nginx_connections_accepted 6
......
Now we create the PodMonitor. A few key points to note:
- The PodMonitor's name ends up in the Prometheus configuration as the job_name.
- The PodMonitor must live in the same namespace as Prometheus, monitoring in this case.
- podMetricsEndpoints.interval is the scrape interval.
- podMetricsEndpoints.port must match the ports.name of the metrics port in the Pod/Deployment, prometheus here.
- namespaceSelector.matchNames must list the namespace the monitored Pods live in, default here.
- selector.matchLabels must carry labels that uniquely identify the monitored Pods.
Create the PodMonitor for the Pod.
vi prometheus-podMonitorNginxIngress.yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  labels:
    app: nginx-ingress
  name: nginx-ingress
  namespace: monitoring
spec:
  podMetricsEndpoints:
  - interval: 15s
    path: /metrics
    port: prometheus
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app: gateway-nginx-ingress
kubectl apply -f prometheus-podMonitorNginxIngress.yaml
This PodMonitor is really just configuration: the Prometheus Operator configures Prometheus from it, and the Pod is monitored automatically. Check the targets in the Prometheus UI; note the label pod="gateway-nginx-ingress-55886df446-bwbts", which identifies the monitored Pod.
The Nginx Ingress scrape job now appears in the Prometheus configuration as job_name: monitoring/nginx-ingress/0.
An example query over the new metrics: irate(nginx_ingress_nginx_http_requests_total[1m]).
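To inspect the scrape configuration the Operator actually generated, you can dump it from the secret it maintains. A sketch, assuming the config is stored gzipped under the key prometheus.yaml.gz (older Operator versions store a plain prometheus.yaml instead):
kubectl -n monitoring get secret prometheus-k8s \
  -o jsonpath='{.data.prometheus\.yaml\.gz}' | base64 -d | gunzip \
  | grep -A 5 'job_name: monitoring/nginx-ingress'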
We again take Nginx Ingress as the example, deploying it with Helm first.
Re-render the templates first to analyze them. The Pod exposes the prometheus port ("containerPort: 9113"), but the Service does not, and a ServiceMonitor pulls metrics precisely through the Service's prometheus port; so we add that port to the Service via controller.service.customPorts.
helm template gateway nginx/nginx-ingress \
--set controller.service.type=NodePort \
--set controller.service.httpPort.nodePort=30080 \
--set controller.service.httpsPort.nodePort=30443
......
# Source: nginx-ingress/templates/controller-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: gateway-nginx-ingress
  namespace: default
  labels:
    app.kubernetes.io/name: gateway-nginx-ingress
    helm.sh/chart: nginx-ingress-0.4.3
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/instance: gateway
spec:
  externalTrafficPolicy: Local
  type: NodePort
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
    name: http
    nodePort: 30080
  - port: 443
    targetPort: 443
    protocol: TCP
    name: https
    nodePort: 30443
  selector:
    app: gateway-nginx-ingress
---
# Source: nginx-ingress/templates/controller-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gateway-nginx-ingress
  namespace: default
  labels:
    app.kubernetes.io/name: gateway-nginx-ingress
    helm.sh/chart: nginx-ingress-0.4.3
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/instance: gateway
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gateway-nginx-ingress
  template:
    metadata:
      labels:
        app: gateway-nginx-ingress
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9113"
    spec:
      serviceAccountName: gateway-nginx-ingress
      hostNetwork: false
      containers:
      - image: "nginx/nginx-ingress:1.6.3"
        name: gateway-nginx-ingress
        imagePullPolicy: "IfNotPresent"
        ports:
        - name: http
          containerPort: 80
        - name: https
          containerPort: 443
        - name: prometheus
          containerPort: 9113
......
Based on the analysis above, install nginx-ingress with the following override parameters.
helm install gateway nginx/nginx-ingress \
--set controller.service.type=NodePort \
--set controller.service.httpPort.nodePort=30080 \
--set controller.service.httpsPort.nodePort=30443 \
--set controller.service.customPorts[0].port=9113 \
--set controller.service.customPorts[0].targetPort=9113 \
--set controller.service.customPorts[0].protocol=TCP \
--set controller.service.customPorts[0].name=prometheus \
--set controller.service.customPorts[0].nodePort=39113
Check the Kubernetes resources; note that service/gateway-nginx-ingress now exposes port 9113.
kubectl get all
NAME                                         READY   STATUS    RESTARTS   AGE
pod/gateway-nginx-ingress-55886df446-mwjs8   1/1     Running   0          10s

NAME                            TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                                     AGE
service/gateway-nginx-ingress   NodePort    10.1.98.109   <none>        9113:39113/TCP,80:30080/TCP,443:30443/TCP   10s
service/kubernetes              ClusterIP   10.1.0.1      <none>        443/TCP                                     107d

NAME                                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/gateway-nginx-ingress   1/1     1            1           10s

NAME                                               DESIRED   CURRENT   READY   AGE
replicaset.apps/gateway-nginx-ingress-55886df446   1         1         1       10s
The metrics endpoint is now reachable through the NodePort.
curl http://192.168.1.55:39113/metrics
# HELP nginx_ingress_controller_ingress_resources_total Number of handled ingress resources
# TYPE nginx_ingress_controller_ingress_resources_total gauge
nginx_ingress_controller_ingress_resources_total{type="master"} 0
nginx_ingress_controller_ingress_resources_total{type="minion"} 0
nginx_ingress_controller_ingress_resources_total{type="regular"} 0
# HELP nginx_ingress_controller_nginx_last_reload_milliseconds Duration in milliseconds of the last NGINX reload
# TYPE nginx_ingress_controller_nginx_last_reload_milliseconds gauge
nginx_ingress_controller_nginx_last_reload_milliseconds 152
# HELP nginx_ingress_controller_nginx_last_reload_status Status of the last NGINX reload
# TYPE nginx_ingress_controller_nginx_last_reload_status gauge
nginx_ingress_controller_nginx_last_reload_status 1
# HELP nginx_ingress_controller_nginx_reload_errors_total Number of unsuccessful NGINX reloads
# TYPE nginx_ingress_controller_nginx_reload_errors_total counter
nginx_ingress_controller_nginx_reload_errors_total 0
# HELP nginx_ingress_controller_nginx_reloads_total Number of successful NGINX reloads
# TYPE nginx_ingress_controller_nginx_reloads_total counter
nginx_ingress_controller_nginx_reloads_total 2
# HELP nginx_ingress_controller_virtualserver_resources_total Number of handled VirtualServer resources
# TYPE nginx_ingress_controller_virtualserver_resources_total gauge
nginx_ingress_controller_virtualserver_resources_total 0
# HELP nginx_ingress_controller_virtualserverroute_resources_total Number of handled VirtualServerRoute resources
# TYPE nginx_ingress_controller_virtualserverroute_resources_total gauge
nginx_ingress_controller_virtualserverroute_resources_total 0
# HELP nginx_ingress_nginx_connections_accepted Accepted client connections
# TYPE nginx_ingress_nginx_connections_accepted counter
nginx_ingress_nginx_connections_accepted 5
......
Inspect the Prometheus custom resource. Prometheus associates ServiceMonitors via serviceMonitorSelector; here serviceMonitorSelector is the empty selector {}, which matches every ServiceMonitor.
kubectl get prometheuses.monitoring.coreos.com/k8s -nmonitoring -oyaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"monitoring.coreos.com/v1","kind":"Prometheus","metadata":{"annotations":{},"labels":{"prometheus":"k8s"},"name":"k8s","namespace":"monitoring"},"spec":{"alerting":{"alertmanagers":[{"name":"alertmanager-main","namespace":"monitoring","port":"web"}]},"baseImage":"quay.io/prometheus/prometheus","nodeSelector":{"kubernetes.io/os":"linux"},"podMonitorSelector":{},"replicas":2,"resources":{"requests":{"memory":"400Mi"}},"ruleSelector":{"matchLabels":{"prometheus":"k8s","role":"alert-rules"}},"securityContext":{"fsGroup":2000,"runAsNonRoot":true,"runAsUser":1000},"serviceAccountName":"prometheus-k8s","serviceMonitorNamespaceSelector":{},"serviceMonitorSelector":{},"version":"v2.11.0"}}
  creationTimestamp: "2020-03-26T13:09:35Z"
  generation: 1
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
  resourceVersion: "15195"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/prometheuses/k8s
  uid: 802c8f09-95e8-43e8-a2ea-131877fc6b4e
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorSelector: {}
  replicas: 2
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.11.0
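For contrast, had serviceMonitorSelector been non-empty, only ServiceMonitors with matching labels would be selected. A hypothetical sketch (the release: prometheus label is invented for illustration):
spec:
  serviceMonitorSelector:
    matchLabels:
      release: prometheus
# ...and every ServiceMonitor to be selected must then carry that label:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nginx-ingress
  namespace: monitoring
  labels:
    release: prometheus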
Create the ServiceMonitor. A few key points to note:
- The ServiceMonitor's name ends up in the Prometheus configuration as the job_name.
- If the Prometheus custom resource defined a non-empty serviceMonitorSelector, the ServiceMonitor would need the corresponding labels; since the Prometheus resource above uses an empty selector, no particular label is required here.
- The ServiceMonitor must live in the same namespace as Prometheus, monitoring in this case.
- endpoints.interval is the scrape interval.
- endpoints.port must match the ports.name of the metrics port in the Service, prometheus here.
- namespaceSelector.matchNames must list the namespace the monitored Service lives in, default here.
- selector.matchLabels must carry labels that uniquely identify the monitored Service.
Create the ServiceMonitor for the nginx-ingress Service.
vi prometheus-serviceMonitorNginxIngress.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: nginx-ingress
  name: nginx-ingress
  namespace: monitoring
spec:
  endpoints:
  - interval: 15s
    port: prometheus
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app.kubernetes.io/name: gateway-nginx-ingress
kubectl apply -f prometheus-serviceMonitorNginxIngress.yaml
This ServiceMonitor, too, is just configuration: the Prometheus Operator configures Prometheus from it, and the Service is monitored automatically. Check the targets in the Prometheus UI; note the label service="gateway-nginx-ingress", which identifies the monitored Service.
The Nginx Ingress scrape job again appears in the Prometheus configuration as job_name: monitoring/nginx-ingress/0.
An example query over the metrics: irate(nginx_ingress_nginx_http_requests_total[1m]).
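The same query can also be issued against the Prometheus HTTP API through the NodePort exposed earlier (a sketch using this article's node address):
curl -sG 'http://192.168.1.55:39090/api/v1/query' \
  --data-urlencode 'query=irate(nginx_ingress_nginx_http_requests_total[1m])'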