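My Kubernetes version is 1.17.0. On a locally deployed cluster, metrics-server could not collect any metrics, and the HPA showed <unknown>.
Running kubectl logs metrics-server-dc6fb55f4-z88lm -n kube-system shows errors like the following: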
E0225 02:30:52.433523 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:k8s-n-2: unable to fetch metrics from Kubelet k8s-n-2 (k8s-n-2): Get https://k8s-n-2:10250/stats/summary?only_cpu_and_memory=true: dial tcp: lookup k8s-n-2 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:k8s-m: unable to fetch metrics from Kubelet k8s-m (k8s-m): Get https://k8s-m:10250/stats/summary?only_cpu_and_memory=true: dial tcp: lookup k8s-m on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:k8s-n-1: unable to fetch metrics from Kubelet k8s-n-1 (k8s-n-1): Get https://k8s-n-1:10250/stats/summary?only_cpu_and_memory=true: dial tcp: lookup k8s-n-1 on 10.96.0.10:53: no such host]
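The cause is that hostnames such as k8s-n-2 have no DNS record, so CoreDNS cannot resolve them. Cloud platforms normally do not have this problem, because the provider's internal DNS server automatically gets a record for each host, and resolving through that DNS returns the host's IP. The problem usually shows up in local deployments.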
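To see at a glance which nodes are affected, the failing hostnames can be pulled out of a log line like the one above. This is only a minimal sketch in Python; the function name unresolved_hosts and the regular expression are my own illustration, not part of metrics-server or kubectl.

```python
import re

# Matches the resolver error seen in the log, e.g.
# "lookup k8s-n-2 on 10.96.0.10:53: no such host"
LOOKUP_ERROR = re.compile(r"lookup (\S+) on (\S+): no such host")


def unresolved_hosts(log_text):
    """Return {hostname: dns_server} for every failed lookup in log_text."""
    return {host: server for host, server in LOOKUP_ERROR.findall(log_text)}


if __name__ == "__main__":
    # Shortened excerpt of the error shown above.
    sample = (
        "dial tcp: lookup k8s-n-2 on 10.96.0.10:53: no such host, "
        "dial tcp: lookup k8s-m on 10.96.0.10:53: no such host"
    )
    for host, server in unresolved_hosts(sample).items():
        print(f"{host} could not be resolved by {server}")
```

Every hostname it reports is one that needs either a DNS record or the IP-based workaround described next.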
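There are two ways to fix this.
The first is to install a DNS server such as dnsmasq that resolves the host IPs yourself, and configure the master host to use it (CoreDNS inherits the DNS configuration of the host it runs on). This is probably the more proper approach.
The second is to modify the metrics-server Deployment and add the following command section: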
command:
- /metrics-server
- --metric-resolution=30s
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
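After this change, metrics-server requests metrics by IP address instead of by hostname. The --kubelet-insecure-tls flag is needed because, once the IP is used, the certificate issued for the hostname no longer matches (an x509 certificate error is reported), so verification of the kubelet's certificate has to be skipped. The connection is still TLS, but the kubelet's identity is no longer checked.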
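The effect of --kubelet-preferred-address-types can be pictured as a first-match lookup over the addresses each Node reports in status.addresses. The Python below is a simplified model of that idea, not metrics-server's actual code; the function pick_address and the sample IP 192.168.1.12 are made up for illustration.

```python
def pick_address(node_addresses, preferred_types):
    """Return the node address whose type comes first in preferred_types.

    node_addresses mirrors the shape of a Node's status.addresses:
    a list of {"type": ..., "address": ...} dicts.
    """
    for wanted in preferred_types:
        for entry in node_addresses:
            if entry["type"] == wanted:
                return entry["address"]
    raise LookupError("node has no address of any preferred type")


if __name__ == "__main__":
    # Hypothetical node: the IP is invented, the hostname is from the log above.
    node = [
        {"type": "InternalIP", "address": "192.168.1.12"},
        {"type": "Hostname", "address": "k8s-n-2"},
    ]
    hostname_first = ["Hostname", "InternalDNS", "InternalIP", "ExternalDNS", "ExternalIP"]
    ip_first = ["InternalIP", "Hostname", "InternalDNS", "ExternalDNS", "ExternalIP"]
    print(pick_address(node, hostname_first))  # the hostname, which needs DNS
    print(pick_address(node, ip_first))        # the IP, no DNS lookup required
```

With a Hostname-first order the kubelet URL is built from k8s-n-2, which is exactly the lookup that failed in the log; putting InternalIP first avoids the DNS lookup altogether.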
The complete modified metrics-server-0.3.6\deploy\1.8+\metrics-server-deployment.yaml is as follows:
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.6
        imagePullPolicy: Always
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
        command:
        - /metrics-server
        - --metric-resolution=30s
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
Finally, re-apply this YAML (for example with kubectl apply -f metrics-server-deployment.yaml).