1. Problem:
Both nodes are in the Ready state, and the docker / kubelet / flanneld / kube-proxy services on each node are running normally. Why do pods only ever get scheduled onto one of the two nodes?
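For reference, one quick way to confirm those per-node services really are healthy (a sketch, assuming all four run as systemd units as described above):
node1]# for s in docker kubelet flanneld kube-proxy; do echo -n "$s: "; systemctl is-active $s; done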
2. Current state:
master]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
192.168.89.133 Ready <none> 194d v1.16.2 192.168.89.133 <none> CentOS Linux 7 (Core) 3.10.0-1062.el7.x86_64 docker://1.13.1
192.168.89.134 Ready <none> 193d v1.16.2 192.168.89.134 <none> CentOS Linux 7 (Core) 3.10.0-1062.el7.x86_64 docker://1.13.1
Both machines show Ready.
master]# kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-dns-6685cc54bd-lb5v4 3/3 Running 0 46m 10.1.84.2 192.168.89.134 <none> <none>
Yet all pods are scheduled onto node 2 (192.168.89.134) only.
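The skew can be confirmed at a glance by counting pods per node across all namespaces (column 8 of `-o wide` output is NODE):
master]# kubectl get pods --all-namespaces -o wide --no-headers | awk '{print $8}' | sort | uniq -c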
3. Possible causes, given these symptoms (each can be checked from the master, as sketched after this list)
Hypothesis 1: node 1 was cordoned (marked unschedulable) long ago and forgotten
Hypothesis 2: the service permission configuration on node 1 was changed at some point
Hypothesis 3: the master has some restriction that forbids scheduling onto node 1
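Each hypothesis can be checked cheaply from the master before logging into the nodes; a quick sketch:
master]# kubectl get nodes                                                        # a cordoned node shows "Ready,SchedulingDisabled" (hypothesis 1)
master]# kubectl describe node 192.168.89.133 | grep -i -E 'taints|unschedulable' # taints / cordon state (hypotheses 1 and 3)
master]# kubectl get events --all-namespaces | grep -i -E 'failedscheduling|taint' # the scheduler's own stated reasons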
4. Troubleshooting:
First, look at which ports the healthy node 2 is listening on that node 1 is not:
node2]# ss -anptu | less
udp UNCONN 0 0 192.168.122.1:53 *:* users:(("dnsmasq",pid=2083,fd=5))
udp UNCONN 0 0 *%virbr0:67 *:* users:(("dnsmasq",pid=2083,fd=3))
tcp LISTEN 0 5 192.168.122.1:53 *:* users:(("dnsmasq",pid=2083,fd=6))
Node 1 has no dnsmasq and therefore nothing listening on port 53; in fact dnsmasq is not installed there at all.
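To make this comparison systematic instead of eyeballing less output, one could diff sorted listener lists from both nodes (a sketch; the /tmp paths are illustrative, and one file has to be copied across first):
node1]# ss -lntu | awk 'NR>1 {print $1, $5}' | sort > /tmp/ports-node1
node2]# ss -lntu | awk 'NR>1 {print $1, $5}' | sort > /tmp/ports-node2
node1]# diff /tmp/ports-node1 /tmp/ports-node2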
Install and start dnsmasq on node 1:
node1]# yum -y install dnsmasq
node1]# dnsmasq
node1]# ss -anptu | grep dnsmasq
udp UNCONN 0 0 *:53 *:* users:(("dnsmasq",pid=92515,fd=4))
udp UNCONN 0 0 [::]:53 [::]:* users:(("dnsmasq",pid=92515,fd=6))
tcp LISTEN 0 5 *:53 *:* users:(("dnsmasq",pid=92515,fd=5))
tcp LISTEN 0 5 [::]:53 [::]:* users:(("dnsmasq",pid=92515,fd=7))
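Launching dnsmasq by hand is fine for the test, but it will not survive a reboot; it is probably worth registering it with systemd (stopping the manual instance first, since it holds port 53) and confirming it actually answers queries (a sketch; dig comes from the bind-utils package, and any resolvable name will do):
node1]# pkill dnsmasq && systemctl enable dnsmasq && systemctl start dnsmasq
node1]# dig @127.0.0.1 www.example.com +short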
master]# kubectl delete -f kubedns-controller.yaml
deployment.apps "kube-dns" deleted
master]# kubectl create -f kubedns-controller.yaml
deployment.apps/kube-dns created
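As an aside, on v1.16 the delete-and-recreate dance could also be replaced by an in-place restart, which likewise forces the pods to be rescheduled:
master]# kubectl rollout restart deployment/kube-dns -n kube-system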
master]# kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-dns-6685cc54bd-bqjsn 3/3 Running 0 4m22s 10.1.5.2 192.168.89.133 <none> <none>
kube-dns can now be scheduled onto node 1. But is node 2 still healthy at this point? Create a few more pods to verify.
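A quick synthetic check would be to spin up a throwaway deployment and watch where its replicas land (a sketch; the name sched-test and the nginx image are placeholders):
master]# kubectl create deployment sched-test --image=nginx
master]# kubectl scale deployment sched-test --replicas=4
master]# kubectl get pods -l app=sched-test -o wide   # replicas should spread across both nodes
master]# kubectl delete deployment sched-test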
Reinstalling the dashboard serves as the real-world verification:
master]# kubectl create -f recommended.yaml
namespace/kubernetes-dashboard created
serviceaccount/kubernetes-dashboard created
service/kubernetes-dashboard created
secret/kubernetes-dashboard-certs created
secret/kubernetes-dashboard-csrf created
secret/kubernetes-dashboard-key-holder created
configmap/kubernetes-dashboard-settings created
role.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrole.rbac.authorization.k8s.io/kubernetes-dashboard created
rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
deployment.apps/kubernetes-dashboard created
service/dashboard-metrics-scraper created
deployment.apps/dashboard-metrics-scraper created
master]# kubectl get pods -n kubernetes-dashboard -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
dashboard-metrics-scraper-5f4dc864c4-rt45p 1/1 Running 0 13s 10.1.84.2 192.168.89.134 <none> <none>
kubernetes-dashboard-687bd5c7d7-zrppg 1/1 Running 0 14s 10.1.5.3 192.168.89.133 <none> <none>
5. Summary
Compare the broken node against a working one. The root cause is not necessarily in the core configuration; a peripheral application can be responsible too. Node 1 had been unschedulable for a long time and I never paid attention; looking into it today, it turned out to be a DNS problem. Without dnsmasq serving port 53 on the node, the master apparently would not schedule pods onto that machine.
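For future incidents of this shape, two generic checks might surface the cause faster than a port-by-port comparison: the kubelet log on the suspect node, and that node's event stream as seen from the master:
node1]# journalctl -u kubelet --since "1 hour ago" | grep -i -E 'error|fail'
master]# kubectl describe node 192.168.89.133 | sed -n '/Events:/,$p'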