There is a good troubleshooting flowchart here:
https://learnk8s.io/troubleshooting-deployments
There is also a PDF version:
https://learnk8s.io/a/troubleshooting-kubernetes.pdf
Below, each step is analysed using a cropped section of the chart; to see the complete flowchart, jump to the end.
First, check whether any Pods are PENDING.
The main commands are:
kubectl get pods
kubectl describe pod <pod-name> : cluster full, ResourceQuota limits reached, mounted PersistentVolumeClaim still PENDING
kubectl get pods -o wide : if the Pod is assigned to a node, a Kubelet issue; if not, a Scheduler issue
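The PENDING checks can be sketched as two small, hypothetical Python helpers. They assume the input is the parsed JSON of `kubectl get pods -o json` (a List with `items`) or `kubectl get pod <pod-name> -o json`; the function names and returned strings are illustrative only. The sketch reads `spec.nodeName` (the NODE column of `-o wide`) and the `PodScheduled` condition, whose message usually carries the scheduler's reason, such as insufficient resources or an unbound PersistentVolumeClaim.

```python
def pending_pod_names(pod_list: dict) -> list[str]:
    """Names of Pods in phase Pending, from `kubectl get pods -o json`.

    Note: phase Pending also covers Pods that are still pulling images.
    """
    return [
        item["metadata"]["name"]
        for item in pod_list.get("items", [])
        if item.get("status", {}).get("phase") == "Pending"
    ]


def pending_hint(pod: dict) -> str:
    """Follow the PENDING branch of the flowchart for a single Pod object."""
    spec = pod.get("spec", {})
    status = pod.get("status", {})
    if status.get("phase") != "Pending":
        return "not pending"
    # NODE column of `kubectl get pods -o wide`. Assumes the STATUS column
    # shows Pending (not e.g. ImagePullBackOff, which is also phase Pending).
    if spec.get("nodeName"):
        return "assigned to a node: likely a Kubelet issue"
    for cond in status.get("conditions", []):
        if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
            # The scheduler ran but could not place the Pod; the message
            # explains why (cluster full, unbound PersistentVolumeClaim, ...).
            return "unschedulable: " + cond.get("message", "")
    # No node and no scheduling verdict: the scheduler may not be running.
    return "not assigned to a node: likely a Scheduler issue"
```

In practice the input could come from piping `kubectl get pod <pod-name> -o json` into a script that calls `json.load(sys.stdin)`.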
Next, look at Pods that are not RUNNING:
kubectl logs <pod-name> : if the app logs are visible, fix the application issue
kubectl logs <pod-name> --previous : if the container died too quickly to show logs, read the logs of the previous instance
kubectl describe pod <pod-name> :
ImagePullBackOff: incorrect image name, invalid image tag, or pulling from a private registry; otherwise a CRI or Kubelet issue
CrashLoopBackOff: app crashing, missing CMD instruction in the Dockerfile, or a liveness probe problem that makes the Pod restart frequently (cycling between RUNNING and CrashLoopBackOff)
any other status: unknown state
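The not-RUNNING branch can be approximated by reading each container's `state`, `lastState` and `restartCount` from the Pod JSON (`kubectl get pod <pod-name> -o json`). This is a hypothetical helper: the function name, the hint texts and the restart threshold of 3 are assumptions for illustration, and `<pod-name>` is left as a literal placeholder.

```python
RESTART_THRESHOLD = 3  # hypothetical cut-off for "restarting frequently"


def not_running_hints(pod: dict) -> list[str]:
    """Map each container's state to the matching branch of the flowchart."""
    hints = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        name = cs.get("name", "?")
        state = cs.get("state", {})
        reason = (state.get("waiting") or {}).get("reason")
        # A terminated previous instance: its logs need --previous.
        if (cs.get("lastState") or {}).get("terminated"):
            hints.append(f"{name}: previous instance terminated, "
                         f"run kubectl logs <pod-name> -c {name} --previous")
        if reason == "ImagePullBackOff":
            hints.append(f"{name}: ImagePullBackOff, check image name, tag and "
                         "private registry access; otherwise a CRI or Kubelet issue")
        elif reason == "CrashLoopBackOff":
            hints.append(f"{name}: CrashLoopBackOff, check the app logs and "
                         "the Dockerfile CMD instruction")
        elif "running" in state and cs.get("restartCount", 0) >= RESTART_THRESHOLD:
            hints.append(f"{name}: RUNNING after {cs['restartCount']} restarts, "
                         "check the liveness probe")
        elif reason:
            # Any other waiting reason is outside the flowchart's branches.
            hints.append(f"{name}: {reason}, unknown state")
    return hints
```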
After that, look at Pods that are not READY:
kubectl describe pod <pod-name> : Readiness probe failing; otherwise unknown state
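A Pod is READY only when all its containers report `ready: true`. A container without a readiness probe is reported ready once it is running, so a container that is running but not ready usually points at a failing readiness probe (a startup probe that has not passed yet has the same effect). A hypothetical sketch over the Pod JSON, returning the offending containers and their probe definitions:

```python
from typing import Optional


def not_ready_containers(pod: dict) -> list[str]:
    """Names of containers that are running but not ready."""
    names = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        # Without a readiness probe a running container is reported ready,
        # so running-but-not-ready usually means the readiness probe fails.
        if "running" in cs.get("state", {}) and not cs.get("ready", False):
            names.append(cs.get("name", "?"))
    return names


def readiness_probe(pod: dict, container: str) -> Optional[dict]:
    """The readinessProbe block from the Pod spec, if the container has one."""
    for c in pod.get("spec", {}).get("containers", []):
        if c.get("name") == container:
            return c.get("readinessProbe")
    return None
```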
Then check the port mapping:
kubectl port-forward <pod-name> 8080:<pod-port> : if the app is unreachable, it is not listening on 0.0.0.0 or the container port is not exposed correctly
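For the port check, the process inside the container should listen on 0.0.0.0 (all interfaces) rather than only on 127.0.0.1, and on the port declared as `containerPort`. A minimal Python HTTP server that does this, as an illustration only; the handler and the `serve`/`probe` names are assumptions, and `probe` plays the role of `curl http://localhost:8080` after port-forwarding.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class OkHandler(BaseHTTPRequestHandler):
    """Answers every GET with a plain-text "ok"."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


def serve(port: int) -> HTTPServer:
    # "0.0.0.0" binds every interface, including the Pod IP.
    # "127.0.0.1" would accept loopback connections only, so traffic
    # sent to the Pod IP by Services or other Pods would be refused.
    server = HTTPServer(("0.0.0.0", port), OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(port: int) -> bytes:
    """Fetch / on localhost, like curl after kubectl port-forward."""
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/", timeout=5) as resp:
        return resp.read()
```

Inside a container, `serve(8080)` would match a hypothetical `containerPort: 8080`; `serve(0)` lets the OS pick a free port for local testing.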
Once the Pods are running correctly, look at the Service:
kubectl describe service <service-name> : if no endpoints are listed, check that the Service selector matches the Pod labels; if it does and the Pod has an IP address assigned, a controller manager issue; if the Pod has no IP address, a Kubelet issue
kubectl port-forward service/<service-name> 8080:<service-port> : if the app is unreachable, check that the Service targetPort matches the Pod containerPort; if it does, a Kube Proxy issue
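The two Service checks can be expressed as small hypothetical helpers over the JSON of the Service and the Pod (`kubectl get service <service-name> -o json`, `kubectl get pod <pod-name> -o json`). `selector_matches` mirrors how a Service's `spec.selector` picks Pods by equality on every listed label; `target_port_matches` compares one entry of the Service's `spec.ports` with a container's `ports` list, handling both numeric and named `targetPort`. A declared `containerPort` is largely informational (what matters is the port the process listens on), but a named `targetPort` only resolves through a matching named container port.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """True if every key/value in the Service selector is on the Pod."""
    # A Service without a selector never selects Pods
    # (its endpoints are managed manually).
    if not selector:
        return False
    return all(labels.get(key) == value for key, value in selector.items())


def target_port_matches(service_port: dict, container_ports: list[dict]) -> bool:
    """Check one Service port entry against a container's declared ports."""
    # targetPort defaults to the Service port when omitted.
    target = service_port.get("targetPort", service_port.get("port"))
    for cp in container_ports:
        if isinstance(target, int) and cp.get("containerPort") == target:
            return True
        if isinstance(target, str) and cp.get("name") == target:
            return True
    return False
```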
Then look at the Ingress:
kubectl describe ingress <ingress-name> : if no Backends are listed, check that serviceName and servicePort match the Service; if they do, an Ingress controller issue
kubectl port-forward <ingress-pod-name> 8080:<ingress-port> : if the app is unreachable, an Ingress controller issue
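If the app is reachable through the port-forwarded Ingress pod, the issue is likely with the infrastructure and how the cluster is exposed to the internet.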
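The Ingress Backends check can be sketched the same way. The flowchart uses the older `serviceName`/`servicePort` field names from the v1beta1 Ingress APIs; in `networking.k8s.io/v1` they became `backend.service.name` and `backend.service.port.number` (or `.name`), which is the shape this hypothetical helper assumes. It takes the Ingress JSON and a dict of same-namespace Services keyed by name, and for brevity ignores `defaultBackend` and resource backends.

```python
def unmatched_backends(ingress: dict, services: dict[str, dict]) -> list[str]:
    """List Ingress backends that do not match an existing Service and port."""
    problems = []
    for rule in ingress.get("spec", {}).get("rules", []):
        for path in rule.get("http", {}).get("paths", []):
            backend = path.get("backend", {}).get("service")
            if not backend:
                continue  # resource backends are not checked here
            name = backend.get("name")
            port = backend.get("port", {})
            wanted = port.get("number", port.get("name"))
            svc = services.get(name)
            if svc is None:
                problems.append(f"{name}: no such Service")
                continue
            svc_ports = svc.get("spec", {}).get("ports", [])
            # A backend port is either a Service port number or a port name.
            if "number" in port:
                ok = any(p.get("port") == port["number"] for p in svc_ports)
            else:
                ok = any(p.get("name") == port.get("name") for p in svc_ports)
            if not ok:
                problems.append(f"{name}: port {wanted} not exposed by the Service")
    return problems
```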