Kubernetes Troubleshooting Flow

There is a good troubleshooting flowchart here:
https://learnk8s.io/troubleshooting-deployments

A PDF version is also available:
https://learnk8s.io/a/troubleshooting-kubernetes.pdf

This post walks through each step using a cropped section of the flowchart. To see the complete flowchart, skip to the end.

First, check whether the Pod is PENDING

The main commands are:

kubectl get pods
kubectl describe pod <pod-name> : cluster full, ResourceQuota limit reached, PENDING PersistentVolumeClaim
kubectl get pods -o wide : was the Pod assigned to a node? If not, Scheduler issue; if so, Kubelet issue
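
The PENDING checks above can be sketched as a small helper. This is only an illustration, assuming the Pod has been exported with `kubectl get pod <pod-name> -o json` and loaded into a dict; the function name `pending_hint`, the hint strings, and the sample data are made up for this example.

```python
import json

def pending_hint(pod: dict) -> str:
    """Suggest where to look for a Pod that is stuck in Pending."""
    status = pod.get("status", {})
    if status.get("phase") != "Pending":
        return "not Pending: move on to the next check"
    # A Pod that is still pulling images also reports phase Pending;
    # that case belongs to the "not RUNNING" checks.
    if status.get("containerStatuses"):
        return "containers exist: inspect their state instead"
    if not pod.get("spec", {}).get("nodeName"):
        # Assumption: the describe events explain why scheduling failed.
        return ("no node: check events for a full cluster, ResourceQuota "
                "or a Pending PVC, then the Scheduler")
    return "node assigned but nothing started: suspect the Kubelet"

# Hypothetical sample standing in for real `kubectl get pod -o json` output.
sample = json.loads('{"spec": {}, "status": {"phase": "Pending"}}')
print(pending_hint(sample))
```

The order of the checks follows the list above: events first, then node assignment.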

PENDING

Next, look at Pods that are not RUNNING

kubectl logs <pod-name> : application issue
kubectl logs <pod-name> --previous : pod died too quickly
kubectl describe pod <pod-name> :
  ImagePullBackOff: image name incorrect, image tag invalid, private registry not configured, otherwise a CRI or Kubelet issue
  CrashLoopBackOff: app crashing, missing Dockerfile CMD instruction, or a liveness probe problem (Pod restarts frequently, cycling between RUNNING and CrashLoopBackOff)
  Anything else: unknown state
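
As a rough sketch of the checklist above, the waiting reason of each container can be read from the Pod's JSON and mapped to a hint. It assumes the Pod was exported with `kubectl get pod <pod-name> -o json`; `HINTS`, `container_hints`, and the sample Pod are hypothetical names and data for illustration.

```python
from typing import Dict

# Hints taken from the checklist above; the wording is illustrative.
HINTS = {
    "ImagePullBackOff": "check image name, image tag, and private registry setup",
    "CrashLoopBackOff": "check logs (with --previous), Dockerfile CMD, and the liveness probe",
}

def container_hints(pod: dict) -> Dict[str, str]:
    """Map each waiting container name to a hint for its waiting reason."""
    hints = {}
    for cs in pod.get("status", {}).get("containerStatuses", []):
        reason = cs.get("state", {}).get("waiting", {}).get("reason")
        if reason:
            hints[cs["name"]] = HINTS.get(reason, reason + ": unknown state")
    return hints

# Hypothetical Pod with one crashing container.
sample = {
    "status": {
        "containerStatuses": [
            {"name": "web", "state": {"waiting": {"reason": "CrashLoopBackOff"}}},
            {"name": "sidecar", "state": {"running": {}}},
        ]
    }
}
print(container_hints(sample))
```

Containers that are running or terminated have no `waiting` entry, so they are skipped.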

Not RUNNING

After that, look at Pods that are not READY

kubectl describe pod <pod-name> : Readiness probe failing, unknown state
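
A minimal sketch of the READY check, again assuming a Pod dict loaded from `kubectl get pod <pod-name> -o json`. The helper names `is_ready` and `unready_containers` and the sample Pod are invented for this example.

```python
from typing import List

def is_ready(pod: dict) -> bool:
    """True when the Pod's Ready condition has status "True"."""
    for cond in pod.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False  # no Ready condition reported yet

def unready_containers(pod: dict) -> List[str]:
    """Names of containers whose readiness is not (yet) passing."""
    statuses = pod.get("status", {}).get("containerStatuses", [])
    return [cs["name"] for cs in statuses if not cs.get("ready", False)]

# Hypothetical Pod that is running but failing its readiness probe.
sample = {
    "status": {
        "conditions": [{"type": "Ready", "status": "False"}],
        "containerStatuses": [{"name": "web", "ready": False}],
    }
}
print(is_ready(sample), unready_containers(sample))
```

If a container shows up here, the Readiness probe events in `kubectl describe pod` are the next place to look.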

Not READY

Then check port forwarding

kubectl port-forward <pod-name> 8080:<pod-port> : if the app is unreachable, it is not listening on 0.0.0.0 or the port is not exposed
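
With the port-forward running in another terminal, one way to test the result is a plain HTTP request against the local port. The sketch below assumes the app speaks HTTP; `app_responds` is a hypothetical helper, and port 8080 is the local port from the command above. A bare TCP connect is not enough here, because kubectl itself accepts connections on the local port.

```python
import http.client
import urllib.error
import urllib.request

def app_responds(url: str, timeout: float = 3.0) -> bool:
    """True when the URL returns any HTTP response, even an error status."""
    # Ignore proxy environment variables so the request stays local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # an HTTP status came back, so the app is reachable
    except (OSError, http.client.HTTPException):
        return False  # refused, reset, timed out, or not speaking HTTP

if __name__ == "__main__":
    # Assumes `kubectl port-forward <pod-name> 8080:<pod-port>` is running.
    print(app_responds("http://127.0.0.1:8080/"))
```

Treating 4xx and 5xx responses as reachable is deliberate: at this step the question is only whether traffic gets to the app.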

Port forwarding problems

Once the Pods are running normally, look at Service problems

kubectl describe service <service-name> : if no Endpoints are listed, the Service selector does not match the Pod labels; otherwise a Pod without an IP address assigned points to a Controller Manager issue, and one with an IP to a Kubelet issue
kubectl port-forward service/<service-name> 8080:<service-port> : Service targetPort does not match the Pod containerPort, otherwise a Kube Proxy issue
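
The two Service checks above (selector against labels, targetPort against containerPort) can be sketched as follows. The inputs are assumed to be taken from `kubectl get service -o json` and `kubectl get pod -o json`; the function names and sample values are hypothetical.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """True when every selector key/value pair appears in the Pod labels."""
    if not selector:
        return False  # a Service without a selector selects no Pods
    return all(labels.get(key) == value for key, value in selector.items())

def target_port_matches(target_port, container_ports: list) -> bool:
    """targetPort may be a number or a named port; accept either form."""
    for port in container_ports:
        if target_port == port.get("containerPort") or target_port == port.get("name"):
            return True
    return False

# Hypothetical values: spec.selector, metadata.labels, and container ports.
selector = {"app": "web"}
labels = {"app": "web", "tier": "frontend"}
ports = [{"name": "http", "containerPort": 8080}]

print(selector_matches(selector, labels))   # extra Pod labels are fine
print(target_port_matches(8080, ports))
print(target_port_matches("http", ports))
```

Note that a container can listen on a port it does not declare, so a numeric mismatch here is a hint to verify rather than proof of a fault.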

Service problems

Then look at Ingress problems

kubectl describe ingress <ingress-name> : if no Backends are listed, serviceName and servicePort do not match the Service; otherwise an Ingress controller issue
kubectl port-forward <ingress-pod-name> 8080:<ingress-port> : if the app is unreachable, an Ingress controller issue;
  if it is reachable, the problem lies in how the cluster is exposed to the internet (infrastructure)
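
To illustrate the backend check, the sketch below lists the Service name and port each Ingress path points to and reports those with no matching Service port. It assumes the `networking.k8s.io/v1` layout (`backend.service.name` and `backend.service.port`), where the older `serviceName` and `servicePort` fields mentioned above now live; the helper names and sample data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

def ingress_backends(ingress: dict) -> List[Tuple[str, object]]:
    """Collect (service name, port number or port name) for every path."""
    backends = []
    for rule in ingress.get("spec", {}).get("rules", []):
        for path in rule.get("http", {}).get("paths", []):
            svc = path.get("backend", {}).get("service", {})
            port = svc.get("port", {})
            backends.append((svc.get("name"), port.get("number", port.get("name"))))
    return backends

def unmatched_backends(ingress: dict, services: Dict[str, Set[object]]) -> list:
    """Backends whose Service or port is absent from the known Services."""
    return [(name, port) for name, port in ingress_backends(ingress)
            if port not in services.get(name, set())]

# Hypothetical Ingress pointing at port 80 of a Service named "web".
ingress = {"spec": {"rules": [{"http": {"paths": [
    {"path": "/", "backend": {"service": {"name": "web", "port": {"number": 80}}}}
]}}]}}
# Service name -> port numbers and names it exposes (from spec.ports).
services = {"web": {8080, "http"}}
print(unmatched_backends(ingress, services))
```

In this sample the Ingress asks for port 80 while the Service exposes 8080, so the backend is reported as unmatched.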

Ingress problems

Complete troubleshooting flowchart

Kubernetes Troubleshooting Flow, Part 1

Kubernetes Troubleshooting Flow, Part 2
