Kubernetes Troubleshooting

1. View system Events

kubectl describe pod <PodName> --namespace=<NAMESPACE>

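This command shows the Pod's configuration as defined at creation, its current status, and its most recent Events, which are useful for troubleshooting. For example, when a Pod is stuck in Pending, the Events usually reveal the cause. Common causes are:

  • No Node is available for scheduling
  • Resource quota management is enabled and the Pod's target node happens to have no resources available
  • The image is still being downloaded (the pull is taking too long) or the image download failed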

kubectl describe can also inspect other k8s objects: Node, RC, Service, Namespace, and Secrets.
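As a rough sketch of the Pending workflow above, the helper below picks Pending Pods out of `kubectl get pods` output so that each one can then be passed to `kubectl describe pod`. The function name `pending_pods` is hypothetical, and the sketch assumes the default single-namespace column layout (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# pending_pods (hypothetical helper): read `kubectl get pods` output on stdin
# and print the names of Pods whose STATUS column is "Pending".
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
pending_pods() {
  # NR > 1 skips the header row; $3 is the STATUS column.
  awk 'NR > 1 && $3 == "Pending" { print $1 }'
}
```

A typical call would be `kubectl get pods --namespace=<NAMESPACE> | pending_pods`, followed by `kubectl describe pod` on each name it prints.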

1.1. Pod

kubectl describe pod <PodName> --namespace=<NAMESPACE>

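The events below were produced by a container whose startup command is non-blocking: the container exits right after starting, and k8s keeps restarting it.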

kubectl describe pod <PodName> --namespace=<NAMESPACE>  

Events:
  FirstSeen LastSeen    Count   From            SubobjectPath       Reason      Message
  ───────── ────────    ─────   ────            ─────────────       ──────      ───────
  7m        7m      1   {scheduler }                    Scheduled   Successfully assigned yangsc-1-0-0-index0 to 10.8.216.19
  7m        7m      1   {kubelet 10.8.216.19}   containers{infra}   Pulled      Container image "gcr.io/kube-system/pause:0.8.0" already present on machine
  7m        7m      1   {kubelet 10.8.216.19}   containers{infra}   Created     Created with docker id 84f133c324d0
  7m        7m      1   {kubelet 10.8.216.19}   containers{infra}   Started     Started with docker id 84f133c324d0
  7m        7m      1   {kubelet 10.8.216.19}   containers{yangsc0} Started     Started with docker id 3f9f82abb145
  7m        7m      1   {kubelet 10.8.216.19}   containers{yangsc0} Created     Created with docker id 3f9f82abb145
  7m        7m      1   {kubelet 10.8.216.19}   containers{yangsc0} Created     Created with docker id fb112e4002f4
  7m        7m      1   {kubelet 10.8.216.19}   containers{yangsc0} Started     Started with docker id fb112e4002f4
  6m        6m      1   {kubelet 10.8.216.19}   containers{yangsc0} Created     Created with docker id 613b119d4474
  6m        6m      1   {kubelet 10.8.216.19}   containers{yangsc0} Started     Started with docker id 613b119d4474
  6m        6m      1   {kubelet 10.8.216.19}   containers{yangsc0} Created     Created with docker id 25cb68d1fd3d
  6m        6m      1   {kubelet 10.8.216.19}   containers{yangsc0} Started     Started with docker id 25cb68d1fd3d
  5m        5m      1   {kubelet 10.8.216.19}   containers{yangsc0} Started     Started with docker id 7d9ee8610b28
  5m        5m      1   {kubelet 10.8.216.19}   containers{yangsc0} Created     Created with docker id 7d9ee8610b28
  3m        3m      1   {kubelet 10.8.216.19}   containers{yangsc0} Started     Started with docker id 88b9e8d582dd
  3m        3m      1   {kubelet 10.8.216.19}   containers{yangsc0} Created     Created with docker id 88b9e8d582dd
  7m        1m      7   {kubelet 10.8.216.19}   containers{yangsc0} Pulling     Pulling image "gcr.io/test/tcp-hello:1.0.0"
  1m        1m      1   {kubelet 10.8.216.19}   containers{yangsc0} Started     Started with docker id 089abff050e7
  1m        1m      1   {kubelet 10.8.216.19}   containers{yangsc0} Created     Created with docker id 089abff050e7
  7m        1m      7   {kubelet 10.8.216.19}   containers{yangsc0} Pulled      Successfully pulled image "gcr.io/test/tcp-hello:1.0.0"
  6m        7s      34  {kubelet 10.8.216.19}   containers{yangsc0} Backoff     Back-off restarting failed docker container

1.2. NODE

kubectl describe node 10.8.216.20
[root@FC-43745A-10 ~]# kubectl describe node 10.8.216.20  
Name:           10.8.216.20  
Labels:         kubernetes.io/hostname=10.8.216.20,namespace/bcs-cc=true,namespace/myview=true  
CreationTimestamp:  Mon, 17 Apr 2017 11:32:52 +0800  
Phase:            
Conditions:  
  Type      Status  LastHeartbeatTime           LastTransitionTime          Reason              Message  
  ────      ──────  ─────────────────           ──────────────────          ──────              ───────  
  Ready     True    Fri, 18 Aug 2017 09:38:33 +0800     Tue, 02 May 2017 17:40:58 +0800     KubeletReady            kubelet is posting ready status  
  OutOfDisk     False   Fri, 18 Aug 2017 09:38:33 +0800     Mon, 17 Apr 2017 11:31:27 +0800     KubeletHasSufficientDisk    kubelet has sufficient disk space available  
Addresses:  10.8.216.20,10.8.216.20  
Capacity:  
 cpu:       32  
 memory:    67323039744  
 pods:      40  
System Info:  
 Machine ID:            723bafc7f6764022972b3eae1ce6b198  
 System UUID:           4C4C4544-0042-4210-8044-C3C04F595631  
 Boot ID:           da01f2e3-987a-425a-9ca7-1caaec35d1e5  
 Kernel Version:        3.10.0-327.28.3.el7.x86_64  
 OS Image:          CentOS Linux 7 (Core)  
 Container Runtime Version: docker://1.13.1  
 Kubelet Version:       v1.1.1-xxx2-13.1+79c90c68bfb72f-dirty  
 Kube-Proxy Version:        v1.1.1-xxx2-13.1+79c90c68bfb72f-dirty  
ExternalID:         10.8.216.20  
Non-terminated Pods:        (6 in total)  
  Namespace         Name                    CPU Requests    CPU Limits  Memory Requests Memory Limits  
  ─────────         ────                    ────────────    ──────────  ─────────────── ─────────────  
  bcs-cc            bcs-cc-api-0-0-1364-index0      1 (3%)      1 (3%)      4294967296 (6%) 4294967296 (6%)  
  bcs-cc            bcs-cc-api-0-0-1444-index0      1 (3%)      1 (3%)      4294967296 (6%) 4294967296 (6%)  
  fw                fw-demo2-0-0-1519-index0        1 (3%)      1 (3%)      4294967296 (6%) 4294967296 (6%)  
  myview            myview-api-0-0-1362-index0      1 (3%)      1 (3%)      4294967296 (6%) 4294967296 (6%)  
  myview            myview-api-0-0-1442-index0      1 (3%)      1 (3%)      4294967296 (6%) 4294967296 (6%)  
  qa-ts-dna         ts-dna-console3-0-0-1434-index0     1 (3%)      1 (3%)      4294967296 (6%) 4294967296 (6%)  
Allocated resources:  
  (Total limits may be over 100%, i.e., overcommitted. More info: http://releases.k8s.io/HEAD/docs/user-guide/compute-resources.md)  
  CPU Requests  CPU Limits  Memory Requests     Memory Limits  
  ────────────  ──────────  ───────────────     ─────────────  
  6 (18%)   6 (18%)     25769803776 (38%)   25769803776 (38%)  
No events.

1.3. RC

kubectl describe rc mytest-1-0-0 --namespace=test
[root@FC-43745A-10 ~]# kubectl describe rc mytest-1-0-0 --namespace=test  
Name:       mytest-1-0-0  
Namespace:  test  
Image(s):   gcr.io/test/mywebcalculator:1.0.1  
Selector:   app=mytest,appVersion=1.0.0  
Labels:     app=mytest,appVersion=1.0.0,env=ts,zone=inner  
Replicas:   1 current / 1 desired  
Pods Status:    1 Running / 0 Waiting / 0 Succeeded / 0 Failed  
No volumes.  
Events:  
  FirstSeen LastSeen    Count   From                SubobjectPath   Reason          Message  
  ───────── ────────    ─────   ────                ─────────────   ──────          ───────  
  20h       19h     9   {replication-controller }           FailedCreate        Error creating: Pod "mytest-1-0-0-index0" is forbidden: limited to 10 pods  
  20h       17h     7   {replication-controller }           FailedCreate        Error creating: pods "mytest-1-0-0-index0" already exists  
  20h       17h     4   {replication-controller }           SuccessfulCreate    Created pod: mytest-1-0-0-index0

1.4. NAMESPACE

kubectl describe namespace test
[root@FC-43745A-10 ~]# kubectl describe namespace test  
Name:   test  
Labels: <none>  
Status: Active  

Resource Quotas  
 Resource       Used        Hard  
 ---            ---     ---  
 cpu            5       20  
 memory         1342177280  53687091200  
 persistentvolumeclaims 0       10  
 pods           4       10  
 replicationcontrollers 8       20  
 resourcequotas     1       1  
 secrets        3       10  
 services       8       20  

No resource limits.
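The Used and Hard columns show how close the namespace is to its quota; a resource that has reached its limit leads to errors such as the `limited to 10 pods` event in the RC example above. As a minimal sketch, the hypothetical helper `quota_exhausted` below scans `kubectl describe namespace` output for such resources. It assumes the plain-integer layout shown in this article and ignores values that carry unit suffixes.

```shell
# quota_exhausted (hypothetical helper): read `kubectl describe namespace`
# output on stdin and print each quota resource whose Used value has
# reached its Hard limit. Only plain-integer values are compared.
quota_exhausted() {
  awk '
    /^Resource Quotas/ { in_quota = 1; next }   # start of the quota table
    in_quota && NF == 0 { in_quota = 0 }        # a blank line ends the table
    in_quota && NF == 3 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $2 + 0 >= $3 + 0 { print $1 }
  '
}
```

For the output above, `kubectl describe namespace test | quota_exhausted` would report only `resourcequotas`, since that is the only resource whose Used value equals its Hard value.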

1.5. Service

kubectl describe service xxx-containers-1-1-0 --namespace=test
[root@FC-43745A-10 ~]# kubectl describe service xxx-containers-1-1-0 --namespace=test  
Name:           xxx-containers-1-1-0  
Namespace:      test  
Labels:         app=xxx-containers,appVersion=1.1.0,env=ts,zone=inner  
Selector:       app=xxx-containers,appVersion=1.1.0  
Type:           ClusterIP  
IP:         10.254.46.42  
Port:           port-dna-tcp-35913  35913/TCP  
Endpoints:      10.0.92.17:35913  
Port:           port-l7-tcp-8080    8080/TCP  
Endpoints:      10.0.92.17:8080  
Session Affinity:   None  
No events.

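2. View container logs

(1) View the logs of a given pod

kubectl logs <pod_name>

kubectl logs -f <pod_name> # stream the logs, similar to tail -f

(2) View the logs of the previous (terminated) container instance in a pod

kubectl logs -p <pod_name>

(3) View the logs of a specific container in a pod

kubectl logs <pod_name> -c <container_name>

(4) kubectl logs --help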

[root@node5 ~]# kubectl logs --help  
Print the logs for a container in a pod. If the pod has only one container, the container name is optional.  
Usage:  
  kubectl logs [-f] [-p] POD [-c CONTAINER] [flags]  
Aliases:  
  logs, log  

Examples:  
# Return snapshot logs from pod nginx with only one container  
$ kubectl logs nginx  
# Return snapshot of previous terminated ruby container logs from pod web-1  
$ kubectl logs -p -c ruby web-1  
# Begin streaming the logs of the ruby container in pod web-1  
$ kubectl logs -f -c ruby web-1  
# Display only the most recent 20 lines of output in pod nginx  
$ kubectl logs --tail=20 nginx  
# Show all logs from pod nginx written in the last hour  
$ kubectl logs --since=1h nginx
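When a container keeps crashing, the flags listed in the help output can be combined to fetch the tail of the previous instance's logs. The wrapper below is only a sketch: the function name `previous_logs` and the default of 50 lines are assumptions, while `-p`, `-c`, and `--tail` are the flags shown in the help text above.

```shell
# previous_logs (hypothetical helper): print the last N lines logged by the
# previous, terminated instance of a container in a pod.
# Usage: previous_logs POD CONTAINER [LINES]
previous_logs() {
  pod="$1"
  container="$2"
  lines="${3:-50}"   # assumed default; adjust as needed
  kubectl logs -p --tail="$lines" -c "$container" "$pod"
}
```

For example, `previous_logs web-1 ruby` runs `kubectl logs -p --tail=50 -c ruby web-1`.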

3. View k8s service logs

3.1. journalctl

On Linux, Kubernetes services are typically managed by systemd, and the journal captures each service's output. Use systemctl status <service> or journalctl -u <service> -f to view the logs of a Kubernetes service.

The Kubernetes components are:

Component                 Log content                               Notes
kube-apiserver
kube-controller-manager   Pod scaling and RC related
kube-scheduler            Pod scaling and RC related
kubelet                   Pod lifecycle: creation, stopping, etc.
etcd
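As a small sketch of the journalctl approach, the hypothetical helper `unit_logs` below prints the last hour of journal entries for one service. It assumes the systemd unit is named after the component (for example `kubelet`), which depends on how the cluster was installed.

```shell
# unit_logs (hypothetical helper): print the last hour of journal entries
# for one systemd unit without invoking a pager.
# Usage: unit_logs UNIT    e.g. unit_logs kubelet
unit_logs() {
  unit="$1"
  journalctl -u "$unit" --since "1 hour ago" --no-pager
}
```

To follow a service's log live instead, run `journalctl -u kubelet -f`.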

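3.2. Log files

Logs can also be written to, and read from, a designated log directory by setting the following flags:

  • --logtostderr=false: do not log to stderr
  • --log-dir=/var/log/kubernetes: the directory where log files are stored
  • --alsologtostderr=false: when set to true, logs go to stderr as well as to files
  • --v=0: the glog verbosity level
  • --vmodule=gfs=2,test=4: glog per-module verbosity levels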

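4. Common problems

4.1. Pod stuck in Pending

kubectl describe pod <pod_name> --namespace=<NAMESPACE>

Check the Pod's events. Possible causes:

  • The image is being downloaded but the pull does not complete (it is taking too long) [this is usually the cause]
  • No Node is available for scheduling
  • Resource quota management is enabled and the Pod's target node happens to have no resources available

Solutions:

  1. Check the network between the Pod's host and the image registry; try pulling the image manually on that host
  2. Delete the Pod instance so that it is rescheduled onto a different host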
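4.2. Pod keeps restarting after creation

In the output of kubectl get pods, the Pod's status alternates between Running and other states, and the RESTARTS count keeps increasing.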
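One way to spot this symptom is to filter `kubectl get pods` output on the RESTARTS column. The helper below is a sketch only: the name `restarting_pods` and the default threshold of 5 are assumptions, and it expects the default single-namespace column layout (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# restarting_pods (hypothetical helper): read `kubectl get pods` output on
# stdin and print "NAME RESTARTS" for Pods restarted at least THRESHOLD times.
# Usage: kubectl get pods | restarting_pods [THRESHOLD]
restarting_pods() {
  threshold="${1:-5}"   # assumed default threshold
  # NR > 1 skips the header row; $4 is the RESTARTS column.
  awk -v min="$threshold" 'NR > 1 && $4 ~ /^[0-9]+$/ && $4 + 0 >= min + 0 { print $1, $4 }'
}
```

Each Pod it reports can then be examined with `kubectl describe pod` and `kubectl logs -p`.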

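The usual cause is that the container's startup command is not a blocking command, so the container exits right after it starts.

Non-blocking commands:

  • The command specified by CMD is itself non-blocking
  • The service is configured to start in the background

Solutions:

(1) Change the command to a blocking one (run in the foreground), for example: zkServer.sh start-foreground

(2) In the startup script of a Java program, remove the nohup and the & from nohup xxx &. For example, change:

nohup $JAVA_HOME/bin/java $JAVA_OPTS -cp $CLASSPATH com.cnc.open.processor.Main &

to:

$JAVA_HOME/bin/java $JAVA_OPTS -cp $CLASSPATH com.cnc.open.processor.Main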

Reference: 《Kubernetes權威指南》 (Kubernetes: The Definitive Guide)
