k8s Troubleshooting

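This post covers basic k8s troubleshooting. It stays fairly shallow, since I only got started recently. It covers three things: how to view the current runtime information of k8s objects, how to diagnose problems with services and containers, and how to track down more complex problems such as pod scheduling issues.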

1. Check the system Events

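When a resource object (pod, service, RC, node, namespace, deployment, and so on) misbehaves, for example a pod that does not run successfully after it is created, look at the object's current runtime information first, especially the Events associated with it. Each event records the subject involved, when it first occurred, when it last occurred, how many times it occurred, and the reason for it.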

k8s provides the following commands for viewing an object's runtime state:

kubectl describe pod xxxx
kubectl describe node xxxx

The output looks like this:
 

[root@centos ~]# kubectl  get pod
NAME                    READY   STATUS    RESTARTS   AGE
curl-5f8bff6547-rb4qk   1/1     Running   2          3d14h
redis-master-7j8cm      1/1     Running   2          3d14h
webapp-j7gd2            1/1     Running   3          3d21h
webapp-kzrn7            1/1     Running   3          3d14h
[root@centos ~]# kubectl describe pod webapp-j7gd2 
Name:               webapp-j7gd2
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               node3/192.168.195.138
Start Time:         Mon, 08 Apr 2019 13:19:25 +0800
Labels:             app=webapp
Annotations:        <none>
Status:             Running
IP:                 10.244.1.35
Controlled By:      ReplicationController/webapp
Containers:
  webapp:
    Container ID:   docker://e4dd5ec51e4d05456bd1605459a252085ad092c6be26e2becd5301114a470a33
    Image:          tomcat:9-jre8-alpine
    Image ID:       docker-pullable://tomcat@sha256:67fc2a0a54f9dfa7abda85a2900d721a55115dcae8ca7da560e65d15ca4c8aa7
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Thu, 11 Apr 2019 09:26:42 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Mon, 08 Apr 2019 21:52:27 +0800
      Finished:     Thu, 11 Apr 2019 09:25:55 +0800
    Ready:          True
    Restart Count:  3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-nx72w (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  default-token-nx72w:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-nx72w
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>

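The Events field on the last line is the important part. This pod is healthy, so there is nothing to show here; if your pod has a problem, the error messages appear in this section. The messages are usually self-explanatory: typically the image cannot be pulled, no node is available for scheduling, and so on. If your pod lives in a namespace other than default, specify the namespace with the following command: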

kubectl describe pod xxx -n <your-namespace>
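
Events can also be listed on their own instead of being read off the end of the describe output. The sketch below is one way to do that; pod_events is a helper name I made up for illustration, and the pod name in the usage comment is just the one from the listing above.

```shell
# List the events recorded for one pod, oldest first.
# pod_events is a hypothetical helper name, not a kubectl subcommand.
pod_events() {
  local pod="$1"
  local namespace="${2:-default}"   # fall back to the default namespace
  kubectl get events \
    --namespace "$namespace" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=${pod}" \
    --sort-by=.metadata.creationTimestamp
}

# Usage (needs kubectl access to the cluster):
#   pod_events webapp-j7gd2
#   pod_events webapp-j7gd2 my-namespace
```

Keep in mind that events are only retained for a limited time, so an empty result for an old problem does not mean nothing happened.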

2. Check the container logs

To inspect the logs written by the application inside a container, use the kubectl logs <pod-name> command, for example:

[root@centos ~]# kubectl  get pod
NAME                    READY   STATUS    RESTARTS   AGE
curl-5f8bff6547-rb4qk   1/1     Running   2          3d14h
redis-master-7j8cm      1/1     Running   2          3d14h
webapp-j7gd2            1/1     Running   3          3d21h
webapp-kzrn7            1/1     Running   3          3d14h
[root@centos ~]# kubectl logs webapp-j7gd2 
11-Apr-2019 01:26:45.108 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server version name:   Apache Tomcat/9.0.17
11-Apr-2019 01:26:45.145 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server built:          Mar 13 2019 15:55:27 UTC
11-Apr-2019 01:26:45.146 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server version number: 9.0.17.0
11-Apr-2019 01:26:45.146 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Name:               Linux
11-Apr-2019 01:26:45.146 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Version:            3.10.0-957.el7.x86_64
11-Apr-2019 01:26:45.146 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Architecture:          amd64
11-Apr-2019 01:26:45.146 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Java Home:             /usr/lib/jvm/java-1.8-openjdk/jre
11-Apr-2019 01:26:45.147 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log JVM Version:           1.8.0_201-b08
11-Apr-2019 01:26:45.147 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log JVM Vendor:            Oracle Corporation
11-Apr-2019 01:26:45.147 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log CATALINA_BASE:         /usr/local/tomcat
11-Apr-2019 01:26:45.147 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log CATALINA_HOME:         /usr/local/tomcat
11-Apr-2019 01:26:45.148 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties
11-Apr-2019 01:26:45.148 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
11-Apr-2019 01:26:45.148 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djdk.tls.ephemeralDHKeySize=2048
11-Apr-2019 01:26:45.149 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.protocol.handler.pkgs=org.apache.catalina.webresources
11-Apr-2019 01:26:45.149 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dorg.apache.catalina.security.SecurityListener.UMASK=0027
11-Apr-2019 01:26:45.150 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dignore.endorsed.dirs=
11-Apr-2019 01:26:45.150 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dcatalina.base=/usr/local/tomcat
11-Apr-2019 01:26:45.150 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dcatalina.home=/usr/local/tomcat
11-Apr-2019 01:26:45.150 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.io.tmpdir=/usr/local/tomcat/temp
11-Apr-2019 01:26:45.151 INFO [main] org.apache.catalina.core.AprLifecycleListener.lifecycleEvent Loaded APR based Apache Tomcat Native library [1.2.21] using APR version [1.6.5].
11-Apr-2019 01:26:45.151 INFO [main] org.apache.catalina.core.AprLifecycleListener.lifecycleEvent APR capabilities: IPv6 [true], sendfile [true], accept filters [false], random [true].
11-Apr-2019 01:26:45.151 INFO [main] org.apache.catalina.core.AprLifecycleListener.lifecycleEvent APR/OpenSSL configuration: useAprConnector [false], useOpenSSL [true]
11-Apr-2019 01:26:45.160 INFO [main] org.apache.catalina.core.AprLifecycleListener.initializeSSL OpenSSL successfully initialized [OpenSSL 1.1.1b  26 Feb 2019]
11-Apr-2019 01:26:45.606 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler ["http-nio-8080"]
11-Apr-2019 01:26:45.678 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler ["ajp-nio-8009"]
11-Apr-2019 01:26:45.689 INFO [main] org.apache.catalina.startup.Catalina.load Server initialization in [2,071] milliseconds
11-Apr-2019 01:26:45.755 INFO [main] org.apache.catalina.core.StandardService.startInternal Starting service [Catalina]
11-Apr-2019 01:26:45.755 INFO [main] org.apache.catalina.core.StandardEngine.startInternal Starting Servlet engine: [Apache Tomcat/9.0.17]
11-Apr-2019 01:26:45.777 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/ROOT]
11-Apr-2019 01:26:46.985 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/ROOT] has finished in [1,202] ms
11-Apr-2019 01:26:46.986 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/docs]
11-Apr-2019 01:26:47.071 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/docs] has finished in [86] ms
11-Apr-2019 01:26:47.080 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/examples]
11-Apr-2019 01:26:48.100 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/examples] has finished in [1,020] ms
11-Apr-2019 01:26:48.104 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/host-manager]
11-Apr-2019 01:26:48.169 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/host-manager] has finished in [65] ms
11-Apr-2019 01:26:48.169 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/manager]
11-Apr-2019 01:26:48.227 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/manager] has finished in [58] ms
11-Apr-2019 01:26:48.235 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["http-nio-8080"]
11-Apr-2019 01:26:48.302 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["ajp-nio-8009"]
11-Apr-2019 01:26:48.323 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in [2,633] milliseconds

If a pod contains more than one container, use the -c flag to name the container whose logs you want, for example:

kubectl logs <pod_name> -c <container_name>
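
When a container has restarted (the webapp pod above shows a restart count of 3 and a last state of Terminated), the log of the current instance often says nothing about why the previous one died. A minimal sketch for pulling the tail of the previous instance's log follows; previous_logs is a hypothetical helper name, and the default of 100 lines is an arbitrary choice of mine.

```shell
# Print the tail of the log from the previous instance of a container.
# previous_logs is a hypothetical helper name, not a kubectl subcommand.
previous_logs() {
  local pod="$1"
  local container="${2:-}"   # optional; needed for multi-container pods
  local lines="${3:-100}"    # how many trailing lines to show
  if [ -n "$container" ]; then
    kubectl logs "$pod" -c "$container" --previous --tail="$lines"
  else
    kubectl logs "$pod" --previous --tail="$lines"
  fi
}

# Usage (needs kubectl access to the cluster):
#   previous_logs webapp-j7gd2
#   previous_logs webapp-j7gd2 webapp 20
```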

You can of course also run docker logs <container_id> directly on the node where the container runs:

[root@node2 ~]# docker ps | grep web
6041a63c30ea        6097ab3c4283           "catalina.sh run"        25 hours ago        Up 25 hours                             k8s_webapp_webapp-kzrn7_default_7c476613-59f4-11e9-9a41-000c29f1f0e4_3
974390ced06b        k8s.gcr.io/pause:3.1   "/pause"                 25 hours ago        Up 25 hours                             k8s_POD_webapp-kzrn7_default_7c476613-59f4-11e9-9a41-000c29f1f0e4_7
[root@node2 ~]# docker logs 6041a63c30ea
11-Apr-2019 01:26:33.432 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server version name:   Apache Tomcat/9.0.17
11-Apr-2019 01:26:33.526 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server built:          Mar 13 2019 15:55:27 UTC
11-Apr-2019 01:26:33.526 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server version number: 9.0.17.0
11-Apr-2019 01:26:33.526 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Name:               Linux
11-Apr-2019 01:26:33.527 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Version:            3.10.0-957.el7.x86_64
11-Apr-2019 01:26:33.527 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Architecture:          amd64
11-Apr-2019 01:26:33.527 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Java Home:             /usr/lib/jvm/java-1.8-openjdk/jre
11-Apr-2019 01:26:33.527 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log JVM Version:           1.8.0_201-b08
11-Apr-2019 01:26:33.528 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log JVM Vendor:            Oracle Corporation
11-Apr-2019 01:26:33.528 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log CATALINA_BASE:         /usr/local/tomcat
11-Apr-2019 01:26:33.528 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log CATALINA_HOME:         /usr/local/tomcat
11-Apr-2019 01:26:33.529 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties
11-Apr-2019 01:26:33.529 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
11-Apr-2019 01:26:33.529 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djdk.tls.ephemeralDHKeySize=2048
11-Apr-2019 01:26:33.529 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.protocol.handler.pkgs=org.apache.catalina.webresources
11-Apr-2019 01:26:33.530 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dorg.apache.catalina.security.SecurityListener.UMASK=0027
11-Apr-2019 01:26:33.530 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dignore.endorsed.dirs=
11-Apr-2019 01:26:33.530 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dcatalina.base=/usr/local/tomcat
11-Apr-2019 01:26:33.530 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dcatalina.home=/usr/local/tomcat
11-Apr-2019 01:26:33.530 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.io.tmpdir=/usr/local/tomcat/temp
11-Apr-2019 01:26:33.531 INFO [main] org.apache.catalina.core.AprLifecycleListener.lifecycleEvent Loaded APR based Apache Tomcat Native library [1.2.21] using APR version [1.6.5].
11-Apr-2019 01:26:33.539 INFO [main] org.apache.catalina.core.AprLifecycleListener.lifecycleEvent APR capabilities: IPv6 [true], sendfile [true], accept filters [false], random [true].
11-Apr-2019 01:26:33.540 INFO [main] org.apache.catalina.core.AprLifecycleListener.lifecycleEvent APR/OpenSSL configuration: useAprConnector [false], useOpenSSL [true]
11-Apr-2019 01:26:33.565 INFO [main] org.apache.catalina.core.AprLifecycleListener.initializeSSL OpenSSL successfully initialized [OpenSSL 1.1.1b  26 Feb 2019]
11-Apr-2019 01:26:34.291 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler ["http-nio-8080"]
11-Apr-2019 01:26:34.374 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler ["ajp-nio-8009"]
11-Apr-2019 01:26:34.378 INFO [main] org.apache.catalina.startup.Catalina.load Server initialization in [3,215] milliseconds
11-Apr-2019 01:26:34.467 INFO [main] org.apache.catalina.core.StandardService.startInternal Starting service [Catalina]
11-Apr-2019 01:26:34.468 INFO [main] org.apache.catalina.core.StandardEngine.startInternal Starting Servlet engine: [Apache Tomcat/9.0.17]
11-Apr-2019 01:26:34.507 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/ROOT]
11-Apr-2019 01:26:36.293 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/ROOT] has finished in [1,786] ms
11-Apr-2019 01:26:36.294 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/docs]
11-Apr-2019 01:26:36.368 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/docs] has finished in [73] ms
11-Apr-2019 01:26:36.377 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/examples]
11-Apr-2019 01:26:37.797 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/examples] has finished in [1,420] ms
11-Apr-2019 01:26:37.802 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/host-manager]
11-Apr-2019 01:26:38.031 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/host-manager] has finished in [228] ms
11-Apr-2019 01:26:38.032 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/manager]
11-Apr-2019 01:26:38.161 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/manager] has finished in [128] ms
11-Apr-2019 01:26:38.183 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["http-nio-8080"]
11-Apr-2019 01:26:38.244 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["ajp-nio-8009"]
11-Apr-2019 01:26:38.290 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in [3,911] milliseconds

3. Check the k8s service logs

If k8s is installed on Linux and its services are managed by systemd, the systemd journal captures the output of those services. Use systemctl status or journalctl to view the service logs:

[root@node2 ~]# systemctl status kubelet.service 
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Thu 2019-04-11 09:25:36 CST; 1 day 1h ago
     Docs: https://kubernetes.io/docs/
 Main PID: 7793 (kubelet)
    Tasks: 19
   Memory: 112.4M
   CGroup: /system.slice/kubelet.service
           └─7793 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/v...

Apr 12 09:56:44 node2 kubelet[7793]: W0412 09:56:44.886746    7793 reflector.go:270] object-"kube-system"/"kube-proxy": watch of *v1.ConfigMa... (562273)
Apr 12 09:57:46 node2 kubelet[7793]: W0412 09:57:46.933029    7793 reflector.go:270] object-"kube-system"/"kube-flannel-cfg": watch of *v1.Co... (562359)
Apr 12 10:04:45 node2 kubelet[7793]: W0412 10:04:45.828641    7793 reflector.go:270] object-"kube-system"/"coredns": watch of *v1.ConfigMap e... (562964)
Apr 12 10:11:04 node2 kubelet[7793]: W0412 10:11:04.635497    7793 reflector.go:270] object-"kube-system"/"kube-flannel-cfg": watch of *v1.Co... (563510)
Apr 12 10:12:23 node2 kubelet[7793]: W0412 10:12:23.593624    7793 reflector.go:270] object-"kube-system"/"kube-proxy": watch of *v1.ConfigMa... (563619)
Apr 12 10:24:09 node2 kubelet[7793]: W0412 10:24:09.875061    7793 reflector.go:270] object-"kube-system"/"coredns": watch of *v1.ConfigMap e... (564637)
Apr 12 10:26:55 node2 kubelet[7793]: W0412 10:26:55.642788    7793 reflector.go:270] object-"kube-system"/"kube-proxy": watch of *v1.ConfigMa... (564886)
Apr 12 10:28:14 node2 kubelet[7793]: W0412 10:28:14.693489    7793 reflector.go:270] object-"kube-system"/"kube-flannel-cfg": watch of *v1.Co... (564992)
Apr 12 10:43:12 node2 kubelet[7793]: W0412 10:43:12.893306    7793 reflector.go:270] object-"kube-system"/"coredns": watch of *v1.ConfigMap e... (566287)
Apr 12 10:43:37 node2 kubelet[7793]: W0412 10:43:37.662130    7793 reflector.go:270] object-"kube-system"/"kube-proxy": watch of *v1.ConfigMa... (566320)
Hint: Some lines were ellipsized, use -l to show in full.

Or:

[root@centos ~]# journalctl -xeu kubelet
Apr 12 10:46:53 centos.master kubelet[9787]: E0412 10:46:53.510165    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:53 centos.master kubelet[9787]: E0412 10:46:53.610691    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:53 centos.master kubelet[9787]: E0412 10:46:53.711008    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:53 centos.master kubelet[9787]: E0412 10:46:53.811468    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:53 centos.master kubelet[9787]: I0412 10:46:53.883382    9787 kubelet_node_status.go:278] Setting node annotation to enable volume controlle
Apr 12 10:46:53 centos.master kubelet[9787]: E0412 10:46:53.912065    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:53 centos.master kubelet[9787]: I0412 10:46:53.914043    9787 kubelet_node_status.go:72] Attempting to register node centos.master
Apr 12 10:46:53 centos.master kubelet[9787]: E0412 10:46:53.916659    9787 kubelet_node_status.go:94] Unable to register node "centos.master" with API se
Apr 12 10:46:54 centos.master kubelet[9787]: E0412 10:46:54.012363    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:54 centos.master kubelet[9787]: E0412 10:46:54.113003    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:54 centos.master kubelet[9787]: I0412 10:46:54.147210    9787 kubelet_node_status.go:278] Setting node annotation to enable volume controlle
Apr 12 10:46:54 centos.master kubelet[9787]: E0412 10:46:54.213291    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:54 centos.master kubelet[9787]: E0412 10:46:54.313616    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:54 centos.master kubelet[9787]: E0412 10:46:54.413970    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:54 centos.master kubelet[9787]: E0412 10:46:54.514292    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:54 centos.master kubelet[9787]: E0412 10:46:54.615167    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:54 centos.master kubelet[9787]: E0412 10:46:54.715863    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:54 centos.master kubelet[9787]: E0412 10:46:54.816154    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:54 centos.master kubelet[9787]: E0412 10:46:54.916432    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:55 centos.master kubelet[9787]: E0412 10:46:55.017040    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:55 centos.master kubelet[9787]: E0412 10:46:55.117863    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:55 centos.master kubelet[9787]: E0412 10:46:55.218694    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:55 centos.master kubelet[9787]: E0412 10:46:55.319663    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:55 centos.master kubelet[9787]: E0412 10:46:55.420254    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:55 centos.master kubelet[9787]: E0412 10:46:55.521053    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:55 centos.master kubelet[9787]: E0412 10:46:55.621575    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:55 centos.master kubelet[9787]: E0412 10:46:55.722435    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:55 centos.master kubelet[9787]: E0412 10:46:55.823464    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:55 centos.master kubelet[9787]: E0412 10:46:55.924273    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:56 centos.master kubelet[9787]: E0412 10:46:56.024392    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:56 centos.master kubelet[9787]: E0412 10:46:56.125129    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:56 centos.master kubelet[9787]: I0412 10:46:56.146767    9787 kubelet_node_status.go:278] Setting node annotation to enable volume controlle
Apr 12 10:46:56 centos.master kubelet[9787]: E0412 10:46:56.225839    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:56 centos.master kubelet[9787]: E0412 10:46:56.326354    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:56 centos.master kubelet[9787]: E0412 10:46:56.427552    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:56 centos.master kubelet[9787]: E0412 10:46:56.528289    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:56 centos.master kubelet[9787]: E0412 10:46:56.628843    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:56 centos.master kubelet[9787]: E0412 10:46:56.729056    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:56 centos.master kubelet[9787]: E0412 10:46:56.829340    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:56 centos.master kubelet[9787]: E0412 10:46:56.929690    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:57 centos.master kubelet[9787]: E0412 10:46:57.030373    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:57 centos.master kubelet[9787]: E0412 10:46:57.131158    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:57 centos.master kubelet[9787]: E0412 10:46:57.232373    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:57 centos.master kubelet[9787]: E0412 10:46:57.333084    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:57 centos.master kubelet[9787]: E0412 10:46:57.433269    9787 kubelet.go:2266] node "centos.master" not found

The kubelet log above tells me that the node centos.master cannot be found.
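
A journal like the one above repeats the same error many times a second, which makes other messages easy to miss. The sketch below collapses the error lines into a count per distinct message. It relies on the klog line header that kubelet uses (severity letter plus month and day, time, process ID, then file:line followed by a closing bracket); summarize_klog_errors is a name I made up.

```shell
# Read kubelet journal lines on stdin and print each distinct error
# message with the number of times it occurs, most frequent first.
# summarize_klog_errors is a hypothetical helper name.
summarize_klog_errors() {
  # Keep only lines whose klog severity is E (error), then strip
  # everything up to and including the "file.go:123] " header.
  grep -E ' E[0-9]{4} [0-9]{2}:' \
    | sed -E 's/^.* E[0-9]{4} [0-9:.]+ +[0-9]+ [^ ]+\] //' \
    | sort | uniq -c | sort -rn
}

# Usage (on the node, as a user allowed to read the journal):
#   journalctl -u kubelet --no-pager --since "10 minutes ago" | summarize_klog_errors
```

Lines truncated by the terminal width are counted by their truncated text, so pipe the journal through --no-pager as shown to get full lines.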

That is all three basic moves. They are simple and only good for a first round of troubleshooting.

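When a particular k8s object has a problem and you need the system service logs, search the logs using the object's name as the keyword. Most of the problems we run into day to day involve pods: a pod cannot be created, a pod stops right after starting, the number of pod replicas cannot be increased, and so on. In those cases, first work out which node the pod is on, log in to that node, pull everything about the pod from the kubelet log, and troubleshoot from there. For problems related to pod scaling or to an RC, the key clue is most likely in the logs of kube-controller-manager and kube-scheduler.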
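
The keyword search described above could look roughly like this. It is a sketch, assuming kubelet runs as the systemd unit kubelet as in the earlier output; kubelet_lines_for is a hypothetical helper name and the one-hour default window is my own choice.

```shell
# Keep only the kubelet journal lines that mention a given pod.
# kubelet_lines_for is a hypothetical helper; run it on the node
# that hosts the pod.
kubelet_lines_for() {
  local pod="$1"
  local since="${2:-1 hour ago}"   # how far back to search
  journalctl -u kubelet --no-pager --since "$since" | grep -F -- "$pod"
}

# Find the node first (on a machine with kubectl access):
#   kubectl get pod webapp-j7gd2 -o wide
# Then, on that node:
#   kubelet_lines_for webapp-j7gd2
#   kubelet_lines_for webapp-j7gd2 "2 hours ago"
```

grep -F treats the pod name as a fixed string, so characters in the name are never interpreted as a regular expression.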

Also, kube-proxy is easy to overlook: even when it is stopped, pod status still looks normal, yet access to some services breaks.
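
A quick way to check on kube-proxy is sketched below. It assumes a kubeadm-style cluster, where kube-proxy runs as pods in the kube-system namespace labelled k8s-app=kube-proxy; if your cluster runs kube-proxy as a systemd service instead, check it with systemctl status in the same way as kubelet. kube_proxy_status is a hypothetical helper name.

```shell
# Show the kube-proxy pods and the node each one runs on.
# Assumption: kubeadm-style install, kube-proxy pods live in kube-system
# and carry the label k8s-app=kube-proxy. Adjust for other installs.
kube_proxy_status() {
  kubectl get pods --namespace kube-system \
    --selector k8s-app=kube-proxy --output wide
}

# Usage (needs kubectl access to the cluster):
#   kube_proxy_status
```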

 
