Troubleshooting a Calico network failure in a k8s cluster

Error message

calico/node is not ready: BIRD is not ready: BGP not established with 172.16.0.20,172.16.0.30
# calico/node is not ready: BGP sessions could not be established with the internal IPs 172.16.0.20 and 172.16.0.30

BGP: Border Gateway Protocol

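The k8s dashboard could not be reached, so I checked the pods.
For an unknown reason, one calico-node pod had been recreated and could not become ready; it was stuck at 0/1.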

[root@k8s-master yaml]# kubectl get pod -n kube-system
NAMESPACE              NAME                                        READY   STATUS    RESTARTS   AGE
...
kube-system            calico-kube-controllers-578894d4cd-rsgqd    1/1     Running   0          115d
kube-system            calico-node-64s8s                           1/1     Running   3          127d
kube-system            calico-node-j4t7q                           1/1     Running   0          127d
kube-system            calico-node-n6vr4                           0/1     Running   0          40s

Error details from the Calico pod

[root@k8s-master yaml]# kubectl describe pod -n kube-system calico-node-n6vr4
Events:
  Type     Reason     Age        From                 Message
  ----     ------     ----       ----                 -------
  Normal   Scheduled  <unknown>  default-scheduler    Successfully assigned kube-system/calico-node-n6vr4 to k8s-master
  Normal   Pulled     41s        kubelet, k8s-master  Container image "calico/cni:v3.15.1" already present on machine
  Normal   Created    41s        kubelet, k8s-master  Created container upgrade-ipam
  Normal   Started    40s        kubelet, k8s-master  Started container upgrade-ipam
  Normal   Pulled     40s        kubelet, k8s-master  Container image "calico/cni:v3.15.1" already present on machine
  Normal   Started    39s        kubelet, k8s-master  Started container install-cni
  Normal   Created    39s        kubelet, k8s-master  Created container install-cni
  Normal   Pulled     39s        kubelet, k8s-master  Container image "calico/pod2daemon-flexvol:v3.15.1" already present on machine
  Normal   Pulled     38s        kubelet, k8s-master  Container image "calico/node:v3.15.1" already present on machine
  Normal   Started    38s        kubelet, k8s-master  Started container flexvol-driver
  Normal   Created    38s        kubelet, k8s-master  Created container flexvol-driver
  Normal   Created    37s        kubelet, k8s-master  Created container calico-node
  Normal   Started    37s        kubelet, k8s-master  Started container calico-node
  Warning  Unhealthy  27s        kubelet, k8s-master  Readiness probe failed: 2020-08-14 02:16:54.068 [INFO][142] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.16.0.20,172.16.0.30
  Warning  Unhealthy  17s        kubelet, k8s-master  Readiness probe failed: 2020-08-14 02:17:04.059 [INFO][181] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.16.0.20,172.16.0.30
  Warning  Unhealthy  7s         kubelet, k8s-master  Readiness probe failed: 2020-08-14 02:17:14.065 [INFO][207] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.16.0.20,172.16.0.30

Cause: Calico did not detect the node's actual network interface.
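To confirm which interface actually carries the cluster address, it can help to list the node's IPv4 addresses and filter by the subnet the BGP peers live in. The helper below is a minimal sketch: the function name iface_for_prefix is hypothetical, and it assumes the node's own address is in the same 172.16.0.x range as the peers in the error message.

```shell
# Print the name of every interface whose IPv4 address starts with the given prefix.
# Reads the output of `ip -o -4 addr show` on stdin (one address per line).
iface_for_prefix() {
  awk -v prefix="$1" '$3 == "inet" && index($4, prefix) == 1 { print $2 }'
}

# Usage on the affected node (prefix assumed from the peer IPs in the error):
#   ip -o -4 addr show | iface_for_prefix 172.16.0.
```

If the interface printed here is not the first one listed by `ip addr`, the default detection method is a likely culprit.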

Solution

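Adjust how the Calico network plugin discovers the network interface by setting the value of IP_AUTODETECTION_METHOD. The YAML file downloaded from the official site does not configure the IP autodetection method (IP_AUTODETECTION_METHOD), so it defaults to first-found. As a result, an IP on the wrong interface can be registered as the node IP, which breaks connectivity between nodes. Switching to the can-reach or interface method avoids this: can-reach picks the local address used to reach a given IP (for example, the IP of a node that is Ready), and interface picks the address of an interface whose name matches a given pattern. Either way, Calico ends up selecting the correct IP.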

# Edit the Calico YAML file and add two lines:
# - name: IP_AUTODETECTION_METHOD
#   value: "interface=eth1"  # set this to the actual interface name
  
[root@k8s-master yaml]# vim calico.yaml
...(line 3546)
            # Cluster type to identify the deployment type
            - name: CLUSTER_TYPE
              value: "k8s,bgp"
            # Newly added setting
            - name: IP_AUTODETECTION_METHOD
              value: "interface=eth1"
            # Auto-detect the BGP IP address.
            - name: IP
              value: "autodetect"
            # Enable IPIP
            - name: CALICO_IPV4POOL_IPIP
              value: "Always"
            # Enable or Disable VXLAN on the default IP pool.
            - name: CALICO_IPV4POOL_VXLAN
              value: "Never"

# Re-apply the manifest
kubectl apply -f calico.yaml
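
As an alternative to editing the manifest by hand, the same variable can be set directly on the DaemonSet with kubectl set env. The following is a sketch rather than the method used in this post: the function names are hypothetical, and it assumes the DaemonSet is named calico-node in kube-system, as the pod names above suggest.

```shell
# Set the autodetection method on the calico-node DaemonSet.
# $1 is the method string, e.g. "interface=eth1" or "can-reach=172.16.0.20".
set_autodetect_method() {
  kubectl set env daemonset/calico-node -n kube-system "IP_AUTODETECTION_METHOD=$1"
}

# Wait until the DaemonSet has rolled out the new pods, or give up after 120s.
wait_for_calico() {
  kubectl rollout status daemonset/calico-node -n kube-system --timeout=120s
}

# Usage:
#   set_autodetect_method "can-reach=172.16.0.20" && wait_for_calico
```

A later kubectl apply of a calico.yaml that lacks the variable may drop this setting again, so keeping the manifest in sync is still advisable.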

Fixed

[root@k8s-master yaml]# kubectl get pod -n kube-system 
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-578894d4cd-rsgqd   1/1     Running   0          115d
calico-node-6ktn4                          1/1     Running   0          26m
calico-node-8k5z8                          1/1     Running   0          26m
calico-node-g87hc                          1/1     Running   0          1m

The cluster's resources are accessible again.
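
For a scripted version of the check above, the sketch below prints the calico-node pods whose container is not Ready. The function name is hypothetical, and it assumes the pods carry the k8s-app=calico-node label, which the stock manifest normally applies.

```shell
# Print the names of calico-node pods that are not Ready.
# Prints nothing when every pod is Ready.
not_ready_calico_pods() {
  kubectl get pods -n kube-system -l k8s-app=calico-node \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.containerStatuses[0].ready}{"\n"}{end}' |
    awk '$2 != "true" { print $1 }'
}

# Usage:
#   not_ready_calico_pods
```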