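Problem description
A k8s node shut down abnormally, and after it came back up the business Pods were redeployed; the state of the Pods that existed before the shutdown had already been deleted. While checking the logs today, I found that the Pods that were on this cluster node before the shutdown had not been removed cleanly, and the node keeps logging the error shown further down.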
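Troubleshooting
Check the system log /var/log/messages.
The kubelet service keeps logging the error below. The message shows that this node has an orphaned Pod, and that the Pod had a data volume mounted (its volume paths are still on disk), which prevents kubelet from reclaiming and cleaning up the orphaned Pod normally.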
[root@sss-010xl-n02 ~]# tail -3 /var/log/messages
Dec 12 17:50:17 sss-010xl-n02 bash[470923]: user=root,ppid=454652,from=,pwd=/var/lib/kubelet/pods,command:20211212-175006: ll
Dec 12 17:55:15 sss-010xl-n02 kubelet: E1212 17:55:15.645612 2423 kubelet_volumes.go:154] Orphaned pod "aad90ab1-2f04-11ec-b488-b4055dae3f29" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
Dec 12 17:55:15 sss-010xl-n02 kubelet: E1212 17:55:15.645612 2423 kubelet_volumes.go:154] Orphaned pod "aad90ab1-2f04-11ec-b488-b4055dae3f29" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
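When the log is noisy, it can help to pull out just the distinct Pod UIDs that kubelet is complaining about. The following is a minimal sketch, assuming the message format shown above (`Orphaned pod "<uid>" found`); the function name `orphaned_pod_uids` is made up for illustration.

```shell
# List the unique Pod UIDs that kubelet reports as orphaned.
# Reads log text from the file given as $1, or from stdin when omitted.
# Assumes the message format seen above: Orphaned pod "<uid>" found
orphaned_pod_uids() {
    grep -o 'Orphaned pod "[^"]*"' "${1:-/dev/stdin}" \
        | cut -d'"' -f2 \
        | sort -u
}

# Example (log path taken from the session above):
#   orphaned_pod_uids /var/log/messages
```

Each UID printed corresponds to a directory name under /var/lib/kubelet/pods.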
Using the Pod ID, go into the kubelet directory. It holds the container's data, and the etc-hosts file still retains the Pod name.
[root@sss-010xl-n02 ~]# cd /var/lib/kubelet/pods/
[root@sss-010xl-n02 pods]# cd aad90ab1-2f04-11ec-b488-b4055dae3f29/
[root@sss-010xl-n02 aad90ab1-2f04-11ec-b488-b4055dae3f29]# ll
total 4
drwxr-x--- 3 root root 30 Dec 10 15:54 containers
-rw-r--r-- 1 root root 230 Dec 10 15:54 etc-hosts
drwxr-x--- 3 root root 37 Dec 10 15:54 plugins
drwxr-x--- 5 root root 82 Dec 10 15:54 volumes
drwxr-x--- 3 root root 49 Dec 10 15:54 volume-subpaths
[root@sss-010xl-n02 7e1a3af8-598e-11ec-b488-b4055dae3f29]# cat etc-hosts
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
172.30.128.2 sss-wanted-010xl-5945fb4885-7gz85 \\ the orphaned Pod
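The Pod name on the last line of etc-hosts is what lets you check whether the Pod still exists in the cluster. Below is a rough sketch of that lookup. It assumes the last non-comment entry is the Pod's own line and that the hostname matches the Pod name, which may not hold if the Pod spec overrides the hostname. `pod_name_from_hosts` is a hypothetical helper name.

```shell
# Print the hostname on the last non-comment entry of a Pod's etc-hosts file.
# Assumption: that entry is the Pod's own "<ip> <hostname>" line, and the
# hostname equals the Pod name (not guaranteed if the Pod spec sets hostname).
pod_name_from_hosts() {
    awk '!/^#/ && NF >= 2 { name = $2 } END { if (name != "") print name }' "$1"
}

# Example, run from inside the Pod's directory (needs cluster access):
#   kubectl get pods --all-namespaces -o wide | grep "$(pod_name_from_hosts etc-hosts)"
```

If the `grep` prints nothing, no Pod with that name is currently known to the cluster.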
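Solution
First, using the Pod name from the etc-hosts file, I confirmed that no matching instance is still running, so the Pod's directory can simply be deleted.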
[root@sss-010xl-n02 7e1a3af8-598e-11ec-b488-b4055dae3f29]# cd ..
[root@sss-010xl-n02 pods]# rm -rf 7e1a3af8-598e-11ec-b488-b4055dae3f29/
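Other people's blog posts all say this method carries some risk, and I have not yet confirmed whether it can cause data loss. Run it only once you have confirmed it is safe; for stateless services it is generally not a problem.
Checking the logs again, this alert is no longer being logged.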
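One way to reduce that risk is to check, before any `rm -rf`, that nothing is still mounted under the Pod's directory, since deleting through a live mount could remove the data on the volume itself. The following is only a sketch of that check: `pod_dir_has_mounts` is a hypothetical helper, `<pod-uid>` is a placeholder, and it assumes the Linux /proc/mounts layout with the mount point in the second field.

```shell
# Exit 0 if any mount point listed in the mounts table lies at or under the
# given Pod directory, exit 1 otherwise.
# $1: Pod directory; $2: mounts table (defaults to /proc/mounts).
pod_dir_has_mounts() {
    pod_dir=${1%/}                      # drop a trailing slash, if any
    mounts_file=${2:-/proc/mounts}
    awk -v dir="$pod_dir" '
        $2 == dir || index($2, dir "/") == 1 { found = 1 }
        END { exit !found }
    ' "$mounts_file"
}

# Example (<pod-uid> is a placeholder for the orphaned Pod's UID):
#   if pod_dir_has_mounts /var/lib/kubelet/pods/<pod-uid>; then
#       echo "volumes still mounted; unmount them before deleting"
#   fi
```

A clean result here does not prove the data is disposable; it only rules out deleting through an active mount.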