Fixing host reboots caused by Docker

The host operating system is CentOS 7.4.

After Kubernetes (k8s) had been running for a while, the following errors appeared in the system log:

containerd: time="2019-12-19T21:50:49.070815105Z" level=info msg="shim reaped" id=6bdd3fe50ae41e731e7483e939612792d6c752ca0437525dc89103abacf22a8d
dockerd: time="2019-12-19T21:50:49.080258760Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
NetworkManager[2115]: <info>  [1573249862.2223] device (calief3c22d1ca1): driver 'veth' does not support carrier detection.
containerd: time="2019-12-19T21:51:02.363334433Z" level=info msg="shim reaped" id=a86dab3a213d7adafed6cab2238ad8c389b35450cc74cce6bcc203bd2ef86bdf
dockerd: time="2019-12-19T21:51:02.372621948Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
containerd: time="2019-12-19T21:51:02.787315148Z" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/c04ddba2ccfbbb317c96f65f4c9f1b555f2a62f3b31f8d99ed5463833dc2230c/shim.sock" debug=false pid=400582
kernel: nf_conntrack: falling back to vmalloc.
kernel: nf_conntrack: falling back to vmalloc.
kernel: runc:[2:INIT] invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=-998
kernel: runc:[2:INIT] cpuset=c04ddba2ccfbbb317c96f65f4c9f1b555f2a62f3b31f8d99ed5463833dc2230c mems_allowed=0-3
kernel: CPU: 19 PID: 400674 Comm: runc:[2:INIT] Tainted: G               ------------ T 3.10.0-514.el7.x86_64 #1
kernel: Hardware name: Inspur NF8480M5/YZMB-00866-102, BIOS 4.1.09 09/23/2019
kernel: ffff8820223e8000 00000000c9677d19 ffff88201421fcc0 ffffffff81685fac
kernel: ffff88201421fd50 ffffffff81680f57 0000000000000000 00000000000000d0
kernel: ffff88201421fd20 ffffffff811f155e 0000000000000000 ffffffff81184156
kernel: Call Trace:
kernel: [<ffffffff81685fac>] dump_stack+0x19/0x1b
kernel: [<ffffffff81680f57>] dump_header+0x8e/0x225
kernel: [<ffffffff811f155e>] ? mem_cgroup_reclaim+0x4e/0x120
kernel: [<ffffffff81184156>] ? find_lock_task_mm+0x56/0xc0
kernel: [<ffffffff8118460e>] oom_kill_process+0x24e/0x3c0
kernel: [<ffffffff810936ce>] ? has_capability_noaudit+0x1e/0x30
kernel: [<ffffffff811f2fd1>] mem_cgroup_oom_synchronize+0x551/0x580
kernel: [<ffffffff811f2420>] ? mem_cgroup_charge_common+0xc0/0xc0
kernel: [<ffffffff81184e94>] pagefault_out_of_memory+0x14/0x90
kernel: [<ffffffff8167ed47>] mm_fault_error+0x68/0x12b
kernel: [<ffffffff81691cd5>] __do_page_fault+0x395/0x450
kernel: [<ffffffff81691dc5>] do_page_fault+0x35/0x90
kernel: [<ffffffff8168e088>] page_fault+0x28/0x30
kernel: Task in /kubepods/pod7879bd10-0265-11ea-a2be-6c92bff19a9a/c04ddba2ccfbbb317c96f65f4c9f1b555f2a62f3b31f8d99ed5463833dc2230c killed as a result of limit of /kubepods/pod7879bd10-0265-11ea-a2be-6c92bff19a9a
kernel: memory: usage 20480kB, limit 20480kB, failcnt 336
kernel: memory+swap: usage 20480kB, limit 9007199254740988kB, failcnt 0
kernel: kmem: usage 16860kB, limit 9007199254740988kB, failcnt 0
kernel: Memory cgroup stats for /kubepods/pod7879bd10-0265-11ea-a2be-6c92bff19a9a: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
kernel: Memory cgroup stats for /kubepods/pod7879bd10-0265-11ea-a2be-6c92bff19a9a/c04ddba2ccfbbb317c96f65f4c9f1b555f2a62f3b31f8d99ed5463833dc2230c: cache:0KB rss:3620KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:3576KB inactive_file:0KB active_file:0KB unevictable:0KB
kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
kernel: [400674]     0 400674    14361     1312      21        0          -998 runc:[2:INIT]
kernel: Memory cgroup out of memory: Kill process 400678 (runc:[2:INIT]) score 0 or sacrifice child
kernel: Killed process 400674 (runc:[2:INIT]) total-vm:57444kB, anon-rss:3616kB, file-rss:1632kB, shmem-rss:0kB
containerd: time="2019-12-19T21:51:12.766784752Z" level=info msg="shim reaped" id=c04ddba2ccfbbb317c96f65f4c9f1b555f2a62f3b31f8d99ed5463833dc2230c
dockerd: time="2019-12-19T21:51:12.776459692Z" level=error msg="stream copy error: reading from a closed fifo"
dockerd: time="2019-12-19T21:51:12.776493528Z" level=error msg="stream copy error: reading from a closed fifo"
dockerd: time="2019-12-19T21:51:12.868654737Z" level=error msg="c04ddba2ccfbbb317c96f65f4c9f1b555f2a62f3b31f8d99ed5463833dc2230c cleanup: failed to delete container from containerd: no such container"
dockerd: time="2019-12-19T21:51:12.868760863Z" level=error msg="Handler for POST /v1.38/containers/c04ddba2ccfbbb317c96f65f4c9f1b555f2a62f3b31f8d99ed5463833dc2230c/start returned error: OCI runtime create failed: container_linux.go:345: starting container process caused \"process_linux.go:424: container init caused \\\"read init-p: connection reset by peer\\\"\": unknown"
dockerd: time="2019-12-19T21:51:14.033706156Z" level=info msg="No non-localhost DNS nameservers are left in resolv.conf. Using default external servers: [nameserver 8.8.8.8 nameserver 8.8.4.4]"
dockerd: time="2019-12-19T21:51:14.033740242Z" level=info msg="IPv6 enabled; Adding default IPv6 external servers: [nameserver 2001:4860:4860::8888 nameserver 2001:4860:4860::8844]"
containerd: time="2019-12-19T21:51:14.100469099Z" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/ba2deb094acba8f412569ae40a841c8895d3292b122a0c594114db93c7d8ae54/shim.sock" debug=false pid=400774
kernel: SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)

After these errors, the server rebooted.
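
The cgroup accounting lines in the log above are worth reading side by side: the pod hit its 20480kB limit while kmem (kernel memory) stood at 16860kB and the container's rss was only 3620KB. The sketch below pulls the usage and limit figures out of saved log lines. It is a minimal illustration and not part of the original troubleshooting: the function name oom_summary is made up, and it assumes the lines have the same "kernel: memory: usage ..." shape as the log above.

```shell
# oom_summary (hypothetical helper): reads kernel log lines on stdin and
# prints usage and limit for the "memory:" and "kmem:" cgroup lines.
# The "memory+swap:" line is deliberately not matched.
oom_summary() {
  sed -n -E 's/.*kernel: (memory|kmem): usage ([0-9]+)kB, limit ([0-9]+)kB.*/\1 usage_kB=\2 limit_kB=\3/p'
}

# Example run on three lines copied from the log above.
oom_summary <<'EOF'
kernel: memory: usage 20480kB, limit 20480kB, failcnt 336
kernel: memory+swap: usage 20480kB, limit 9007199254740988kB, failcnt 0
kernel: kmem: usage 16860kB, limit 9007199254740988kB, failcnt 0
EOF
```

On a live host the function could be fed from the system log, for example grep 'kernel:' /var/log/messages | oom_summary, assuming kernel messages are written to that file.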

 

Running uname -a shows that the kernel version is 3.10.0-957.el7.x86_64.

Either of the following two methods resolves the problem; method 2 is recommended.

 

Solution 1:

Change the contents of /etc/docker/daemon.json to:

{
  "exec-opts": ["native.cgroupdriver=systemd"]
}

Then restart the Docker service and run docker info|grep Cgroup. If the result is systemd (the default is cgroupfs), the change has taken effect.
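
The check described above can also be scripted. The following is a small sketch rather than an official tool: the function name cgroup_driver is hypothetical, and the sample text only imitates the "Cgroup Driver:" line that docker info prints.

```shell
# cgroup_driver (hypothetical helper): reads `docker info` output on stdin
# and prints only the value of its "Cgroup Driver:" line.
cgroup_driver() {
  sed -n 's/^[[:space:]]*Cgroup Driver:[[:space:]]*//p'
}

# Made-up sample standing in for real `docker info` output.
sample_info=' Storage Driver: overlay2
 Cgroup Driver: systemd'

if [ "$(printf '%s\n' "$sample_info" | cgroup_driver)" = "systemd" ]; then
  echo "cgroup driver is systemd"
else
  echo "cgroup driver is not systemd"
fi
```

On a real host, replace the sample with docker info 2>/dev/null | cgroup_driver after restarting Docker. Note that on a Kubernetes node the kubelet's cgroup driver has to match Docker's, so the kubelet configuration likely needs the same change.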

 

Solution 2:

# Upgrade Docker
yum remove docker docker-engine docker-common \
docker-client docker-client-latest docker-latest docker-latest-logrotate \
docker-logrotate docker-selinux docker-engine-selinux  -y
yum install yum-utils lvm2 device-mapper-persistent-data -y
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum-config-manager --disable docker-ce-edge docker-ce-test
yum install docker-ce.x86_64 -y
yum update containerd.io -y

# Upgrade the kernel
yum update kernel.x86_64 -y

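yum then prompts to install kernel-3.10.0-1062.9.1.el7.x86_64. Once the installation finishes, reboot the server. Running uname -a again shows that the kernel version has been upgraded to 3.10.0-1062.4.3.el7.x86_64, and the problem is resolved.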
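
To confirm after the reboot that a node is really running the upgraded kernel, the release string can be compared against a target version. This is a sketch under stated assumptions: the function name kernel_at_least is hypothetical, the comparison relies on sort -V (version sort, as in GNU coreutils), and the target 3.10.0-1062.4.3.el7.x86_64 is simply the version reported above, not an officially documented minimum.

```shell
# kernel_at_least CURRENT REQUIRED (hypothetical helper): succeeds when
# CURRENT is the same as or newer than REQUIRED according to `sort -V`.
kernel_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Assumed target: the kernel version this article reports after the upgrade.
target="3.10.0-1062.4.3.el7.x86_64"

if kernel_at_least "$(uname -r)" "$target"; then
  echo "running kernel is at least $target"
else
  echo "running kernel is older than $target"
fi
```

With the pre-upgrade version from this article, kernel_at_least 3.10.0-957.el7.x86_64 "$target" fails, because 957 sorts before 1062.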

 

 
