soft lockup - CPU#9 stuck for 105s! [vmmemctl:838]

【Symptoms】

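On one platform, the billing (arrears) staff reported that they suddenly could not log in to the portal on a production server. A check of the database showed that its IP address did not answer ping and that the MySQL port could not be reached with telnet, so the database host was judged to be down. The host operations staff were brought in and found that the console of the host was a black screen reporting a soft lockup, and that a reboot was needed.
After the reboot, the logs of the failed MySQL host showed the following errors: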

Mar 19 13:15:50 localhost kernel: BUG: soft lockup - **CPU#9 stuck for 105s!** [**vmmemctl:838**]
Mar 19 13:15:50 localhost kernel: BUG: soft lockup - **CPU#2 stuck for 99s!** [mysqld:754]
Mar 19 13:15:50 localhost kernel: Modules linked in: iptable_filter ip_tables ip_vs libcrc32c tcp_diag
Mar 19 13:15:50 localhost kernel: BUG: soft lockup - CPU#4 stuck for 99s! [mysqld:8773]
Mar 19 13:15:50 localhost kernel: inet_diag
Mar 19 13:15:50 localhost kernel: Modules linked in: vsock(U) iptable_filter ipt_REJECT ip_tables ip_vs ip6t_REJECT libcrc32c tcp_diag inet_diag vsock(U) ipt_REJECT ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ppdev parport_pc parport vmware_balloon sg vmci(U) nf_conntrack_ipv6 i2c_piix4 nf_defrag_ipv6 i2c_core xt_state shpchp nf_conntrack ext4 ip6table_filter jbd2 ip6_tables ipv6 mbcache ppdev sd_mod parport_pc parport vmware_balloon sg vmci(U) i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sd_mod
Mar 19 13:15:50 localhost kernel: BUG: soft lockup - CPU#8 stuck for 99s! [ps:29030]
Mar 19 13:15:50 localhost kernel: Modules linked in: iptable_filter ip_tables ip_vs libcrc32c tcp_diag inet_diag vsock(U) ipt_REJECT ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ppdev parport_pc parport vmware_balloon sg vmci(U) i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sd_mod
Mar 19 13:15:50 localhost kernel: BUG: soft lockup - CPU#12 stuck for 99s! [mrtg:28880]
Mar 19 13:15:50 localhost kernel: BUG: soft lockup - CPU#13 stuck for 100s! [mysqld:22619]
Mar 19 13:15:50 localhost kernel: iptable_filter
Mar 19 13:15:50 localhost kernel: Modules linked in: ip_tables iptable_filter ip_vs ip_tables libcrc32c tcp_diag ip_vs inet_diag libcrc32c vsock(U) ipt_REJECT tcp_diag ip6t_REJECT inet_diag nf_conntrack_ipv6 vsock(U) ipt_REJECT nf_defrag_ipv6 ip6t_REJECT nf_conntrack_ipv6 xt_state nf_conntrack nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ip6table_filter ip6_tables ipv6 ppdev parport_pc ppdev parport_pc parport vmware_balloon parport sg vmware_balloon vmci(U) sg vmci(U) i2c_piix4 i2c_core shpchp ext4 i2c_piix4 i2c_core shpchp jbd2 mbcache ext4 sd_mod jbd2 mbcache sd_mod
Mar 19 13:15:50 localhost kernel: BUG: soft lockup - CPU#5 stuck for 99s! [mysqld:25691]
Mar 19 13:15:50 localhost kernel: Modules linked in: iptable_filter ip_tables ip_vs libcrc32c tcp_diag inet_diag vsock(U) ipt_REJECT ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ppdev parport_pc
Mar 19 13:15:50 localhost kernel: BUG: soft lockup - CPU#11 stuck for 100s! [sh:29031]
Mar 19 13:15:50 localhost kernel: Modules linked in: iptable_filter ip_tables ip_vs libcrc32c tcp_diag inet_diag vsock(U) ipt_REJECT ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ppdev parport_pc parport vmware_balloon sg vmci(U) i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sd_mod parport vmware_balloon sg vmci(U) i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sd_mod
Mar 19 13:15:50 localhost kernel: Modules linked in: iptable_filter ip_tables ip_vs libcrc32c tcp_diag inet_diag vsock(U) ipt_REJECT ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ppdev parport_pc parport vmware_balloon sg vmci(U) i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom vmxnet3 mptspi mptscsih mptbase
Mar 19 13:15:50 localhost kernel: BUG: soft lockup - CPU#7 stuck for 111s! [events/7:74]
Mar 19 13:15:50 localhost kernel: crc_t10dif sr_mod cdrom vmxnet3 mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv4]
Mar 19 13:15:50 localhost kernel: CPU 2 
Mar 19 13:15:50 localhost kernel: Modules linked in: iptable_filter ip_tables crc_t10dif ip_vs sr_mod libcrc32c cdrom tcp_diag vmxnet3 inet_diag mptspi vsock mptscsih(U) mptbase scsi_transport_spi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv4]
Mar 19 13:15:50 localhost kernel: CPU 5 
Mar 19 13:15:50 localhost kernel: Modules linked in: iptable_filter ip_tables ip_vs libcrc32c tcp_diag inet_diag vsock(U) crc_t10dif sr_mod cdrom vmxnet3 mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv4]
Mar 19 13:15:50 localhost kernel: CPU 13 
Mar 19 13:15:50 localhost kernel: Modules linked in: iptable_filter ip_tables ip_vs libcrc32c tcp_diag inet_diag vsock(U) crc_t10dif sr_mod cdrom vmxnet3 mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [lastunloaded: nf_defrag_ipv4]
Mar 19 13:15:50 localhost kernel: CPU 13 
Mar 19 13:15:50 localhost kernel: Modules linked in: iptable_filter ip_tables ip_vs libcrc32c tcp_diag inet_diag vsock(U) crc_t10dif sr_mod cdrom vmxnet3 mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last
 unloaded: nf_defrag_ipv4]
Mar 19 13:15:50 localhost kernel: CPU 4 
Mar 19 13:15:50 localhost kernel: Modules linked in: iptable_filter ip_tables ip_vs libcrc32c tcp_diag inet_diag vsock(U) ipt_REJECT ip6t_REJECT ipt_REJECT nf_conntrack_ipv6 ip6t_REJECT nf_defrag_ipv6 nf_conntrack_ipv6 xt_state nf_defrag_ipv6 nf_conntrack xt_state ip6table_filter crc_t10dif crc_t10dif crc_t10dif sr_mod sr_mod cdrom sr_mod cdrom vmxnet3 mptspi cdrom vmxnet3 mptscsih mptspi mptscsih mptbase scsi_transport_spi ip6_tables ipv6 ppdev pata_acpi mptbase vmxnet3 mptspi mptscsih nf_conntrack
Mar 19 13:15:50 localhost kernel: Modules linked in: parport_pc iptable_filter parport ip6table_filter ip_tables ip6_tables ipv6 ppdev parport_pc parport vmware_balloon ip_vs vmware_balloon sg vmci(U) libcrc32c sg vmci(U) i2c_piix4 tcp_diag scsi_transport_spi pata_acpi i2c_piix4 i2c_core inet_diag mptbase scsi_transport_spi pata_acpi ata_generic shpchp i2c_core shpchp vsock(U) ext4 ata_generic ata_piix ata_generic ata_piix dm_mirror dm_region_hash ata_piix dm_mirror dm_region_hash dm_log dm_log dm_mirror dm_mod [last unloaded: nf_defrag_ipv4] ipt_REJECT jbd2 ip6t_REJECT mbcache dm_region_hash
Mar 19 13:15:50 localhost kernel: dm_modCPU 8 
Mar 19 13:15:50 localhost kernel: Modules linked in: dm_log dm_mod nf_conntrack_ipv6 sd_mod ext4 jbd2 mbcache nf_defrag_ipv6 sd_mod [last unloaded: nf_defrag_ipv4]
Mar 19 13:15:50 localhost kernel: crc_t10dif xt_state nf_conntrack ip6table_filter iptable_filter crc_t10dif [last unloaded: nf_defrag_ipv4]CPU 12 
Mar 19 13:15:50 localhost kernel: Modules linked in:
Mar 19 13:15:50 localhost kernel: ip_tables ip6_tables ipv6 ppdev parport_pc parport vmware_balloon iptable_filter ip_tables ip_vs ip_vs libcrc32c tcp_diag sg vmci(U) i2c_piix4 i2c_core shpchp ext4 jbd2 sr_mod mbcache sd_mod crc_t10dif sr_mod cdrom sr_mod cdrom vmxnet3 vmxnet3 mptspi cdrom mptspi inet_diagCPU 11 
Mar 19 13:15:50 localhost kernel: 
Mar 19 13:15:50 localhost kernel: pata_acpi ata_generic
Mar 19 13:15:50 localhost kernel: Pid: 29031, comm: sh Not tainted 2.6.32-431.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
Mar 19 13:15:50 localhost kernel: RIP: 0010:[<ffffffff8114862f>]  [<ffffffff8114862f>] unmap_vmas+0x6df/0xc50
Mar 19 13:15:50 localhost kernel: scsi_transport_spi
Mar 19 13:15:50 localhost kernel: RSP: 0018:ffff8808005a1928  EFLAGS: 00010246
Mar 19 13:15:50 localhost kernel: RAX: ffffea006ee92b78 RBX: ffff8808005a1a58 RCX: ffffea006e945898
Mar 19 13:15:50 localhost kernel: RDX: ffffea0000000000 RSI: ffff8810788cebc0 RDI: 8000001fb0559025
Mar 19 13:15:50 localhost kernel: RBP: ffffffff8100bb8e R08: ffff88099841fc78 R09: 0000000000000000
Mar 19 13:15:50 localhost kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 000000002f797d0d
Mar 19 13:15:50 localhost kernel: ata_piix ipt_REJECT pata_acpi ip6t_REJECT dm_mirror dm_region_hash dm_log dm_mod nf_conntrack_ipv6 ata_generic ata_piix dm_mirror nf_defrag_ipv6
Mar 19 13:15:50 localhost kernel: R13: 000000005e72ff7c R14: 000000002f797d0d R15: 000000005e72ff7c
Mar 19 13:15:50 localhost kernel: dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv4]
Mar 19 13:15:50 localhost kernel: dm_mod [last unloaded: nf_defrag_ipv4]
Mar 19 13:15:50 localhost kernel: FS:  0000000000000000(0000) GS:ffff8810788c0000(0000) knlGS:0000000000000000
Mar 19 13:15:50 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 19 13:15:50 localhost kernel: CR2: 00007fdf9e5bd48e CR3: 000000128d3c3000 CR4: 00000000000407e0
Mar 19 13:15:50 localhost kernel: CPU 9 
Mar 19 13:15:50 localhost kernel: Modules linked in: xt_state
Mar 19 13:15:50 localhost kernel: Pid: 28880, comm: mrtg Not tainted 2.6.32-431.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform
Mar 19 13:15:50 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 19 13:15:50 localhost kernel: nf_conntrack
Mar 19 13:15:50 localhost kernel: Pid: 29030, comm: ps Not tainted 2.6.32-431.el6.x86_64 #1 iptable_filter/440BX Desktop Reference Platform
Mar 19 13:15:50 localhost kernel: ip6table_filter ip_tables ip6_tables VMware, Inc. VMware Virtual Platform ip_vs/440BX Desktop Reference Platform
Mar 19 13:15:50 localhost kernel: RIP: 0033:[<00007f15c85d5683>]  ipv6
Mar 19 13:15:50 localhost kernel: RIP: 0010:[<ffffffff8122e4e0>]  libcrc32c ppdev [<00007f15c85d5683>] 0x7f15c85d5683
Mar 19 13:15:50 localhost kernel: RSP: 002b:00007ffff0f22fe0  EFLAGS: 00000206
Mar 19 13:15:51 localhost kernel: <d> ffff880abfb64100 ipt_REJECT ip6t_REJECT nf_conntrack_ipv6 [last unloaded: nf_defrag_ipv4] dm_log i
pt_REJECT
Mar 19 13:15:51 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 19 13:15:51 localhost kernel: CR2: 0000000001d3c5a8 CR3: 0000001f2f4d7000 CR4: 00000000000407e0
Mar 19 13:15:51 localhost kernel: ffffffff00000000 ffff8808005a1a78 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6
 ppdev crc_t10dif
Mar 19 13:15:51 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
……
Mar 19 13:15:53 localhost kernel: Call Trace:
Mar 19 13:15:53 localhost kernel: mbcache tcp_diag
Mar 19 13:15:53 localhost kernel: Pid: 8773, comm: mysqld Not tainted 2.6.32-431.el6.x86_64 #1
Mar 19 13:15:53 localhost kernel: Pid: 754, comm: mysqld Not tainted 2.6.32-431.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
Mar 19 13:15:53 localhost kernel: sd_mod crc_t10dif sr_mod cdrom vmxnet3 mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic [<ffffffff81185b8d>] ? filp_close+0x5d/0x90
Mar 19 13:15:53 localhost kernel: ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv4]
……Pid: 29035, comm: keepalived Not tainted 2.6.32-431.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform
/440BX Desktop Reference Platform
Mar 19 13:15:54 localhost kernel: Call Trace:
Mar 19 13:15:54 localhost kernel: RIP: 0033:[<00000000010db503>]  ffff8800283d9548 ffff881029af7ee0 [<00000000010db503>] 0x10db503
Mar 19 13:15:54 localhost kernel: RSP: 002b:00007f68d25af000  EFLAGS: 00000206
Mar 19 13:15:54 localhost kernel: RAX: 00007f661cf22d8e RBX: 00007f68d25af030 RCX: 0000000000000004
Mar 19 13:15:54 localhost kernel: RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f661cf22d8c
Mar 19 13:15:54 localhost kernel: RBP: ffffffff8100bb8e R08: 0000000000000017 R09: 00007f66546881d8
Mar 19 13:15:54 localhost kernel: R10: 00007f7bf3f1050f R11: 0000000000000100 R12: 00007f68d25af090
Mar 19 13:15:54 localhost kernel: Pid: 29035, comm: keepalived Not tainted 2.6.32-431.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform
/440BX Desktop Reference Platform
Mar 19 13:15:54 localhost kernel: RIP: 0010:[<ffffffff8112d46a>]  [<ffffffff8112d46a>] get_page_from_freelist+0x2da/0x870
Mar 19 13:15:54 localhost kernel: RSP: 0018:ffff8814ff289b40  EFLAGS: 00000246
Mar 19 13:15:54 localhost kernel: RAX: 0000000000000064 RBX: ffff8814ff289c60 RCX: 0000000000000013
Mar 19 13:15:54 localhost kernel: RDX: ffff881029904a40 RSI: 000000000000001b RDI: 0000000000000246
Mar 19 13:15:54 localhost kernel: RBP: ffffffff8100bb8e R08: 0000000000000064 R09: 000000000005733e
Mar 19 13:15:54 localhost kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff00000001
Mar 19 13:15:54 localhost kernel: R13: 0000000000000058 R14: ffffea002d4a6278 R15: ffffffff8112ba93
Mar 19 13:15:54 localhost kernel: ffffffff81094d20
Mar 19 13:15:54 localhost kernel: FS:  00007f13f656e7c0(0000) GS:ffff880028240000(0000) knlGS:0000000000000000
Mar 19 13:15:54 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 19 13:15:54 localhost kernel: 
Mar 19 13:15:54 localhost kernel: Call Trace:
Mar 19 13:15:54 localhost kernel: R13: 00007f65de084e18 R14: 0000000000000031 R15: 00007f661cf201d5
Mar 19 13:15:54 localhost kernel: FS:  00007f68d25b2700(0000) GS:ffff880028340000(0000) knlGS:0000000000000000
Mar 19 13:15:54 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 19 13:15:54 localhost kernel: CR2: 0000000000448000 CR3: 0000001e1f1bd000 CR4: 00000000000407e0
Mar 19 13:15:54 localhost kernel: [<ffffffff8109b5ce>] ? prepare_to_wait+0x4e/0x80
Mar 19 13:15:54 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 19 13:15:54 localhost kernel: [<ffffffff81171950>] ? cache_reap+0x0/0x250
Mar 19 13:15:54 localhost kernel: 6a 
Mar 19 13:15:54 localhost kernel: [<ffffffff81142069>] ? refresh_cpu_vm_stats+0x159/0x180
Mar 19 13:15:54 localhost kernel: 3b 46 
Mar 19 13:15:54 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 19 13:15:54 localhost kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 19 13:15:54 localhost kernel: Process keepalived (pid: 29035, threadinfo ffff8814ff288000, task ffff881831190aa0)
Mar 19 13:15:54 localhost kernel: Stack:
Mar 19 13:15:54 localhost kernel: 00000001ffafffac [<ffffffff81094d20>] ? worker_thread+0x170/0x2a0
Mar 19 13:15:54 localhost kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 19 13:15:54 localhost kernel: [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
Mar 19 13:15:54 localhost kernel: Process mysqld (pid: 25691, threadinfo ffff880362442000, task ffff881672a2b500)
Mar 19 13:15:54 localhost kernel: 
Mar 19 13:15:54 localhost kernel: Call Trace:
Mar 19 13:15:54 localhost kernel: ffff881c00000000 ffff882026fb1e50
Mar 19 13:15:54 localhost kernel: [<ffffffff81094bb0>] ? worker_thread+0x0/0x2a0
Mar 19 13:15:54 localhost kernel: ffff8814ff289b88
Mar 19 13:15:54 localhost kernel: [<ffffffff8109aef6>] ? kthread+0x96/0xa0
Mar 19 13:15:54 localhost kernel: 
Mar 19 13:15:54 localhost kernel: <d> ffff8814ff289e08
Mar 19 13:15:54 localhost kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Mar 19 13:15:54 localhost kernel: ffff88062498f025
Mar 19 13:15:54 localhost kernel: [<ffffffff8109ae60>] ? kthread+0x0/0xa0
Mar 19 13:15:54 localhost kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Mar 19 13:15:54 localhost kernel: Code: 04 41 89 06 7d 64 4c 89 ef 57 9d <0f> 1f 44 00 00 48 8b 5d d8 4c  0000000000000000
Mar 19 13:15:54 localhost kernel: <d> 00000040fffffffe35 01 00 00 c7 43 60 00 00 
Mar 19 13:15:54 localhost kernel: [<ffffffff8109b5ce>] ? prepare_to_wait+0x4e/0x80
Mar 19 13:15:54 localhost kernel: 00 00 e8 
Mar 19 13:15:54 localhost kernel: [<ffffffff81142090>] ? vmstat_update+0x0/0x40
Mar 19 13:15:54 localhost kernel: 8b 65 e0 4c 8b 6d  0000000000000000e8 4c 8b 75 f0 
Mar 19 13:15:54 localhost kernel: Call Trace:
Mar 19 13:15:54 localhost kernel: [<ffffffff8100be2e>] ? reschedule_interrupt+0xe/0x20
Mar 19 13:15:54 localhost kernel: [<ffffffff8112ff2f>] ? free_hot_page+0x2f/0x60
Mar 19 13:15:54 localhost kernel: [<ffffffff8112ffc0>] ? __free_pages+0x60/0xa0
Mar 19 13:15:54 localhost kernel: [<ffffffffa0023189>] ? vmballoon_pop+0x59/0x90 [vmware_balloon]
Mar 19 13:15:54 localhost kernel: [<ffffffffa00232f0>] ? vmballoon_work+0x0/0x7d8 [vmware_balloon]
Mar 19 13:15:54 localhost kernel: [<ffffffffa0023388>] ? vmballoon_work+0x98/0x7d8 [vmware_balloon]
Mar 19 13:15:54 localhost kernel: [<ffffffff8109b5ce>] ? prepare_to_wait+0x4e/0x80
Mar 19 13:15:54 localhost kernel: [<ffffffffa00232f0>] ? vmballoon_work+0x0/0x7d8 [vmware_balloon]
Mar 19 13:15:54 localhost kernel: [<ffffffff81094d20>] ? worker_thread+0x170/0x2a0
Mar 19 13:15:54 localhost kernel: a9  [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
Mar 19 13:15:54 localhost kernel: [<ffffffff81094bb0>] ? worker_thread+0x0/0x2a0
Mar 19 13:15:54 localhost kernel: ffff88000016234868  0000000227d08a803b 00 49 8b 84 24 58 80 00 00 48 3d d0 da fc 81 4c 8d a0 a8 7f 
Mar 19 13:15:54 localhost kernel: Call Trace:
Mar 19 13:15:54 localhost kernel: ff ff 
Mar 19 13:15:54 localhost kernel: 0f  [<ffffffff8111fa47>] ? unlock_page+0x27/0x30
Mar 19 13:15:54 localhost kernel: 84  [<ffffffff811420a6>] ? vmstat_update+0x16/0x40
Mar 19 13:15:54 localhost kernel: a8  [<ffffffff8109aef6>] ? kthread+0x96/0xa0
Mar 19 13:15:54 localhost kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Mar 19 13:15:54 localhost kernel: [<ffffffff8109ae60>] ? kthread+0x0/0xa0
Mar 19 13:15:54 localhost kernel: 00 83 6d 80 01 
Mar 19 13:15:54 localhost kernel: [<ffffffff81094d20>] ? worker_thread+0x170/0x2a0
Mar 19 13:15:54 localhost kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Mar 19 13:15:54 localhost kernel: 86 00 00 00 <4b> 8b 5c fc 08 65 
Mar 19 13:15:54 localhost kernel: Call Trace:
Mar 19 13:15:54 localhost kernel: 48 8b 04 
Mar 19 13:15:54 localhost kernel: [<ffffffff81094bb0>] ? worker_thread+0x0/0x2a0
Mar 19 13:15:54 localhost kernel: [<ffffffff8114856f>] ? unmap_vmas+0x61f/0xc50
Mar 19 13:15:54 localhost kernel: 25 d0 e0 00 00 4a 8b 14 30 48 8b 43 
Mar 19 13:15:54 localhost kernel: Call Trace:
Mar 19 13:15:54 localhost kernel: [<ffffffff8109aef6>] ? kthread+0x96/0xa0
Mar 19 13:15:54 localhost kernel: [<ffffffff8109b5ce>] ? prepare_to_wait+0x4e/0x80
Mar 19 13:15:54 localhost kernel: [<ffffffff81171950>] ? cache_reap+0x0/0x250
Mar 19 13:15:54 localhost kernel: [<ffffffff8114a499>] ? __do_fault+0x469/0x530
Mar 19 13:15:54 localhost kernel: [<ffffffff81094d20>] ? worker_thread+0x170/0x2a0
Mar 19 13:15:54 localhost kernel: [<ffffffff8112f3a3>] ? __alloc_pages_nodemask+0x113/0x8d0
Mar 19 13:15:54 localhost kernel: [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
Mar 19 13:15:54 localhost kernel: [<ffffffff8114e477>] ? exit_mmap+0x87/0x170
Mar 19 13:15:54 localhost kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Mar 19 13:15:54 localhost kernel: [<ffffffff8109ae60>] ? kthread+0x0/0xa0
Mar 19 13:15:54 localhost kernel: [<ffffffff8106f22c>] ? mmput+0x6c/0x120
Mar 19 13:15:54 localhost kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Mar 19 13:15:54 localhost kernel: Code: 00 00 49 89 c4  [<ffffffff81190aa4>] ? flush_old_exec+0x484/0x690
Mar 19 13:15:54 localhost kernel: fa 66 0f 1f 44 00 00 44 8b 2e 44 39 6e 08 48 89 f2 44 0f 4e 6e 08 44 89 ee e8 2c ef ff ff 44 29  [<ffffffff811e45c0>] ? load_elf_binary+0x350/0x1ab0
Mar 19 13:15:54 localhost kernel: 2b 4c 89 e7 57 9d <0f> 1f 44 00 00 48 8b 5d e8 4c 8b 65 f0 4c 8b 6d f8 c9 c3 90 55 
Mar 19 13:15:54 localhost kernel: Call Trace:
Mar 19 13:15:54 localhost kernel: [<ffffffff81094bb0>] ? worker_thread+0x0/0x2a0
Mar 19 13:15:54 localhost kernel: [<ffffffff8114a657>] ? handle_pte_fault+0xf7/0xb00
Mar 19 13:15:54 localhost kernel: [<ffffffff8109aef6>] ? kthread+0x96/0xa0
Mar 19 13:15:54 localhost kernel: [<ffffffff8118d0c5>] ? chrdev_open+0x125/0x230
Mar 19 13:15:54 localhost kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Mar 19 13:15:54 localhost kernel: [<ffffffff812317b1>] ? selinux_dentry_open+0xe1/0x140
Mar 19 13:15:54 localhost kernel: [<ffffffff8109ae60>] ? kthread+0x0/0xa0
Mar 19 13:15:54 localhost kernel: [<ffffffff81167a9a>] ? alloc_pages_current+0xaa/0x110
Mar 19 13:15:54 localhost kernel: [<ffffffff81142069>] ? refresh_cpu_vm_stats+0x159/0x180
Mar 19 13:15:54 localhost kernel: [<ffffffff8109b5ce>] ? prepare_to_wait+0x4e/0x80
Mar 19 13:15:54 localhost kernel: [<ffffffff81188dba>] ? do_sync_read+0xfa/0x140
Mar 19 13:15:54 localhost kernel: [<ffffffff81142090>] ? vmstat_update+0x0/0x40
Mar 19 13:15:54 localhost kernel: [<ffffffff811420a6>] ? vmstat_update+0x16/0x40
Mar 19 13:15:54 localhost kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Mar 19 13:15:54 localhost kernel: [<ffffffff8112cf3e>] ? __get_free_pages+0xe/0x50
Mar 19 13:15:54 localhost kernel: [<ffffffff811a3b9a>] ? dput+0x9a/0x150
Mar 19 13:15:54 localhost kernel: [<ffffffff811e186e>] ? load_misc_binary+0x9e/0x3f0
Mar 19 13:15:54 localhost kernel: [<ffffffff81094d20>] ? worker_thread+0x170/0x2a0
Mar 19 13:15:54 localhost kernel: [<ffffffff8106fa46>] ? copy_process+0x126/0x1450
Mar 19 13:15:54 localhost kernel: [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
Mar 19 13:15:54 localhost kernel: [<ffffffff81094bb0>] ? worker_thread+0x0/0x2a0
Mar 19 13:15:54 localhost kernel: [<ffffffff8109aef6>] ? kthread+0x96/0xa0
Mar 19 13:15:54 localhost kernel: [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
Mar 19 13:15:54 localhost kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Mar 19 13:15:54 localhost kernel: [<ffffffff812334eb>] ? selinux_file_permission+0xfb/0x150
Mar 19 13:15:54 localhost kernel: [<ffffffff8109ae60>] ? kthread+0x0/0xa0
Mar 19 13:15:54 localhost kernel: [<ffffffff8104a98c>] ? __do_page_fault+0x1ec/0x480
Mar 19 13:15:54 localhost kernel: [<ffffffff81070e11>] ? do_fork+0xa1/0x480
Mar 19 13:15:54 localhost kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Mar 19 13:15:54 localhost kernel: [<ffffffff811920b7>] ? search_binary_handler+0x137/0x370
Mar 19 13:15:54 localhost kernel: [<ffffffff8128ed8b>] ? strncpy_from_user+0x5b/0x90
Mar 19 13:15:54 localhost kernel: [<ffffffff811910b6>] ? kernel_read+0x46/0x60
Mar 19 13:15:54 localhost kernel: [<ffffffff810894b7>] ? do_sigaction+0x197/0x1d0
Mar 19 13:15:54 localhost kernel: [<ffffffff81009598>] ? sys_clone+0x28/0x30
Mar 19 13:15:54 localhost kernel: [<ffffffff811e2a77>] ? load_script+0x267/0x2b0
Mar 19 13:15:54 localhost kernel: [<ffffffff8114b829>] ? get_user_pages+0x49/0x50
Mar 19 13:15:54 localhost kernel: [<ffffffff8100b393>] ? stub_clone+0x13/0x20
Mar 19 13:15:54 localhost kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Mar 19 13:15:54 localhost kernel: Code: 06  [<ffffffff811917ac>] ? get_arg_page+0x5c/0x100

Mar 19 13:15:50 localhost kernel: BUG: soft lockup - CPU#9 stuck for 105s! [vmmemctl:838]
Mar 19 13:15:50 localhost kernel: BUG: soft lockup - CPU#2 stuck for 99s! [mysqld:754]
Mar 19 13:15:50 localhost kernel: BUG: soft lockup - **CPU#4 stuck for 99s!** [mysqld:8773]
Mar 19 13:15:50 localhost kernel: inet_diag
Mar 19 13:15:50 localhost kernel: BUG: soft lockup - CPU#8 stuck for 99s! [ps:29030]
comm: keepalived Not tainted 2.6.32-431.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform
Mar 19 13:15:59 localhost rsyslogd-2177: imuxsock lost 34 messages from pid 2643 due to rate-limiting
Mar 19 13:16:34 localhost Keepalived_vrrp[2643]: VRRP_Script(check_run) timed out
Mar 19 13:16:34 localhost Keepalived_vrrp[2643]: VRRP_Script(check_run) succeeded
Mar 19 13:16:36 localhost Keepalived_vrrp[2643]: VRRP_Script(check_run) timed out
Mar 19 13:16:36 localhost Keepalived_vrrp[2643]: Process [29331] didn't respond to SIGTERM
Mar 19 13:39:03 localhost kernel: INFO: task mysqld:30584 blocked for more than 120 seconds.
Mar 19 13:39:03 localhost kernel: Not tainted 2.6.32-431.el6.x86_64 #1
Mar 19 13:39:03 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 13:39:03 localhost kernel: mysqld R running task 0 30584 30106 0x00000080

Mar 19 13:39:05 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 13:39:05 localhost kernel: crond D 0000000000000003 0 2871 1762 0x00000080
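
Because the kernel messages from several CPUs are interleaved, it helps to pull out only the soft lockup lines before reading the rest. The following is a minimal sketch, not part of the original troubleshooting: the function name `summarize_soft_lockups` is made up for illustration, and `/var/log/messages` is assumed to be the syslog file on this host.

```shell
#!/bin/sh
# Hypothetical helper: read syslog text on stdin and print one line per
# "BUG: soft lockup" message in the form "CPU SECONDS TASK:PID",
# longest stall first. Markdown bold markers (**) are stripped first,
# because the excerpt in this post uses them for emphasis.
summarize_soft_lockups() {
    sed -n \
        -e 's/\*\*//g' \
        -e 's/.*BUG: soft lockup - CPU#\([0-9][0-9]*\) stuck for \([0-9][0-9]*\)s! \[\([^]]*\)\].*/\1 \2 \3/p' |
    sort -k2,2nr -k1,1n
}

# Assumed usage on the affected host:
#   summarize_soft_lockups < /var/log/messages
```

Sorting by stall time makes it easier to see which task was stuck longest; in the excerpt above that list includes the vmmemctl (VMware balloon) thread and several mysqld threads.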

【Analysis】
Online sources explain this message as follows:
Mar 19 13:39:03 localhost kernel: INFO: task mysqld:30584 blocked for more than 120 seconds.
Mar 19 13:39:03 localhost kernel: Not tainted 2.6.32-431.el6.x86_64 #1
Mar 19 13:39:03 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 13:39:03 localhost kernel: mysqld R running task 0 30584 30106 0x00000080
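The background task is reported as hung because it exceeded the 120 s timeout. According to this explanation, Linux sets aside up to 40% of available memory as system cache, and when that data is flushed, synchronizing it with the IO subsystem can take longer than the timeout (120 s), so the 40% is reduced to 10% to avoid the timeout. Put simply: Linux normally writes to disk through a cache of roughly 40% of memory, and only when this cache is nearly used up does the system write its contents to disk synchronously. This synchronous write is expected to finish within a time limit of 120 seconds. If system IO is slow and cannot finish within 120 seconds, this warning appears. It usually happens on systems with a lot of memory. Strictly speaking, the 120 seconds is the threshold of the hung task detector (kernel.hung_task_timeout_secs), and the default percentage depends on the kernel version, so check the real value with sysctl.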

This is a known bug. By default Linux uses up to 40% of the available memory for file system caching.
After this mark has been reached the file system flushes all outstanding data to disk, causing all following IOs to go synchronous.
For flushing out this data to disk there is a time limit of 120 seconds by default.
In the case here the IO subsystem is not fast enough to flush the data within 120 seconds.
This especially happens on systems with a lot of memory.

The problem is solved in later kernels and there is no "fix" from Oracle.
I fixed this by lowering the mark for flushing the cache from 40% to 10% by setting "vm.dirty_ratio=10" in /etc/sysctl.conf.
This setting does not influence overall database performance since you hopefully use Direct IO and bypass the file system cache completely.
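
To make the 40% versus 10% figures concrete, here is a rough arithmetic sketch. The helper name `dirty_threshold_mib` and the 64 GiB memory size are assumptions for illustration (the post does not state how much memory the host has), and the kernel applies the ratio to dirtyable memory rather than total memory, so treat the results as upper-bound estimates.

```shell
#!/bin/sh
# Hypothetical helper: dirty_threshold_mib MEM_KB RATIO
# Prints how many MiB of dirty data RATIO percent of MEM_KB corresponds to.
dirty_threshold_mib() {
    echo $(( $1 * $2 / 100 / 1024 ))
}

# Assumed example host with 64 GiB of memory (67108864 kB).
dirty_threshold_mib 67108864 40   # prints 26214 (about 25.6 GiB)
dirty_threshold_mib 67108864 10   # prints 6553  (about 6.4 GiB)
```

On a slow IO path, flushing tens of gigabytes in one burst takes far longer than flushing a few, which is the reasoning behind lowering the ratio.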
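Check the current kernel settings:
#sysctl -a | grep dirty
Adjust the kernel parameters:
# Lower the share of memory that the dirty-page cache may occupy
  sysctl -w vm.dirty_ratio=10
  sysctl -w vm.dirty_background_ratio=5
  # Switch the IO scheduler to noop, the simplest scheduler, based on a FIFO queue
  echo noop > /sys/block/sda/queue/scheduler
  
# Setting the timeout to 0 only disables the hung task warning; it does not remove the cause
/sbin/sysctl -w kernel.hung_task_timeout_secs=0

# Reload the values from /etc/sysctl.conf
sysctl -p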
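
After writing to the scheduler file it is worth confirming which scheduler is actually active. The kernel marks the active one with square brackets in `queue/scheduler`. A small sketch, where the function name `active_scheduler` is made up and the sample line assumes the scheduler list of a 2.6.32 kernel:

```shell
#!/bin/sh
# Hypothetical helper: read the contents of a queue/scheduler file on stdin
# and print the scheduler name enclosed in square brackets.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Sample content as it might look before the change:
echo 'noop anticipatory deadline [cfq]' | active_scheduler   # prints cfq

# Assumed usage on a real host (device name sda as in the command above):
#   active_scheduler < /sys/block/sda/queue/scheduler
```

The `echo noop > ...` setting is lost at reboot. On kernels of this generation one common way to persist it is the `elevator=noop` boot parameter, but verify that against the documentation for your distribution.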

To keep the settings in effect after a reboot, add them to the kernel parameter file:

vi /etc/sysctl.conf

vm.dirty_background_ratio = 5

vm.dirty_ratio = 10
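
Editing the file by hand works, but a scripted change is easier to repeat on several hosts. The sketch below is one possible approach, not the original author's method: `set_sysctl_conf` is a hypothetical helper that removes any existing assignment of a key and appends the new one, so running it twice does not create duplicate lines.

```shell
#!/bin/sh
# Hypothetical helper: set_sysctl_conf KEY VALUE FILE
# Drops existing uncommented "KEY = ..." lines from FILE and appends "KEY = VALUE".
set_sysctl_conf() {
    key=$1 value=$2 file=$3
    [ -f "$file" ] || : > "$file"
    tmp=$(mktemp) || return 1
    # Compare the text before "=" (with blanks removed) against KEY exactly,
    # so vm.dirty_ratio does not match vm.dirty_background_ratio.
    awk -F= -v k="$key" '{ name = $1; gsub(/[ \t]/, "", name); if (name != k) print }' \
        "$file" > "$tmp"
    echo "$key = $value" >> "$tmp"
    # Write back through the original file so its owner and mode are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Assumed usage as root:
#   set_sysctl_conf vm.dirty_background_ratio 5 /etc/sysctl.conf
#   set_sysctl_conf vm.dirty_ratio 10 /etc/sysctl.conf
#   sysctl -p
```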
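【Notes】
vm.dirty_background_ratio: this parameter sets the percentage of system memory (for example 5%) that dirty file system cache pages may reach before the background writeback threads (pdflush/flush/kdmflush and so on) are woken to write some of the dirty pages out to storage asynchronously;

vm.dirty_ratio: this parameter sets the percentage of system memory (for example 10%) at which the system has to start dealing with the dirty pages itself (by this point there are many dirty pages, and some of them must be written to storage to avoid data loss); while this happens, many application processes may block because the system has switched to handling file IO.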
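
To see how close a system is to these thresholds, the current amount of dirty and in-flight writeback data can be read from `/proc/meminfo`. A minimal sketch, with `dirty_pages_kb` as a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: read /proc/meminfo-formatted text on stdin and print
# the Dirty and Writeback counters (values are in kB).
dirty_pages_kb() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { print $1, $2 }'
}

# Assumed usage on a Linux host:
#   dirty_pages_kb < /proc/meminfo
```

If Dirty stays high for long periods while IO is slow, that is consistent with the explanation above; if it stays low, the cause of the lockup probably lies elsewhere.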
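A process that is waiting for IO is often in the D state, that is TASK_UNINTERRUPTIBLE. A process in this state does not handle signals, so it cannot be killed. If a process stays in the D state for a long time, something is definitely wrong.
There are two likely causes:
1) Hardware on the IO path has failed, for example a bad disk (only a few such failures lead to a long D state; usually an error is returned instead)
2) The kernel itself has a problem
Problems like this are hard to pin down, and once they occur they are usually unrecoverable: the process cannot be killed, and normally only a reboot restores service.
For this situation the kernel has a hung task detection mechanism.
The basic principle: the kernel periodically checks for processes in the D state, and if one has been in the D state for longer than the configured time (120 s by default, configurable), it prints the relevant stack trace; a proc parameter can also make it panic directly.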
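
The same check can be done by hand while the system is still responsive. The sketch below filters `ps` output for tasks in the D state; the function name `d_state_tasks` is made up, and the `ps` column selection is just one reasonable choice.

```shell
#!/bin/sh
# Hypothetical helper: read "ps -eo pid,stat,wchan:32,comm" style output on
# stdin and keep the header plus rows whose STAT column starts with D
# (uninterruptible sleep).
d_state_tasks() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Assumed usage on a Linux host:
#   ps -eo pid,stat,wchan:32,comm | d_state_tasks
#   sysctl kernel.hung_task_timeout_secs kernel.hung_task_panic
```

The WCHAN column shows the kernel function the task is sleeping in, which often hints at whether it is waiting on disk IO, a lock, or something else. `kernel.hung_task_panic` is the parameter that makes the detector panic instead of only printing a trace.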

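As for the message: NET: Registered protocol family 36
Online sources explain it this way: because this Linux runs in a VMware virtualized environment with vmware-tools installed, upgrading vmware-tools from 10.0.5.-1 to 10.1.0 resolves the problem.
#vmware-toolbox-cmd -v ## check the vmtools version
#cat /etc/protocols // look up number 36, which this file lists as the xtp protocol
Note that the number in "Registered protocol family" is a socket address family registered by a kernel module, which is a different numbering from the IP protocol numbers in /etc/protocols, so the xtp match should be treated with caution.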
