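When the system runs out of memory, the out_of_memory function picks a process that the kernel considers "guilty" of allocating too much memory and kills it.
Doing so is very likely to free a fair number of pages, after which the kernel jumps back and retries the memory allocation.
This article does not cover the flow of the out_of_memory function or the policy it uses to choose the victim process.
It only covers the meaning of the messages the kernel prints when an out-of-memory condition occurs.
1 OOM messages
The following is a typical out-of-memory kernel log: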
<4>[12345.342532] systemd-journal invoked oom-killer: gfp_mask=0x800d0, order=0, oom_score_adj=0
<4>[12345.351216] CPU: 1 PID: 1371 Comm: systemd-journal Tainted: G O 3.14.31-00017-g40fab71 #1
<4>[12345.360695] Backtrace:
<4>[12345.363263] [<c0012fcc>] (dump_backtrace) from [<c00131a4>] (show_stack+0x20/0x24)
<4>[12345.371192] r6:00000000 r5:ffffffff r4:00000000 r3:bd943631
<4>[12345.377136] [<c0013184>] (show_stack) from [<c07bbe78>] (dump_stack+0x7c/0xc8)
<4>[12345.384710] [<c07bbdfc>] (dump_stack) from [<c07ba7e4>] (dump_header.isra.14+0x74/0x188)
<4>[12345.393184] r6:000800d0 r5:00000000 r4:e8088000 r3:00000002
<4>[12345.399126] [<c07ba770>] (dump_header.isra.14) from [<c00f8a28>] (oom_kill_process+0x230/0x3e0)
<4>[12345.408234] r10:00000000 r8:000800d0 r7:00000000 r6:c0b89aa8 r5:000800d0 r4:e9bb79c0
<4>[12345.416462] [<c00f87f8>] (oom_kill_process) from [<c00f90c8>] (out_of_memory+0x2f4/0x354)
<4>[12345.425024] r10:00000000 r9:00000000 r8:000800d0 r7:00000000 r6:c0b89aa8 r5:c0b89d08
<4>[12345.433249] r4:c0b89aa8
<4>[12345.435903] [<c00f8dd4>] (out_of_memory) from [<c00fd6c8>] (__alloc_pages_nodemask+0x93c/0x988)
<4>[12345.445011] r10:00000000 r9:c0c38fc0 r8:c0b871d8 r7:e8088000 r6:c0c39bc0 r5:00000000
<4>[12345.453234] r4:000800d0
<4>[12345.455887] [<c00fcd8c>] (__alloc_pages_nodemask) from [<c00fd734>] (__get_free_pages+0x20/0x3c)
<4>[12345.465087] r10:e97d36a8 r9:00000063 r8:e8089f6c r7:00000063 r6:b6f79f68 r5:e97d36a8
<4>[12345.473311] r4:00000000
<4>[12345.475965] [<c00fd714>] (__get_free_pages) from [<c0196878>] (proc_pid_readlink+0x68/0x110)
<4>[12345.484808] [<c0196810>] (proc_pid_readlink) from [<c013dcb8>] (SyS_readlinkat+0xf0/0x104)
<4>[12345.493461] r7:bea40520 r6:ffffff9c r5:00004000 r4:00000000
<4>[12345.499402] [<c013dbc8>] (SyS_readlinkat) from [<c000eee0>] (ret_fast_syscall+0x0/0x34)
<4>[12345.507785] r10:00000000 r9:e8088000 r8:c000f148 r7:0000014c r6:00000063 r5:b6f79f68
<4>[12345.516011] r4:00000064
<4>[12345.518663] Mem-info:
<4>[12345.521049] Normal per-cpu:
<4>[12345.523969] CPU 0: hi: 42, btch: 7 usd: 23
<4>[12345.528979] CPU 1: hi: 42, btch: 7 usd: 25
<4>[12345.534004] HighMem per-cpu:
<4>[12345.537013] CPU 0: hi: 186, btch: 31 usd: 27
<4>[12345.542199] CPU 1: hi: 186, btch: 31 usd: 29
<4>[12345.547247] active_anon:21860 inactive_anon:14790 isolated_anon:0
<4>[12345.547247] active_file:41585 inactive_file:10422 isolated_file:0
<4>[12345.547247] unevictable:0 dirty:9 writeback:205 unstable:0
<4>[12345.547247] free:285748 slab_reclaimable:2100 slab_unreclaimable:26286
<4>[12345.547247] mapped:26079 shmem:14857 pagetables:687 bounce:0
<4>[12345.547247] free_cma:57779
<4>[12345.581839] Normal free:233460kB min:2488kB low:3108kB high:3732kB active_anon:17312kB
inactive_anon:10824kB active_file:128kB inactive_file:4kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:774144kB managed:387568kB mlocked:0kB dirty:16kB writeback:76kB
mapped:3296kB shmem:10840kB slab_reclaimable:8400kB slab_unreclaimable:105144kB kernel_stack:1168kB
pagetables:2748kB unstable:0kB bounce:0kB free_cma:231116kB writeback_tmp:0kB pages_scanned:1648
all_unreclaimable? yes
<4>[12345.627014] lowmem_reserve[]: 0 10168 10168
<4>[12345.631565] HighMem free:909036kB min:512kB low:2604kB high:4696kB active_anon:70632kB
inactive_anon:48336kB active_file:166212kB inactive_file:41684kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:1301504kB managed:1301504kB mlocked:0kB dirty:20kB writeback:744kB
mapped:101020kB shmem:48588kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no
<4>[12345.675614] lowmem_reserve[]: 0 0 0
<4>[12345.679437] Normal: 1165*4kB (MRC) 1122*8kB (RC) 1119*16kB (RC) 1118*32kB (C) 1068*64kB (RC)
748*128kB (C) 0*256kB 0*512kB 0*1024kB 1*2048kB (R) 0*4096kB 0*8192kB = 233460kB
<4>[12345.695797] HighMem: 99*4kB (M) 1148*8kB (UM) 1314*16kB (UM) 880*32kB (UM) 327*64kB (M)
87*128kB (M) 34*256kB (M) 38*512kB (M) 12*1024kB (M) 10*2048kB (M) 3*4096kB (M) 91*8192kB (UMR) = 909516kB
<4>[12345.714293] 66770 total pagecache pages
<4>[12345.718309] 0 pages in swap cache
<4>[12345.724832] Swap cache stats: add 0, delete 0, find 0/0
<4>[12345.730308] Free swap = 0kB
<4>[12345.733412] Total swap = 0kB
<4>[12345.747245] 520192 pages of RAM
<4>[12345.750577] 286253 free pages
<4>[12345.753778] 97924 reserved pages
<4>[12345.757258] 28061 slab pages
<4>[12345.760574] 115601 pages shared
<4>[12345.764283] 0 pages swap cached
<6>[12345.767572] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
<6>[12345.775906] [ 1366] 0 1366 459 125 3 0 0 sh
<6>[12345.785861] [ 1367] 0 1367 665 235 4 0 0 propertyd
<6>[12345.794802] [ 1368] 0 1368 26553 8835 58 0 0 seed
<6>[12345.803296] [ 1371] 0 1371 1648 772 5 0 0 systemd-journal
<6>[12345.812792] [ 1375] 0 1375 750 300 4 0 -1000 systemd-udevd
<6>[12345.822449] [ 2416] 1040 2416 3852 510 7 0 0 secd
<6>[12345.831341] [ 2419] 0 2419 6678 923 9 0 0 storagemanagerd
<6>[12345.840944] [ 2420] 0 2420 1267 497 5 0 0 connmand
<6>[12345.849566] [ 2422] 0 2422 4484 687 8 0 0 uuid
<6>[12345.857843] [ 2424] 0 2424 1161 358 5 0 0 connman-vpnd
<6>[12345.867271] [ 2427] 1000 2427 1593 461 6 0 0 logboxd
<6>[12345.875846] [ 2432] 0 2432 9483 1718 15 0 0 cmns
<6>[12345.884104] [ 2451] 81 2451 1355 474 4 0 -900 dbus-daemon
<6>[12345.893018] [ 2532] 0 2532 11794 246 10 0 0 adbd
<6>[12345.901304] [ 2535] 0 2535 1502 347 5 0 0 wpa_supplicant
<6>[12345.910473] [ 2536] 0 2536 12820 866 12 0 0 udisksd
<6>[12345.919119] [ 2537] 0 2537 1898 527 6 0 0 tyid
<6>[12345.927361] [ 2540] 0 2540 10076 2157 16 0 0 datamanagerd
<6>[12345.936349] [ 2554] 0 2554 5983 574 7 0 0 connectivityser
<6>[12345.945635] [ 2558] 0 2558 10604 5388 21 0 0 weston
<6>[12345.964101] [ 2589] 0 2589 14597 1917 17 0 0 pagemanagerd
<6>[12345.973272] [ 2590] 0 2590 3832 515 7 0 0 amt
<6>[12345.981730] [ 2593] 0 2593 6176 1343 12 0 0 weston-desktop-
<6>[12345.991046] [ 2599] 0 2599 7185 761 12 0 0 scim-launcher
<6>[12346.098925] [ 5580] 0 5580 458 116 3 0 0 sh
<6>[12346.107065] [ 5581] 0 5581 492 175 3 0 0 gzip
<3>[12346.115335] Out of memory: Kill process 5575 thread_x score 481 or sacrifice child
<3>[12346.124212] Killed process 5575 thread_x total-vm:106212kB, anon-rss:18036kB, file-rss:2704kB
2 Analysis of the OOM messages
2.1
<4>[12345.342532] systemd-journal invoked oom-killer: gfp_mask=0x800d0, order=0, oom_score_adj=0
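systemd-journal: the current process is systemd-journal; its page allocation request invoked the oom-killer.
gfp_mask=0x800d0: the GFP flags passed to alloc_page. With the flag values of the 3.14 kernel this decodes to ___GFP_RECLAIMABLE | ___GFP_FS | ___GFP_IO | ___GFP_WAIT (0x80000 | 0x80 | 0x40 | 0x10), which is GFP_TEMPORARY.
order=0: the order passed to alloc_page is 0, that is, only 2^0 = 1 page was requested.
oom_score_adj=0: the adjustment applied to this process's OOM score, which affects how likely the process is to be killed. oom_score_adj ranges from -1000 (never kill) to 1000 (always kill).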
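The decoding of gfp_mask can be checked mechanically. The following is a minimal sketch, not kernel code: the table copies the ___GFP_* values as I understand them from include/linux/gfp.h in 3.14 (they differ in other kernel versions), and the names gfp_flag, gfp_flags_3_14 and decode_gfp_mask are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One GFP bit and its name. Values assumed from the 3.14 kernel headers;
 * other kernel versions use different numbers. */
struct gfp_flag {
    unsigned int bit;
    const char *name;
};

static const struct gfp_flag gfp_flags_3_14[] = {
    { 0x01u,    "___GFP_DMA" },
    { 0x02u,    "___GFP_HIGHMEM" },
    { 0x04u,    "___GFP_DMA32" },
    { 0x08u,    "___GFP_MOVABLE" },
    { 0x10u,    "___GFP_WAIT" },
    { 0x20u,    "___GFP_HIGH" },
    { 0x40u,    "___GFP_IO" },
    { 0x80u,    "___GFP_FS" },
    { 0x100u,   "___GFP_COLD" },
    { 0x200u,   "___GFP_NOWARN" },
    { 0x400u,   "___GFP_REPEAT" },
    { 0x800u,   "___GFP_NOFAIL" },
    { 0x1000u,  "___GFP_NORETRY" },
    { 0x2000u,  "___GFP_MEMALLOC" },
    { 0x4000u,  "___GFP_COMP" },
    { 0x8000u,  "___GFP_ZERO" },
    { 0x10000u, "___GFP_NOMEMALLOC" },
    { 0x20000u, "___GFP_HARDWALL" },
    { 0x40000u, "___GFP_THISNODE" },
    { 0x80000u, "___GFP_RECLAIMABLE" },
};

/* Write the names of all set flags into buf, joined by " | ".
 * len must be at least 1. Returns the bits that are not in the table
 * (0 when every set bit was recognised). */
static unsigned int decode_gfp_mask(unsigned int mask, char *buf, size_t len)
{
    unsigned int unknown = mask;
    size_t i;

    buf[0] = '\0';
    for (i = 0; i < sizeof(gfp_flags_3_14) / sizeof(gfp_flags_3_14[0]); i++) {
        if (!(mask & gfp_flags_3_14[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, " | ", len - strlen(buf) - 1);
        strncat(buf, gfp_flags_3_14[i].name, len - strlen(buf) - 1);
        unknown &= ~gfp_flags_3_14[i].bit;
    }
    return unknown;
}
```

Calling decode_gfp_mask(0x800d0, buf, sizeof(buf)) with a 256-byte buffer fills buf with "___GFP_WAIT | ___GFP_IO | ___GFP_FS | ___GFP_RECLAIMABLE" and returns 0.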
<4>[12345.351216] CPU: 1 PID: 1371 Comm: systemd-journal Tainted: G O 3.14.31-00017-g40fab71 #1
<4>[12345.360695] Backtrace:
<4>[12345.363263] [<c0012fcc>] (dump_backtrace) from [<c00131a4>] (show_stack+0x20/0x24)
<4>[12345.371192] r6:00000000 r5:ffffffff r4:00000000 r3:bd943631
<4>[12345.377136] [<c0013184>] (show_stack) from [<c07bbe78>] (dump_stack+0x7c/0xc8)
<4>[12345.384710] [<c07bbdfc>] (dump_stack) from [<c07ba7e4>] (dump_header.isra.14+0x74/0x188)
<4>[12345.393184] r6:000800d0 r5:00000000 r4:e8088000 r3:00000002
<4>[12345.399126] [<c07ba770>] (dump_header.isra.14) from [<c00f8a28>] (oom_kill_process+0x230/0x3e0)
<4>[12345.408234] r10:00000000 r8:000800d0 r7:00000000 r6:c0b89aa8 r5:000800d0 r4:e9bb79c0
<4>[12345.416462] [<c00f87f8>] (oom_kill_process) from [<c00f90c8>] (out_of_memory+0x2f4/0x354)
<4>[12345.425024] r10:00000000 r9:00000000 r8:000800d0 r7:00000000 r6:c0b89aa8 r5:c0b89d08
<4>[12345.433249] r4:c0b89aa8
<4>[12345.435903] [<c00f8dd4>] (out_of_memory) from [<c00fd6c8>] (__alloc_pages_nodemask+0x93c/0x988)
<4>[12345.445011] r10:00000000 r9:c0c38fc0 r8:c0b871d8 r7:e8088000 r6:c0c39bc0 r5:00000000
<4>[12345.453234] r4:000800d0
<4>[12345.455887] [<c00fcd8c>] (__alloc_pages_nodemask) from [<c00fd734>] (__get_free_pages+0x20/0x3c)
<4>[12345.465087] r10:e97d36a8 r9:00000063 r8:e8089f6c r7:00000063 r6:b6f79f68 r5:e97d36a8
<4>[12345.473311] r4:00000000
<4>[12345.475965] [<c00fd714>] (__get_free_pages) from [<c0196878>] (proc_pid_readlink+0x68/0x110)
<4>[12345.484808] [<c0196810>] (proc_pid_readlink) from [<c013dcb8>] (SyS_readlinkat+0xf0/0x104)
<4>[12345.493461] r7:bea40520 r6:ffffff9c r5:00004000 r4:00000000
<4>[12345.499402] [<c013dbc8>] (SyS_readlinkat) from [<c000eee0>] (ret_fast_syscall+0x0/0x34)
<4>[12345.507785] r10:00000000 r9:e8088000 r8:c000f148 r7:0000014c r6:00000063 r5:b6f79f68
<4>[12345.516011] r4:00000064
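This is the call stack that triggered the OOM, printed by dump_header->dump_stack. Read from the bottom up, the call chain starts at ret_fast_syscall and ends at dump_backtrace.
From this output we can infer that systemd-journal issued the readlink system call (SyS_readlinkat in the trace), and that the page allocation made in proc_pid_readlink led to the OOM.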
2.2
<4>[12345.518663] Mem-info:
<4>[12345.521049] Normal per-cpu:
<4>[12345.523969] CPU 0: hi: 42, btch: 7 usd: 23
<4>[12345.528979] CPU 1: hi: 42, btch: 7 usd: 25
<4>[12345.534004] HighMem per-cpu:
<4>[12345.537013] CPU 0: hi: 186, btch: 31 usd: 27
<4>[12345.542199] CPU 1: hi: 186, btch: 31 usd: 29
<4>[12345.547247] active_anon:21860 inactive_anon:14790 isolated_anon:0
<4>[12345.547247] active_file:41585 inactive_file:10422 isolated_file:0
<4>[12345.547247] unevictable:0 dirty:9 writeback:205 unstable:0
<4>[12345.547247] free:285748 slab_reclaimable:2100 slab_unreclaimable:26286
<4>[12345.547247] mapped:26079 shmem:14857 pagetables:687 bounce:0
<4>[12345.547247] free_cma:57779
<4>[12345.581839] Normal free:233460kB min:2488kB low:3108kB high:3732kB active_anon:17312kB
inactive_anon:10824kB active_file:128kB inactive_file:4kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:774144kB managed:387568kB mlocked:0kB dirty:16kB writeback:76kB
mapped:3296kB shmem:10840kB slab_reclaimable:8400kB slab_unreclaimable:105144kB kernel_stack:1168kB
pagetables:2748kB unstable:0kB bounce:0kB free_cma:231116kB writeback_tmp:0kB pages_scanned:1648
all_unreclaimable? yes
<4>[12345.627014] lowmem_reserve[]: 0 10168 10168
<4>[12345.631565] HighMem free:909036kB min:512kB low:2604kB high:4696kB active_anon:70632kB
inactive_anon:48336kB active_file:166212kB inactive_file:41684kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:1301504kB managed:1301504kB mlocked:0kB dirty:20kB writeback:744kB
mapped:101020kB shmem:48588kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
<4>[12345.675614] lowmem_reserve[]: 0 0 0
<4>[12345.679437] Normal: 1165*4kB (MRC) 1122*8kB (RC) 1119*16kB (RC) 1118*32kB (C) 1068*64kB (RC)
748*128kB (C) 0*256kB 0*512kB 0*1024kB 1*2048kB (R) 0*4096kB 0*8192kB = 233460kB
<4>[12345.695797] HighMem: 99*4kB (M) 1148*8kB (UM) 1314*16kB (UM) 880*32kB (UM) 327*64kB (M)
87*128kB (M) 34*256kB (M) 38*512kB (M) 12*1024kB (M) 10*2048kB (M) 3*4096kB (M) 91*8192kB (UMR) = 909516kB
<4>[12345.714293] 66770 total pagecache pages
<4>[12345.718309] 0 pages in swap cache
<4>[12345.724832] Swap cache stats: add 0, delete 0, find 0/0
<4>[12345.730308] Free swap = 0kB
<4>[12345.733412] Total swap = 0kB
<4>[12345.747245] 520192 pages of RAM
<4>[12345.750577] 286253 free pages
<4>[12345.753778] 97924 reserved pages
<4>[12345.757258] 28061 slab pages
<4>[12345.760574] 115601 pages shared
<4>[12345.764283] 0 pages swap cached
dump_header->show_mem prints the current memory state of the system.
2.2.1
<4>[12345.521049] Normal per-cpu:
<4>[12345.523969] CPU 0: hi: 42, btch: 7 usd: 23
<4>[12345.528979] CPU 1: hi: 42, btch: 7 usd: 25
<4>[12345.534004] HighMem per-cpu:
<4>[12345.537013] CPU 0: hi: 186, btch: 31 usd: 27
<4>[12345.542199] CPU 1: hi: 186, btch: 31 usd: 29
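Each memory zone defines a "per-CPU" page frame cache. Every per-CPU cache holds some pre-allocated page frames that are used to satisfy single-page requests issued by the local CPU.
CPU 0: hi: 42, btch: 7 usd: 23
CPU 0: the cache belongs to CPU 0
hi: 42: the upper limit; once the cache exceeds this number, batch page frames are released back to the buddy system
btch: 7: the number of page frames moved at a time when frames are added to or removed from the cache
usd: 23: the number of page frames currently in the cache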
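How hi, btch and usd interact can be sketched with a small model. This is a simplification under stated assumptions, not the kernel's struct per_cpu_pages: the type pcp_model and both functions are hypothetical, and the exact comparison against hi (here "greater than or equal") is my reading of the 3.14 free path.

```c
#include <assert.h>

/* Simplified model of one per-CPU page frame cache. Field names follow
 * the log output rather than the kernel structure. */
struct pcp_model {
    int hi;    /* upper limit ("hi") */
    int btch;  /* chunk size for refill and drain ("btch") */
    int usd;   /* page frames currently cached ("usd") */
};

/* Free one page into the cache. Returns how many page frames were handed
 * back to the buddy system as a result (0 or btch). */
static int pcp_free_page(struct pcp_model *pcp)
{
    pcp->usd++;
    if (pcp->usd >= pcp->hi) {  /* assumed comparison, see lead-in */
        pcp->usd -= pcp->btch;
        return pcp->btch;
    }
    return 0;
}

/* Take one page from the cache. Returns how many page frames were pulled
 * from the buddy system to refill an empty cache (0 or btch). */
static int pcp_alloc_page(struct pcp_model *pcp)
{
    int refilled = 0;

    if (pcp->usd == 0) {
        pcp->usd += pcp->btch;
        refilled = pcp->btch;
    }
    pcp->usd--;
    return refilled;
}
```

Starting from the Normal zone values for CPU 0 (hi 42, btch 7, usd 23), the first 18 frees stay in the cache, and the 19th brings usd to 42, so 7 page frames go back to the buddy system and usd drops to 35.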
2.2.2
<4>[12345.547247] active_anon:21860 inactive_anon:14790 isolated_anon:0
<4>[12345.547247] active_file:41585 inactive_file:10422 isolated_file:0
<4>[12345.547247] unevictable:0 dirty:9 writeback:205 unstable:0
<4>[12345.547247] free:285748 slab_reclaimable:2100 slab_unreclaimable:26286
<4>[12345.547247] mapped:26079 shmem:14857 pagetables:687 bounce:0
<4>[12345.547247] free_cma:57779
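active_anon: active anonymous mappings. "Active" means recently accessed; "anonymous" means the page mapping is not backed by a file or any other data source.
inactive_anon: inactive anonymous mappings
isolated_anon: anonymous pages temporarily taken off the LRU lists, for example while they are being reclaimed or migrated
active_file: active file mappings; the page mapping is associated with a file on disk
inactive_file: inactive file mappings
isolated_file: file pages temporarily taken off the LRU lists
unevictable: pages that cannot be reclaimed, for example mlocked pages
dirty: dirty pages, whose contents no longer match the original contents on the block device
writeback: pages currently being written back
unstable: pages that were sent to an NFS server but are not yet committed to stable storage
free: free pages
slab_reclaimable: reclaimable pages in the slab cache
slab_unreclaimable: unreclaimable pages in the slab cache
mapped: NR_FILE_MAPPED, the page cache pages that are mapped into process page tables. It does not refer to the BH_Mapped buffer state of block device buffers.
shmem: pages used for shared memory mappings
pagetables: pages occupied by page tables
bounce: pages used as bounce buffers for block I/O
free_cma: free pages that belong to the Contiguous Memory Allocator (CMA).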
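The counters in this block are page counts, while the per-zone lines further down are printed in kB. A small conversion helper makes it easy to cross-check the two. This is only a sketch: pages_to_kb is a made-up helper, and the 4 kB page size is taken from the smallest block size in the buddy list of this log.

```c
#include <assert.h>

/* Assumption: 4 kB pages, as suggested by the "*4kB" order-0 entries
 * in the buddy list of this log. */
#define PAGE_SIZE_KB 4UL

/* Convert a page count from the summary block into kB. */
static unsigned long pages_to_kb(unsigned long pages)
{
    return pages * PAGE_SIZE_KB;
}
```

For example, free_cma:57779 gives pages_to_kb(57779) == 231116, which is the free_cma:231116kB value of the Normal zone, and slab_unreclaimable:26286 gives 105144, which matches slab_unreclaimable:105144kB.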
2.2.3
<4>[12345.581839] Normal free:233460kB min:2488kB low:3108kB high:3732kB active_anon:17312kB inactive_anon:10824kB
active_file:128kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:774144kB
managed:387568kB mlocked:0kB dirty:16kB writeback:76kB mapped:3296kB shmem:10840kB slab_reclaimable:8400kB
slab_unreclaimable:105144kB kernel_stack:1168kB pagetables:2748kB unstable:0kB bounce:0kB free_cma:231116kB
writeback_tmp:0kB pages_scanned:1648 all_unreclaimable? yes
<4>[12345.627014] lowmem_reserve[]: 0 10168 10168
<4>[12345.631565] HighMem free:909036kB min:512kB low:2604kB high:4696kB active_anon:70632kB inactive_anon:48336kB
active_file:166212kB inactive_file:41684kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1301504kB
managed:1301504kB mlocked:0kB dirty:20kB writeback:744kB mapped:101020kB shmem:48588kB slab_reclaimable:0kB
slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? no
<4>[12345.675614] lowmem_reserve[]: 0 0 0
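Normal free: free memory in the Normal zone
min, low, high: the watermarks that drive page reclaim in the Normal zone
lowmem_reserve: the number of pages in this zone that are held back from allocations which could also have been served from a higher zone
present: the physical memory size of the zone
managed: the part of present that is managed by the buddy system, managed = present - reserved
The remaining values are described in section 2.2.2. They have the same meaning, except that the numbers apply to a single zone and are printed in kB.
Note: some items are non-zero only for Normal, such as kernel_stack, pagetables, free_cma, slab_reclaimable and slab_unreclaimable. Normal zone pages are directly mapped, so these are the pages the kernel uses for itself.
HighMem is mainly used for anonymous mappings, file mappings, mapped pages and shared memory.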
2.2.4
<4>[12345.679437] Normal: 1165*4kB (MRC) 1122*8kB (RC) 1119*16kB (RC) 1118*32kB (C) 1068*64kB (RC)
748*128kB (C) 0*256kB 0*512kB 0*1024kB 1*2048kB (R) 0*4096kB 0*8192kB = 233460kB
<4>[12345.695797] HighMem: 99*4kB (M) 1148*8kB (UM) 1314*16kB (UM) 880*32kB (UM) 327*64kB (M)
87*128kB (M) 34*256kB (M) 38*512kB (M) 12*1024kB (M) 10*2048kB (M) 3*4096kB (M) 91*8192kB (UMR) = 909516kB
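Buddy system information. The orders range from 0 to 11.
M: Movable
R: Reserve
C: CMA
U: Unmovable
E: Reclaimable
1. A lone (C) means the free list of that order can only serve allocations that carry the ALLOC_CMA flag.
2. HighMem has no C marks, because on this system contiguous memory allocation happens only in the Normal zone.
3. MRC means the free list of that order contains CMA, Reserve and Movable memory.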
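The total at the end of each buddy line is simply the sum of count * block size over all orders, which can be verified with a few lines of C. The function name buddy_free_kb is invented for this sketch; the counts in the usage note are copied from the log.

```c
#include <assert.h>
#include <stddef.h>

/* Sum a per-order free list: counts[o] free blocks of (base_kb << o) kB
 * each, for o = 0 .. orders - 1. Returns the total in kB. */
static unsigned long buddy_free_kb(const unsigned long *counts, size_t orders,
                                   unsigned long base_kb)
{
    unsigned long total = 0;
    size_t o;

    for (o = 0; o < orders; o++)
        total += counts[o] * (base_kb << o);
    return total;
}
```

With the Normal counts {1165, 1122, 1119, 1118, 1068, 748, 0, 0, 0, 1, 0, 0} and a base size of 4 kB the function returns 233460, and with the HighMem counts {99, 1148, 1314, 880, 327, 87, 34, 38, 12, 10, 3, 91} it returns 909516, the same totals the kernel printed.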
3 Who triggered OOM
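Several factors influence whether an OOM occurs:
1. The order of the allocation, and how the system treats that order
2. The zone in which the allocation takes place
3. The watermarks of the zone
4. The degree of memory fragmentation
5. Reportedly, continuously allocating address space can also lead to an OOM (not verified yet)
In the OOM messages above, the Normal zone still has a lot of free memory, 233460kB, yet the OOM occurred anyway.
First, the order of the allocation is 0, so fragmentation is not involved. gfp_mask=0x800d0 does not contain ___GFP_HIGHMEM, so the allocation is served from the Normal zone, and its type is Reclaimable. Being Reclaimable rather than Movable means the allocation cannot take memory from CMA.
Since the failure is not caused by an oversized order, the OOM killer must have been invoked because the usable free memory fell below the min watermark.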
static bool __zone_watermark_ok(struct zone *z, unsigned int order,
                                unsigned long mark, int classzone_idx, int alloc_flags,
                                long free_pages)
{
        ...
#ifdef CONFIG_CMA
        /* If allocation can't use CMA areas don't use free CMA pages */
        if (!(alloc_flags & ALLOC_CMA))
                free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
#endif
        if (free_pages - free_cma <= min + z->lowmem_reserve[classzone_idx])
                return false;
        ...
}
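Because the allocation type is Reclaimable, the free CMA memory has to be subtracted from the free memory: 233460kB - 231116kB = 2344kB. This is below the min watermark of 2488kB, so the system starts the OOM killer.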
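The same reasoning can be replayed in user space with the numbers from the log. This is a sketch under several assumptions, not the kernel implementation: the bit values and the migratetype formula follow my reading of the 3.14 sources, only MIGRATE_MOVABLE allocations are assumed to get ALLOC_CMA, lowmem_reserve is taken as 0 because the class zone is assumed to be Normal, and all names (gfp_to_migratetype, watermark_ok, the MT_* constants) are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define GFP_MOVABLE_BIT     0x08u     /* ___GFP_MOVABLE, assumed 3.14 value */
#define GFP_RECLAIMABLE_BIT 0x80000u  /* ___GFP_RECLAIMABLE, assumed 3.14 value */

enum { MT_UNMOVABLE = 0, MT_RECLAIMABLE = 1, MT_MOVABLE = 2 };

/* Derive the migrate type from the GFP mask: bit 1 comes from MOVABLE,
 * bit 0 from RECLAIMABLE. */
static int gfp_to_migratetype(unsigned int gfp)
{
    return (((gfp & GFP_MOVABLE_BIT) != 0) << 1) |
           ((gfp & GFP_RECLAIMABLE_BIT) != 0);
}

/* Order-0 part of the watermark check quoted above, all values in pages.
 * Free CMA pages only count when the allocation may use CMA. */
static bool watermark_ok(long free_pages, long free_cma, long min,
                         long lowmem_reserve, bool can_use_cma)
{
    long unusable_cma = can_use_cma ? 0 : free_cma;

    return free_pages - unusable_cma > min + lowmem_reserve;
}
```

For gfp_mask 0x800d0 the migrate type is MT_RECLAIMABLE, so CMA is off limits. With free = 233460kB / 4 = 58365 pages, free_cma = 231116kB / 4 = 57779 pages and min = 2488kB / 4 = 622 pages, the check compares 586 against 622 and fails. The same numbers pass as soon as can_use_cma is true, which shows that the large amount of "free" memory in the Normal zone is almost entirely CMA.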