Linux Kernel Debugging (1) - coredump

  • Learn kernel debugging methods

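1.Core dumps

  A core dump (coredump) records the running state of a process at a particular moment. It is produced when the process runs into a problem. With the program's executable and the coredump you can debug the process, inspect its state at the moment the coredump was produced, and locate the fault. A coredump typically contains the program's memory, register state, stack pointer, memory-management information and so on. When things are configured properly, the coredump file is generated automatically when the program crashes.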

1.1.Working with core files

  • Prevent the system from generating core files:

ulimit -c 0

  • Check whether core file generation is enabled:

[john@localhost ~]$ ulimit -c
0
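// -c shows the size limit for core dump files; 0 means core dumps are disabled.

  • Enable core dump files:

ulimit -c 1048576 # 1 GiB limit; bash counts this value in 1024-byte blocks
ulimit -c unlimited # no limit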

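  • Making the setting permanent:

  The method above only applies to the current shell and is lost when that shell exits, and therefore after a reboot. To make it permanent, add the command to the profile:

#vi /etc/profile
ulimit -c 1048576 # note: a dump larger than this limit will not be saved in full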
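The same limit can also be changed from inside a program. The following is a minimal sketch, assuming a POSIX system that provides getrlimit() and setrlimit(); the function names enable_core_dumps and core_dumps_enabled are hypothetical and not part of the original article. Unlike ulimit in the shell, RLIMIT_CORE is expressed in bytes.

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit for this process.
 * Returns 0 on success, -1 if getrlimit() or setrlimit() fails.
 * (Hypothetical helper name, for illustration only.) */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max;   /* the soft limit may not exceed the hard limit */
    return setrlimit(RLIMIT_CORE, &lim) == 0 ? 0 : -1;
}

/* Report whether core dumps are currently permitted:
 * 1 if the soft limit is non-zero, 0 if it is zero, -1 on error. */
int core_dumps_enabled(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    return lim.rlim_cur != 0;
}
```

Calling enable_core_dumps() early in main() has roughly the effect of running ulimit -c with the hard limit, but only for that process and its children. If the hard limit itself is 0, core dumps stay disabled.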

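  • Specifying the core dump file name and directory
    By default, the kernel writes the core file into the current working directory of the crashing process, with the fixed name core. If several programs dump core, or the same program crashes repeatedly, the same core file is overwritten each time. You can change a kernel parameter to specify the path and name of the core file: set the sysctl variable kernel.core_pattern in /etc/sysctl.conf.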

#vi /etc/sysctl.conf
# path and file name of the generated coredump file
kernel.core_pattern = /var/core/core_%e_%p
kernel.core_uses_pid = 0

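  Note that if /proc/sys/kernel/core_uses_pid is set to 1, the process ID is still appended to the core file name even when core_pattern contains no %p. The format specifiers that core_pattern accepts, including %e and %p, are:

%c soft size limit of the core file
%e file name of the dumped executable
%g real group ID of the dumped process
%h host name
%p PID of the dumped process
%s number of the signal that caused the coredump
%t time of the dump (seconds since 1 January 1970)
%u real user ID of the dumped process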

The following command applies the change immediately.

sysctl -p /etc/sysctl.conf
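To see how a pattern such as /var/core/core_%e_%p turns into a file name, the expansion can be sketched in user space. This is only an illustration of the specifier table above, not the kernel's implementation: expand_core_pattern is a hypothetical function that handles %e, %p, %s, %t and %% and drops any other specifier.

```c
#include <stdio.h>
#include <string.h>

/* Expand a subset of core_pattern specifiers (%e, %p, %s, %t, %%) into out.
 * Other specifiers and a lone trailing '%' are dropped.
 * Returns 0 on success, -1 if out is too small. */
int expand_core_pattern(const char *pattern, const char *exe, long pid,
                        int sig, long when, char *out, size_t outlen)
{
    size_t used = 0;

    if (outlen == 0)
        return -1;
    for (const char *p = pattern; *p != '\0'; p++) {
        char piece[64];
        const char *src = piece;

        if (*p != '%') {
            piece[0] = *p;               /* ordinary character, copied as-is */
            piece[1] = '\0';
        } else {
            p++;
            if (*p == '\0')
                break;                   /* lone trailing '%' */
            switch (*p) {
            case 'e': src = exe; break;
            case 'p': snprintf(piece, sizeof piece, "%ld", pid); break;
            case 's': snprintf(piece, sizeof piece, "%d", sig); break;
            case 't': snprintf(piece, sizeof piece, "%ld", when); break;
            case '%': piece[0] = '%'; piece[1] = '\0'; break;
            default:  piece[0] = '\0'; break;   /* unsupported specifier */
            }
        }
        size_t len = strlen(src);
        if (used + len >= outlen)
            return -1;                   /* keep room for the terminator */
        memcpy(out + used, src, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

With the pattern from sysctl.conf above, an executable named a.out running as PID 1234 would yield /var/core/core_a.out_1234.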

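  • Automatically compressing dump files with a user program
    To reduce disk usage, coredumps are usually compressed. This can be done by putting a pipe symbol (|) at the start of kernel.core_pattern, so that the kernel starts a user program and feeds it the dump on standard input.

$ echo "|/usr/local/sbin/core_helper %t %e %p" > /proc/sys/kernel/core_pattern
where core_helper is our user program, as follows:

#!/bin/sh

# $1 = dump time (%t), $2 = executable name (%e), $3 = PID (%p)
exec gzip - > "/var/core/$1-$2-$3.core.gz"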

1.2.Debugging with a core dump

#include <stdio.h>

int main(void)
{
    int *a = NULL;  /* deliberately invalid pointer */
    *a = 0x1;       /* line 6: writing through NULL raises SIGSEGV */

    return 0;
}

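  Compile with gcc -g a.c to produce the executable a.out. Running it prints: Segmentation fault (core dumped), which means that a core dump file for a.out has been generated (in the current directory when the default core_pattern is in use).
Start GDB as follows: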

#gdb -c ./*.core ./a.out
GNU gdb (GDB) 7.1-Ubuntu

Core was generated by `./a.out'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004004dc in main() at a.c:6
6 *a =0x1;

Signal 11 was received at line 6 of a.c. GDB's list command shows the surrounding source code.
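A parent process can also find out programmatically whether a child was killed by a signal and whether a core file was written. The sketch below assumes a POSIX system; crash_child_and_inspect is a hypothetical name, and the child uses raise(SIGSEGV) in place of the NULL write in a.c so that the behaviour is predictable. WCOREDUMP is not part of POSIX, so it is only used when the platform defines it.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that dies from SIGSEGV and report how it ended.
 * Returns the terminating signal number, or -1 on error or normal exit.
 * *dumped is set to 1 if the kernel reports that a core file was written. */
int crash_child_and_inspect(int *dumped)
{
    int status;
    pid_t pid = fork();

    *dumped = 0;
    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(SIGSEGV, SIG_DFL);  /* make sure the default action applies */
        raise(SIGSEGV);            /* same signal as the NULL write in a.c */
        _exit(1);                  /* not reached */
    }
    if (waitpid(pid, &status, 0) != pid || !WIFSIGNALED(status))
        return -1;
#ifdef WCOREDUMP
    if (WCOREDUMP(status))
        *dumped = 1;
#endif
    return WTERMSIG(status);
}
```

Whether *dumped ends up as 1 depends on the ulimit -c and core_pattern settings described in section 1.1; with a limit of 0 the child is still killed by signal 11, but no core file is produced.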

2.Oops

  An “Oops” is what the kernel throws at us when it finds something faulty, or an exception, in the kernel code. It’s somewhat like the segfaults of user-space. An Oops dumps its message on the console; it contains the processor status and the CPU registers of when the fault occurred. The offending process that triggered this Oops gets killed without releasing locks or cleaning up structures. The system may not even resume its normal operations sometimes; this is called an unstable state. Once an Oops has occurred, the system cannot be trusted any further.

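  At critical points in the kernel, checks are often put in place in advance to catch kernel anomalies. If such a check fires, the kernel has hit a bug or a warning condition: it either calls panic to reset the system or merely prints a warning message.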

2.1.BUG_ON()

#ifndef HAVE_ARCH_BUG
#define BUG() do { \
    printk("BUG: failure at %s:%d/%s()!\n", __FILE__, __LINE__, __func__); \
    barrier_before_unreachable(); \
    panic("BUG!"); \
} while (0)
#endif

#ifndef HAVE_ARCH_BUG_ON
#define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
#endif

What does the BUG() macro do?

  • Prints the contents of the registers
  • Prints Stack Trace
  • Current Process dies
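The difference between a check that only warns and a check that stops execution can be imitated in user space. The following is a rough analogue under stated assumptions, not kernel code: USER_WARN_ON, USER_BUG_ON, user_warn_on and warn_count are hypothetical names, and abort() stands in for the kernel's panic().

```c
#include <stdio.h>
#include <stdlib.h>

static int warn_count;  /* number of warnings reported so far */

/* Report a failed check and hand the condition back to the caller,
 * in the spirit of the kernel's WARN_ON(). */
static int user_warn_on(int cond, const char *file, int line, const char *func)
{
    if (cond) {
        fprintf(stderr, "WARNING: at %s:%d/%s()\n", file, line, func);
        warn_count++;
    }
    return cond;
}

/* Warning variant: prints a message, then execution continues. */
#define USER_WARN_ON(cond) user_warn_on(!!(cond), __FILE__, __LINE__, __func__)

/* Fatal variant: prints a message, then abort() stands in for panic(). */
#define USER_BUG_ON(cond) do {                                   \
    if (cond) {                                                  \
        fprintf(stderr, "BUG: failure at %s:%d/%s()!\n",         \
                __FILE__, __LINE__, __func__);                   \
        abort();                                                 \
    }                                                            \
} while (0)
```

A caller can write if (USER_WARN_ON(ptr == NULL)) return -1; to log the problem and recover, whereas USER_BUG_ON(ptr == NULL) never returns once the condition is true.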

2.2.die()

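  BUG_ON() catches anomalies that were anticipated in advance, whereas die() handles dynamic exceptions in the system, such as illegal address accesses, illegal instructions and division by zero. These exceptions trigger the CPU's exception handlers, which all eventually call die().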

arch/arm/kernel/traps.c:
void die(const char *str, struct pt_regs *regs, int err)
{
    enum bug_trap_type bug_type = BUG_TRAP_TYPE_NONE;
    unsigned long flags = oops_begin();
    int sig = SIGSEGV;

    if (!user_mode(regs))
        bug_type = report_bug(regs->ARM_pc, regs);
    if (bug_type != BUG_TRAP_TYPE_NONE)
        str = "Oops - BUG";

    if (__die(str, err, regs))
        sig = 0;

    oops_end(flags, regs, sig);
}


2.3.panic()

  If the handling in BUG_ON() or die() decides to reset the system, panic() is called. panic() performs some processing before the reset and then resets the system.

3.Example
Once executed, the module generates the following Oops:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]
PGD 7a719067 PUD 7b2b3067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/virtual/misc/kvm/uevent
CPU 1
Pid: 2248, comm: insmod Tainted: P           2.6.33.3-85.fc13.x86_64
RIP: 0010:[<ffffffffa03e1012>]  [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]
RSP: 0018:ffff88007ad4bf08  EFLAGS: 00010292
RAX: 0000000000000018 RBX: ffffffffa03e1000 RCX: 00000000000013b7
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffff88007ad4bf08 R08: ffff88007af1cba0 R09: 0000000000000004
R10: 0000000000000000 R11: ffff88007ad4bd68 R12: 0000000000000000
R13: 00000000016b0030 R14: 0000000000019db9 R15: 00000000016b0010
FS:  00007fb79dadf700(0000) GS:ffff880001e80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000007a0f1000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process insmod (pid: 2248, threadinfo ffff88007ad4a000, task ffff88007a222ea0)

Stack:
ffff88007ad4bf38 ffffffff8100205f ffffffffa03de060 ffffffffa03de060
 0000000000000000 00000000016b0030 ffff88007ad4bf78 ffffffff8107aac9
 ffff88007ad4bf78 00007fff69f3e814 0000000000019db9 0000000000020000

Call Trace:
[<ffffffff8100205f>] do_one_initcall+0x59/0x154
[<ffffffff8107aac9>] sys_init_module+0xd1/0x230
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b

Code: <c7> 04 25 00 00 00 00 00 00 00 00 31 c0 c9 c3 00 00 00 00 00 00 00
RIP  [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]
RSP <ffff88007ad4bf08>
CR2: 0000000000000000

Analyzing the Oops dump:
1>.BUG: unable to handle kernel NULL pointer dereference at (null)

  • The first line indicates a pointer with a NULL value.

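2>.IP: [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]

  • IP is the instruction pointer.
  • This tells us that the kernel faulted while executing at my_oops_init+0x12/0x21, so what we need to do is find the code at that address. The format is function+offset/length. my_oops_init is the function in which the exception occurred:
    • 0x12 is the offset of the faulting instruction within the function
    • 0x21 is the size of the my_oops_init function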

3>.PGD 7a719067 PUD 7b2b3067 PMD 0

  • Page-table information for the address being accessed (0 in this example)

4>.Oops: 0002 [#1] SMP

  • This is the error code value in hex. Each bit has a significance of its own:
    bit 0 == 0 means no page found, 1 means a protection fault
    bit 1 == 0 means read, 1 means write
    bit 2 == 0 means kernel, 1 means user-mode
    Here 0002 has only bit 1 set: a write, in kernel mode, to an address with no page mapped.
  • [#1] is the number of times the Oops occurred. Multiple Oopses can be triggered as a cascading effect of the first one.
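The bit meanings listed for the error code can be checked mechanically. A minimal sketch, assuming only the three low bits described in this article matter; decode_oops_error is a hypothetical helper name.

```c
#include <stdio.h>

/* Describe the three low bits of the error code printed after "Oops:".
 * Writes the description into buf and returns the number of characters
 * snprintf() would have written (excluding the terminator). */
int decode_oops_error(unsigned long code, char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "%s, %s, %s",
                    (code & 0x1) ? "protection fault" : "no page found",
                    (code & 0x2) ? "write" : "read",
                    (code & 0x4) ? "user mode" : "kernel mode");
}
```

For the value 0002 from the dump above, the function fills buf with "no page found, write, kernel mode", which matches a NULL pointer write made from kernel code.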

5>.CPU 1

  • This denotes on which CPU the error occurred.

6>.Pid: 2248, comm: insmod Tainted: P 2.6.33.3-85.fc13.x86_64

  • The Tainted flag points to P here. Each flag has its own meaning. A few other flags, and their meanings, picked up from kernel/panic.c:
P — Proprietary module has been loaded.
F — Module has been forcibly loaded.
S — SMP with a CPU not designed for SMP.
R — User forced a module unload.
M — System experienced a machine check exception.
B — System has hit bad_page.
U — Userspace-defined naughtiness.
A — ACPI table overridden.
W — Taint on warning.

7>.RIP: 0010:[<ffffffffa03e1012>]  [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]

  • RIP is the CPU register containing the address of the instruction being executed, that is, the address where the fault occurred. 0010 comes from the code segment register. my_oops_init+0x12/0x21 is the symbol + the offset/length: the fault occurred 0x12 bytes into my_oops_init, and 0x21 is the size of that function.

8>.RSP: 0018:ffff88007ad4bf08  EFLAGS: 00010292
RAX: 0000000000000018 RBX: ffffffffa03e1000 RCX: 00000000000013b7
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffff88007ad4bf08 R08: ffff88007af1cba0 R09: 0000000000000004
R10: 0000000000000000 R11: ffff88007ad4bd68 R12: 0000000000000000
R13: 00000000016b0030 R14: 0000000000019db9 R15: 00000000016b0010

  • This is a dump of the contents of some of the CPU registers.

9>.Stack:
ffff88007ad4bf38 ffffffff8100205f ffffffffa03de060 ffffffffa03de060
 0000000000000000 00000000016b0030 ffff88007ad4bf78 ffffffff8107aac9
 ffff88007ad4bf78 00007fff69f3e814 0000000000019db9 0000000000020000

  • The above is a raw dump of the stack contents.

10>.Call Trace:
[<ffffffff8100205f>] do_one_initcall+0x59/0x154
[<ffffffff8107aac9>] sys_init_module+0xd1/0x230
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b

  • The above is the call trace — the list of functions being called just before the Oops occurred.

11>.Code: <c7> 04 25 00 00 00 00 00 00 00 00 31 c0 c9 c3 00 00 00 00 00 00 00

  • The Code is a hex-dump of the section of machine code that was being run at the time the Oops occurred. The byte in angle brackets, <c7>, is the one RIP pointed to when the fault occurred.

3.1.Debugging an Oops dump

  • [root@DELL-RnD-India oops]# gdb oops.ko
    GNU gdb (GDB) Fedora (7.1-18.fc13)
    Reading symbols from /code/oops/oops.ko...done.
    (gdb) add-symbol-file oops.o 0xffffffffa03e1000
    add symbol table from file "oops.o" at
            .text_addr = 0xffffffffa03e1000

  • (gdb) disassemble my_oops_init
    Dump of assembler code for function my_oops_init:
     0x0000000000000038 <+0>: push %rbp
     0x0000000000000039 <+1>: mov $0x0,%rdi
     0x0000000000000040 <+8>: xor %eax,%eax
     0x0000000000000042 <+10>: mov %rsp,%rbp
     0x0000000000000045 <+13>: callq 0x4a <my_oops_init+18>
     0x000000000000004a <+18>: movl $0x0,0x0
     0x0000000000000055 <+29>: xor %eax,%eax
     0x0000000000000057 <+31>: leaveq
     0x0000000000000058 <+32>: retq
    End of assembler dump.

  • Now, to pinpoint the actual line of offending code, we add the starting address and the offset. The offset is available in the same RIP instruction line. In our case, we are adding 0x0000000000000038 + 0x12 = 0x000000000000004a. This points to the movl instruction.
    (gdb) list *0x000000000000004a
    0x4a is in my_oops_init (/code/oops/oops.c:6).
    1 #include <linux/kernel.h>
    2 #include <linux/module.h>
    3 #include <linux/init.h>
    4
    5 static void create_oops() {
    6 *(int *)0 = 0;
    7 }
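The address arithmetic used above can be written down as a small helper. This is a sketch under stated assumptions: struct oops_location, parse_oops_location and faulting_address are hypothetical names, and the input is assumed to follow the symbol+0xoffset/0xsize form printed in the Oops.

```c
#include <stdio.h>

/* One "symbol+0xoffset/0xsize" entry as printed in an Oops. */
struct oops_location {
    char symbol[64];
    unsigned long offset;  /* bytes from the start of the function */
    unsigned long size;    /* total size of the function in bytes */
};

/* Parse text such as "my_oops_init+0x12/0x21".
 * Returns 0 on success, -1 if the text does not have that form. */
int parse_oops_location(const char *text, struct oops_location *loc)
{
    if (sscanf(text, "%63[^+]+%lx/%lx",
               loc->symbol, &loc->offset, &loc->size) != 3)
        return -1;
    return 0;
}

/* Address to hand to gdb's "list *ADDR": the function's start address
 * as shown by "disassemble", plus the offset taken from the Oops. */
unsigned long faulting_address(unsigned long func_start,
                               const struct oops_location *loc)
{
    return func_start + loc->offset;
}
```

Parsing my_oops_init+0x12/0x21 and passing the start address 0x38 from the disassembly gives 0x4a, the same address that was passed to list above.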

refer to

  • https://gist.github.com/robbie-cao/32af7001443ac0b959abdf6ad6de2c9f
  • kernel/Documentation/oops-tracing.txt