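Simulating a soft lockup event

In an earlier post, "Linux kernel fault classification and troubleshooting", I gave an overview of the kinds of kernel failures.

One class covered there is the lockup: a CPU gets stuck, typically because of a deadlock, and stops making progress. Lockups come in two kinds, soft lockup and hard lockup, and detecting a hard lockup may additionally require platform support. This post simulates a scenario that triggers a lockup. Some preparation is needed first: build the kernel with the following options enabled: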

CONFIG_LOCKUP_DETECTOR=y
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=1
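
The panic-on-soft-lockup default set by these options is also exposed at runtime as the sysctl kernel.softlockup_panic, next to kernel.watchdog_thresh. As a rough way to confirm what the running kernel actually uses, a small userspace helper could read both files. This is only a sketch: read_int_knob and print_lockup_knobs are hypothetical names, and it assumes procfs is mounted at /proc.

```c
#include <stdio.h>

/* Read a single integer from a procfs/sysctl file.
 * Returns 0 on success, -1 if the file is missing or cannot be parsed. */
int read_int_knob(const char *path, int *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%d", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Print the two knobs relevant to this experiment (hypothetical helper). */
void print_lockup_knobs(void)
{
    int v;

    if (read_int_knob("/proc/sys/kernel/softlockup_panic", &v) == 0)
        printf("softlockup_panic = %d\n", v);
    else
        printf("softlockup_panic not available\n");

    if (read_int_knob("/proc/sys/kernel/watchdog_thresh", &v) == 0)
        printf("watchdog_thresh = %d s\n", v);
    else
        printf("watchdog_thresh not available\n");
}
```

Calling print_lockup_knobs() from a main() on the test system should report softlockup_panic as 1 if the options above took effect.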

Without further ado, here is the code:

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/errno.h>
#include <linux/spinlock.h>
#include <linux/delay.h>
#include <linux/kthread.h>


static int soft_lockup = 1;
module_param(soft_lockup, int, 0644);
MODULE_PARM_DESC(soft_lockup, "soft_lockup");

static struct task_struct *taskid;
static spinlock_t slock;

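static int kthread_lockup(void *data)
{
    spinlock_t *lock = data;

    while (!kthread_should_stop()) {
        udelay(19000);  /* busy-wait 19 ms before locking up */
        if (soft_lockup) {
            /* Soft lockup: preemption is disabled, interrupts stay enabled. */
            spin_lock(lock);
            spin_lock(lock);  /* never returns: the lock is already held */
        } else {
            /* Hard lockup variant: local interrupts are disabled as well. */
            spin_lock_irq(lock);
            spin_lock_irq(lock);
        }
    }
    return 0;
}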

static int __init lockup_init(void)
{
    int err;
    spin_lock_init(&slock);
    taskid = kthread_run(kthread_lockup, &slock, "lockup_test");
    if (IS_ERR(taskid)) {
        err = PTR_ERR(taskid);
        return err;
    }
    return 0;
}

static void __exit lockup_exit(void)
{
    kthread_stop(taskid);
}

module_init(lockup_init);
module_exit(lockup_exit);
MODULE_AUTHOR("Haocheng Xie");
MODULE_LICENSE("GPL v2");

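This is a short kernel test module. It takes the same spinlock twice to provoke a deadlock. The first spin_lock disables preemption on the local CPU, so no scheduling can happen on that CPU. The second spin_lock then busy-waits on that CPU forever, because the lock is already held and will never be released. The CPU can still service interrupts, but it can no longer switch tasks, so the watchdog kernel thread never gets to run and feed the watchdog. The watchdog timer, which still fires because interrupts are enabled, eventually notices this, and since panic on soft lockup is enabled, the kernel panics.

Once the soft lockup timeout expires, the panic is triggered directly: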
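
Why the second spin_lock can never succeed is easy to see in a toy model. The sketch below assumes a ticket-style lock, which is one common spinlock design; the kernel's real implementation differs by architecture and version, uses atomic operations, and never gives up. The names ticket_lock, ticket_lock_bounded and ticket_unlock are hypothetical, and the model is single-threaded userspace code with a spin limit added so that it terminates.

```c
#include <stdint.h>

/* Userspace model of a ticket spinlock (no atomics, single-threaded only):
 * "next" is the next ticket to hand out, "owner" is the ticket that is
 * currently allowed to hold the lock. */
struct ticket_lock {
    uint16_t owner;
    uint16_t next;
};

/* Take a ticket and spin until it is served, giving up after max_spins
 * extra iterations. A real spin_lock() would keep spinning forever.
 * Returns 1 if the lock was acquired, 0 if we stopped waiting. */
int ticket_lock_bounded(struct ticket_lock *l, unsigned long max_spins)
{
    uint16_t ticket = l->next++;
    unsigned long i;

    for (i = 0; i <= max_spins; i++) {
        if (l->owner == ticket)
            return 1;
    }
    return 0;
}

/* Release the lock by serving the next ticket. */
void ticket_unlock(struct ticket_lock *l)
{
    l->owner++;
}
```

Starting from a zeroed lock, the first call gets ticket 0 and succeeds at once. A second call from the same context gets ticket 1, but owner stays at 0 because the only code that could call ticket_unlock is the code that is spinning, so the wait can never end.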

/ # NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [lockup_test:646]
Modules linked in: lockup(O)

CPU: 1 PID: 646 Comm: lockup_test Tainted: G           O    4.0.0 #1
Hardware name: linux,dummy-virt (DT)
task: ffff800078659600 ti: ffff8000787d4000 task.ti: ffff8000787d4000
PC is at _raw_spin_lock+0x38/0x48
LR is at kthread_lockup+0x48/0x80 [lockup]
pc : [<ffff800000562ba0>] lr : [<ffff7ffffc000048>] pstate: 20000145
sp : ffff8000787d7e10
x29: ffff8000787d7e10 x28: 0000000000000000
x27: 0000000000000000 x26: 0000000000000000
x25: 0000000000000000 x24: 0000000000000000
x23: ffff7ffffc000000 x22: ffff7ffffc0002d8
x21: ffff8000006bb4a8 x20: ffff7ffffc0000e4
x19: ffff7ffffc0002d8 x18: 0000000000000004
x17: 0000000000000190 x16: 0000000000100100
x15: 0000000000200200 x14: 00000000000c8000
x13: 0000000000001e12 x12: 0000000000001e12
x11: ffffffffffffffff x10: 00000000000003ff
x9 : ffff8000787d7db0 x8 : ffff800078659b70
x7 : 00000004d70b0640 x6 : 000000007e83e000
x5 : 0000000000000000 x4 : 0000000000800000
x3 : 0000000000000000 x2 : 0000000000000001
x1 : 0000000000010000 x0 : ffff7ffffc0002d8

Kernel panic - not syncing: softlockup: hung tasks
CPU: 1 PID: 646 Comm: lockup_test Tainted: G           O L  4.0.0 #1
Hardware name: linux,dummy-virt (DT)
Call trace:
[<ffff8000000898fc>] dump_backtrace+0x0/0x11c
[<ffff800000089a28>] show_stack+0x10/0x1c
[<ffff80000055ea74>] dump_stack+0x84/0xc4
[<ffff80000055db50>] panic+0xe0/0x220
[<ffff800000125578>] watchdog_timer_fn+0x224/0x228
[<ffff8000000fca48>] __run_hrtimer.isra.34+0x48/0x108
[<ffff8000000fcd48>] hrtimer_interrupt+0xc8/0x244
[<ffff8000004764ec>] arch_timer_handler_virt+0x28/0x38
[<ffff8000000f0564>] handle_percpu_devid_irq+0x78/0xa0
[<ffff8000000ec614>] generic_handle_irq+0x30/0x4c
[<ffff8000000ec928>] __handle_domain_irq+0x58/0xb0
[<ffff800000082424>] gic_handle_irq+0x30/0x80

Note this line in the output above:

BUG: soft lockup - CPU#1 stuck for 23s! [lockup_test:646]

It shows that the kernel crash was caused by a soft lockup bug.
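
The "stuck for 23s" figure can be related to the detector's timing with a small model. The sketch below is a simplified userspace approximation, assuming the default watchdog_thresh of 10 seconds, a soft lockup threshold of twice that value, and a timer that samples every fifth of the threshold. Time is counted in whole seconds, and softlockup_thresh, sample_period and first_report are hypothetical helper names; is_softlockup only loosely mirrors the kernel function of the same name.

```c
/* Assumed default of kernel.watchdog_thresh, in seconds. */
#define WATCHDOG_THRESH 10UL

/* Soft lockup threshold: twice watchdog_thresh (20 s by default). */
unsigned long softlockup_thresh(void)
{
    return 2UL * WATCHDOG_THRESH;
}

/* Interval between watchdog timer checks: a fifth of the threshold. */
unsigned long sample_period(void)
{
    return softlockup_thresh() / 5;
}

/* Return how long the CPU has been stuck if the last watchdog touch is
 * older than the threshold, or 0 if everything still looks healthy. */
unsigned long is_softlockup(unsigned long touch_ts, unsigned long now)
{
    if (now > touch_ts + softlockup_thresh())
        return now - touch_ts;
    return 0;
}

/* Step the timer forward from first_tick (which must not be earlier than
 * touch_ts) and return the stuck duration reported by the first check
 * that detects the lockup. */
unsigned long first_report(unsigned long touch_ts, unsigned long first_tick)
{
    unsigned long now = first_tick;

    while (!is_softlockup(touch_ts, now))
        now += sample_period();
    return is_softlockup(touch_ts, now);
}
```

In this model, a watchdog last touched at t = 0 with timer checks at t = 3, 7, 11, 15, 19, 23 is first reported at the t = 23 check. More generally the first report lands somewhere between just over 20 s and 24 s, depending on how the timer ticks line up with the last touch, which is consistent with the 23s in the log.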
