Highlights and Discussion from 《Linux內核修煉之道》 (15): Subsystem Initialization: Kernel Option Parsing

First, thanks to my country. Recent events have also impressed on me how much a well-rounded education matters, and they left me feeling obliged to write about subsystem initialization.

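Initializing each subsystem is a basic task that the kernel must complete during its overall initialization. These tasks follow a fixed pattern that breaks down into two parts: parsing the kernel options, and calling the entry (initialization) functions of the subsystems.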

Kernel Options

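Linux lets the user pass configuration options to the kernel. During initialization the kernel calls the parse_args function to parse these options and invoke the handler registered for each one.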

parse_args parses strings of the form "name=value". It is also called when a module is loaded, to parse the module's parameters.
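To make the "name=value" format concrete, here is a minimal user-space sketch that splits a command line into name/value pairs. It is not the kernel's parse_args: the names `struct kv` and `split_options` are made up for illustration, and the kernel's real parser handles more than this (quoting, for example).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One parsed option; value is NULL when the token has no '='. */
struct kv {
    char *name;
    char *value;
};

/*
 * Split a writable, space-separated command line in place.
 * Stores at most max pairs in out and returns how many were stored.
 * Hypothetical helper, not a kernel function.
 */
static size_t split_options(char *cmdline, struct kv *out, size_t max)
{
    size_t n = 0;
    char *p = cmdline;

    while (n < max) {
        while (*p == ' ')
            p++;                 /* skip separators */
        if (*p == '\0')
            break;

        char *tok = p;
        while (*p != '\0' && *p != ' ')
            p++;
        if (*p == ' ')
            *p++ = '\0';         /* terminate this token */

        char *eq = strchr(tok, '=');
        if (eq != NULL) {
            *eq = '\0';          /* terminate the name */
            out[n].value = eq + 1;
        } else {
            out[n].value = NULL; /* flag-style option such as "ro" */
        }
        out[n].name = tok;
        n++;
    }
    return n;
}
```

Calling `split_options` on a buffer holding "root=/dev/sda1 ro pci=noacpi" yields three pairs, the second of which has no value.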

Kernel options use the same "name=value" format. Open the system's grub configuration file and find the kernel line, for example:

    kernel  /boot/vmlinuz-2.6.18 root=/dev/sda1 ro splash=silent vga=0x314 pci=noacpi

Here "pci=noacpi" and the entries like it are all kernel options.

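Kernel options are not the same thing as module parameters. A module parameter is normally given as "name=value" when the module is loaded, not when the kernel boots. To set a module parameter at boot time, you must prefix it with the module name and write "module.parameter=value". For example, the following command sets the autosuspend parameter to 2 while loading usbcore: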

    $ modprobe usbcore autosuspend=2

To set the same parameter at kernel boot time, you must use this form instead:

    usbcore.autosuspend=2
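The relationship between the two spellings can be illustrated with a small user-space sketch that takes one boot-time token and separates the module prefix, the parameter name, and the value. The names `struct boot_param` and `split_boot_param` are hypothetical, and this only illustrates the naming convention; it makes no claim about how the kernel matches such names internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pieces of one boot-time token (illustrative only). */
struct boot_param {
    const char *module; /* NULL for a plain kernel option */
    const char *param;  /* parameter or option name */
    const char *value;  /* NULL when the token has no '=' */
};

/*
 * Split "module.param=value" in place. The token is cut at '=' first,
 * so a dot inside the value is never mistaken for the module separator.
 */
static struct boot_param split_boot_param(char *token)
{
    struct boot_param bp = { NULL, token, NULL };
    char *eq = strchr(token, '=');

    if (eq != NULL) {
        *eq = '\0';
        bp.value = eq + 1;
    }

    /* token now holds only the name part */
    char *dot = strchr(token, '.');
    if (dot != NULL) {
        *dot = '\0';
        bp.module = token;
        bp.param = dot + 1;
    }
    return bp;
}
```

With this sketch, "usbcore.autosuspend=2" comes apart into the module usbcore, the parameter autosuspend, and the value 2, while a plain option such as "vga=0x314" has no module part.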

The file Documentation/kernel-parameters.txt lists the kernel options that each subsystem has registered. For example, the PCI subsystem registers the following:
 pci=option[,option...] [PCI] various PCI subsystem options:
    off  [X86-32] don't probe for the PCI bus
    bios  [X86-32] force use of PCI BIOS, don't access
        the hardware directly. Use this if your machine
        has a non-standard PCI host bridge.
    nobios  [X86-32] disallow use of PCI BIOS, only direct
        hardware access methods are allowed. Use this
        if you experience crashes upon bootup and you
        suspect they are caused by the BIOS.
    conf1  [X86-32] Force use of PCI Configuration
        Mechanism 1.
    conf2  [X86-32] Force use of PCI Configuration
        Mechanism 2.
    nommconf [X86-32,X86_64] Disable use of MMCONFIG for PCI
        Configuration
    nomsi  [MSI] If the PCI_MSI kernel config parameter is
        enabled, this kernel boot option can be used to
        disable the use of MSI interrupts system-wide.
    nosort  [X86-32] Don't sort PCI devices according to
        order given by the PCI BIOS. This sorting is
        done to get a device order compatible with
        older kernels.
    biosirq  [X86-32] Use PCI BIOS calls to get the interrupt
        routing table. These calls are known to be buggy
        on several machines and they hang the machine
        when used, but on other computers it's the only
        way to get the interrupt routing table. Try
        this option if the kernel is unable to allocate
        IRQs or discover secondary PCI buses on your
        motherboard.
    rom  [X86-32] Assign address space to expansion ROMs.
        Use with caution as certain devices share
        address decoders between ROMs and other
        resources.
    irqmask=0xMMMM [X86-32] Set a bit mask of IRQs allowed to be
        assigned automatically to PCI devices. You can
        make the kernel exclude IRQs of your ISA cards
        this way.
    pirqaddr=0xAAAAA [X86-32] Specify the physical address
        of the PIRQ table (normally generated
        by the BIOS) if it is outside the
        F0000h-100000h range.
    lastbus=N [X86-32] Scan all buses thru bus #N. Can be
        useful if the kernel is unable to find your
        secondary buses and you want to tell it
        explicitly which ones they are.
    assign-busses [X86-32] Always assign all PCI bus
        numbers ourselves, overriding
        whatever the firmware may have done.
    usepirqmask [X86-32] Honor the possible IRQ mask stored
        in the BIOS $PIR table. This is needed on
        some systems with broken BIOSes, notably
        some HP Pavilion N5400 and Omnibook XE3
        notebooks. This will have no effect if ACPI
        IRQ routing is enabled.
    noacpi  [X86-32] Do not use ACPI for IRQ routing
        or for PCI scanning.
    routeirq Do IRQ routing for all PCI devices.
        This is normally done in pci_enable_device(),
        so this option is a temporary workaround
        for broken drivers that don't call it.
    firmware [ARM] Do not re-enumerate the bus but instead
        just use the configuration from the
        bootloader. This is currently used on
        IXP2000 systems where the bus has to be
        configured a certain way for adjunct CPUs.
    noearly  [X86] Don't do any early type 1 scanning.
        This might help on some broken boards which
        machine check when some devices' config space
        is read. But various workarounds are disabled
        and some IOMMU drivers will not work.
    bfsort  Sort PCI devices into breadth-first order.
        This sorting is done to get a device
        order compatible with older (<= 2.4) kernels.
    nobfsort Don't sort PCI devices into breadth-first order.
    cbiosize=nn[KMG] The fixed amount of bus space which is
        reserved for the CardBus bridge's IO window.
        The default value is 256 bytes.
    cbmemsize=nn[KMG] The fixed amount of bus space which is
        reserved for the CardBus bridge's memory
        window. The default value is 64 megabytes.
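The listing shows that a single "pci=" option carries a comma-separated list of sub-options, some of which take their own value (irqmask=0xMMMM). As a rough user-space sketch of how a handler might walk such a list: `struct pci_opts` and `parse_pci_value` are invented for this example and only cover three of the sub-options above; this is not the PCI subsystem's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical settings record, for illustration only. */
struct pci_opts {
    int use_acpi;          /* cleared by "noacpi" */
    int use_msi;           /* cleared by "nomsi" */
    unsigned long irqmask; /* set by "irqmask=0xMMMM" */
    int unknown;           /* count of unrecognized sub-options */
};

/* Walk the writable text after "pci=" one comma-separated item at a time. */
static void parse_pci_value(char *value, struct pci_opts *o)
{
    char *p = value;

    while (p != NULL && *p != '\0') {
        char *comma = strchr(p, ',');
        if (comma != NULL)
            *comma = '\0';       /* isolate the current item */

        if (strcmp(p, "noacpi") == 0)
            o->use_acpi = 0;
        else if (strcmp(p, "nomsi") == 0)
            o->use_msi = 0;
        else if (strncmp(p, "irqmask=", 8) == 0)
            o->irqmask = strtoul(p + 8, NULL, 16); /* accepts a 0x prefix */
        else
            o->unknown++;

        p = (comma != NULL) ? comma + 1 : NULL;
    }
}
```

Passing the text "noacpi,irqmask=0x0e98" clears use_acpi and sets irqmask, leaving use_msi untouched.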

Registering Kernel Options

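We do not need to understand the implementation details of parse_args, but we do need to know how kernel options are registered: module parameters are registered with the module_param family of macros, while kernel options are registered with the __setup macro.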

The __setup macro is defined in include/linux/init.h.

    #define __setup(str, fn) \
        __setup_param(str, fn, fn, 0)

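__setup takes two arguments: str is the name of the kernel option, and fn is the handler function associated with it. The macro tells the kernel to call fn if it finds the option str on the command line at boot. For an option that takes a value, str is the option name followed by a trailing "=" character (for example "netdev="). A flag-style option that takes no value, such as "ro", is registered without it.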

Different kernel options can share one handler. For example, the options netdev and ether are both associated with the netdev_boot_setup function.
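The registration idea can be imitated in user space with a table of name/handler pairs and a dispatcher that matches each token by prefix. This is only a sketch: `struct setup_entry`, `setup_table`, `demo_netdev_setup`, and `dispatch_option` are all hypothetical names. In the kernel the table is assembled by the __setup macro rather than written out by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*setup_fn)(const char *arg);

/* One registered option (illustrative stand-in for what __setup records). */
struct setup_entry {
    const char *str; /* option name, with trailing '=' if it takes a value */
    setup_fn fn;     /* handler to call when the option is seen */
};

static int netdev_calls;     /* how many times the handler ran */
static const char *last_arg; /* argument passed on the last call */

/* Hypothetical handler shared by two options. */
static int demo_netdev_setup(const char *arg)
{
    netdev_calls++;
    last_arg = arg;
    return 1; /* option handled */
}

static const struct setup_entry setup_table[] = {
    { "netdev=", demo_netdev_setup },
    { "ether=",  demo_netdev_setup }, /* same handler, different name */
};

/* Call the handler whose str is a prefix of token; return 0 if none matches. */
static int dispatch_option(const char *token)
{
    size_t count = sizeof(setup_table) / sizeof(setup_table[0]);

    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(setup_table[i].str);

        if (strncmp(token, setup_table[i].str, len) == 0)
            return setup_table[i].fn(token + len);
    }
    return 0;
}
```

Because the stored name includes the "=", the handler receives only the text after it: dispatching "netdev=5,0x300,eth0" passes "5,0x300,eth0" to the handler.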

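Besides __setup, kernel options can also be registered with the early_param macro. The two are used in much the same way. The difference is that options registered with early_param are processed before all other kernel options.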

Two Parsing Passes

Corresponding to the two ways of registering options, __setup and early_param, the kernel calls parse_args twice during initialization.

    parse_early_param();
    parse_args("Booting kernel", static_command_line, __start___param,
               __stop___param - __start___param,
               &unknown_bootoption);

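The first call to parse_args happens inside parse_early_param. Why is parse_args called twice? Because kernel options come in two kinds, much like people in the real world: some are ordinary and some are privileged, and the privileged ones must be handled before the ordinary ones.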

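In real life the definition of privilege seems rather fuzzy, and different people interpret it differently. For instance, when CCTV interviewed the discipline inspection secretary of the Second Affiliated Hospital of Harbin Medical University about the "5.5 million yuan hospital bill for an elderly patient", he said: "We are simply a people's hospital... a hospital for poor and lower-middle peasants. We have never used privilege to seek any benefit beyond what is ours... Not only did we not overcharge, we undercharged."

Life really is that complicated and strange. Kernel options are much simpler by comparison. Their privileges are out in the open, with nothing hidden: an option is declared with the early_param macro, so you can see at a glance that it is privileged. Options declared with early_param are parsed first, by parse_early_param.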
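The ordering effect of the two passes can be shown with a small user-space simulation: every token is scanned once for early options and then again for ordinary ones, so an early option is handled first regardless of its position on the command line. All names here (`struct option_entry`, `option_table`, `run_pass`, `parse_two_passes`) are hypothetical, and marking "mem" as early is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LOG 8

/* A registered option and whether it is "privileged" (early). */
struct option_entry {
    const char *name;
    int early;
};

/* Illustrative table; which options are early is an assumption here. */
static const struct option_entry option_table[] = {
    { "pci",    0 },
    { "mem",    1 },
    { "netdev", 0 },
};

static const char *handled[MAX_LOG]; /* names in the order they were handled */
static size_t handled_count;

/* Length of the name part of "name=value" (or of the whole token). */
static size_t name_len(const char *token)
{
    const char *eq = strchr(token, '=');
    return eq ? (size_t)(eq - token) : strlen(token);
}

/* One pass over all tokens, handling only entries whose early flag matches. */
static void run_pass(const char *const *tokens, size_t ntokens, int early)
{
    size_t count = sizeof(option_table) / sizeof(option_table[0]);

    for (size_t t = 0; t < ntokens; t++) {
        size_t len = name_len(tokens[t]);

        for (size_t i = 0; i < count; i++) {
            const struct option_entry *e = &option_table[i];

            if (e->early == early && strlen(e->name) == len &&
                strncmp(tokens[t], e->name, len) == 0 &&
                handled_count < MAX_LOG)
                handled[handled_count++] = e->name;
        }
    }
}

/* Early pass first, then the ordinary pass over the same tokens. */
static void parse_two_passes(const char *const *tokens, size_t ntokens)
{
    handled_count = 0;
    run_pass(tokens, ntokens, 1);
    run_pass(tokens, ntokens, 0);
}
```

Given the tokens "pci=noacpi", "mem=512M", and "netdev=5" in that order, the sketch handles mem first, followed by pci and netdev.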

 
