《Linux内核修炼之道》 Highlights and Discussion (15): Subsystem Initialization

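First, thanks to the country. Then, thanks to a few recent campus headlines for driving home how much a well-rounded education matters, and for making me feel obliged to write about subsystem initialization.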

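Initializing the individual subsystems is a basic task that the kernel's overall initialization has to complete. This work follows a fixed pattern and falls into two parts: parsing the kernel options, and calling the subsystems' entry (initialization) functions.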

Kernel options

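Linux lets users pass configuration options to the kernel. During initialization the kernel calls the parse_args function to parse these options and invoke the corresponding handler functions.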

parse_args can parse strings of the form name=value. It is also called when a module is loaded, to parse the module's parameters.

Kernel options use the same name=value format. Open the system's GRUB configuration file and find the kernel line, for example:

kernel /boot/vmlinuz-2.6.18 root=/dev/sda1 ro splash=silent vga=0x314 pci=noacpi

Here pci=noacpi and the like are kernel options.
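To make the name=value format concrete, here is a minimal userspace sketch of how a command line like the one above can be split into options. It is not the kernel's parse_args: the struct option type and the split_options function are hypothetical names made up for this illustration, and quoting and handler lookup are deliberately left out.

```c
#include <assert.h>
#include <string.h>

/* One parsed option: value is NULL when the token had no '='. */
struct option {
    char *name;
    char *value;
};

/*
 * Split cmdline in place into at most max options and return how many
 * were found. Tokens are separated by spaces; each token is cut at its
 * first '='. Unlike the kernel's parse_args, quoting is not handled.
 */
static int split_options(char *cmdline, struct option *opts, int max)
{
    int count = 0;
    char *token;

    assert(cmdline != NULL && opts != NULL);

    for (token = strtok(cmdline, " "); token != NULL && count < max;
         token = strtok(NULL, " ")) {
        char *eq = strchr(token, '=');

        if (eq != NULL)
            *eq++ = '\0';       /* terminate the name, step to the value */
        opts[count].name = token;
        opts[count].value = eq; /* NULL for flag-style options such as "ro" */
        count++;
    }
    return count;
}
```

Called on a writable copy of the option string, split_options fills opts with one entry per token. A flag such as ro ends up with a NULL value, while pci=noacpi yields the name pci and the value noacpi.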

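Kernel options are different from module parameters. Module parameters are normally given in name=value form when the module is loaded, not when the kernel boots. To set a module parameter at boot time, you must prefix it with the module name, in the form module.parameter=value. For example, the following command sets the module parameter autosuspend to 2 while loading usbcore: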

$ modprobe usbcore autosuspend=2

To specify it at kernel boot time instead, you must use this form:

usbcore.autosuspend=2
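The module.parameter=value naming convention can be illustrated with a small userspace sketch that takes one such option apart. This only demonstrates the format; it makes no claim about how the kernel itself matches the prefixed name, and struct module_arg and split_module_arg are hypothetical names invented for the example.

```c
#include <assert.h>
#include <string.h>

/* The three pieces of a boot-time module parameter. */
struct module_arg {
    char *module;   /* NULL when the option has no "module." prefix */
    char *param;
    char *value;    /* NULL when the option has no "=value" part */
};

/*
 * Split one option of the form module.param=value in place.
 * Only a '.' that appears before the '=' counts as the prefix separator,
 * so a value such as "1.5" is left alone.
 */
static struct module_arg split_module_arg(char *option)
{
    struct module_arg arg = { NULL, option, NULL };
    char *eq, *dot;

    assert(option != NULL);

    eq = strchr(option, '=');
    if (eq != NULL) {
        *eq = '\0';
        arg.value = eq + 1;
    }
    dot = strchr(option, '.');  /* searches the name only: '=' is now NUL */
    if (dot != NULL) {
        *dot = '\0';
        arg.module = option;
        arg.param = dot + 1;
    }
    return arg;
}
```

Given usbcore.autosuspend=2, the sketch returns the module usbcore, the parameter autosuspend, and the value 2. A plain kernel option such as pci=noacpi comes back with a NULL module.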

The Documentation/kernel-parameters.txt file lists the kernel options that each subsystem has registered. For example, the PCI subsystem registers the following:

    pci=option[,option...]  [PCI] various PCI subsystem options:

        off             [X86-32] don't probe for the PCI bus
        bios            [X86-32] force use of PCI BIOS, don't access
                        the hardware directly. Use this if your machine
                        has a non-standard PCI host bridge.
        nobios          [X86-32] disallow use of PCI BIOS, only direct
                        hardware access methods are allowed. Use this
                        if you experience crashes upon bootup and you
                        suspect they are caused by the BIOS.
        conf1           [X86-32] Force use of PCI Configuration
                        Mechanism 1.
        conf2           [X86-32] Force use of PCI Configuration
                        Mechanism 2.
        nommconf        [X86-32,X86_64] Disable use of MMCONFIG for PCI
                        Configuration
        nomsi           [MSI] If the PCI_MSI kernel config parameter is
                        enabled, this kernel boot option can be used to
                        disable the use of MSI interrupts system-wide.
        nosort          [X86-32] Don't sort PCI devices according to
                        order given by the PCI BIOS. This sorting is
                        done to get a device order compatible with
                        older kernels.
        biosirq         [X86-32] Use PCI BIOS calls to get the interrupt
                        routing table. These calls are known to be buggy
                        on several machines and they hang the machine
                        when used, but on other computers it's the only
                        way to get the interrupt routing table. Try
                        this option if the kernel is unable to allocate
                        IRQs or discover secondary PCI buses on your
                        motherboard.
        rom             [X86-32] Assign address space to expansion ROMs.
                        Use with caution as certain devices share
                        address decoders between ROMs and other
                        resources.
        irqmask=0xMMMM  [X86-32] Set a bit mask of IRQs allowed to be
                        assigned automatically to PCI devices. You can
                        make the kernel exclude IRQs of your ISA cards
                        this way.
        pirqaddr=0xAAAAA
                        [X86-32] Specify the physical address of the
                        PIRQ table (normally generated by the BIOS) if
                        it is outside the F0000h-100000h range.
        lastbus=N       [X86-32] Scan all buses thru bus #N. Can be
                        useful if the kernel is unable to find your
                        secondary buses and you want to tell it
                        explicitly which ones they are.
        assign-busses   [X86-32] Always assign all PCI bus numbers
                        ourselves, overriding whatever the firmware
                        may have done.
        usepirqmask     [X86-32] Honor the possible IRQ mask stored in
                        the BIOS $PIR table. This is needed on some
                        systems with broken BIOSes, notably some HP
                        Pavilion N5400 and Omnibook XE3 notebooks. This
                        will have no effect if ACPI IRQ routing is
                        enabled.
        noacpi          [X86-32] Do not use ACPI for IRQ routing or for
                        PCI scanning.
        routeirq        Do IRQ routing for all PCI devices. This is
                        normally done in pci_enable_device(), so this
                        option is a temporary workaround for broken
                        drivers that don't call it.
        firmware        [ARM] Do not re-enumerate the bus but instead
                        just use the configuration from the bootloader.
                        This is currently used on IXP2000 systems where
                        the bus has to be configured a certain way for
                        adjunct CPUs.
        noearly         [X86] Don't do any early type 1 scanning. This
                        might help on some broken boards which machine
                        check when some devices' config space is read.
                        But various workarounds are disabled and some
                        IOMMU drivers will not work.
        bfsort          Sort PCI devices into breadth-first order. This
                        sorting is done to get a device order
                        compatible with older (<= 2.4) kernels.
        nobfsort        Don't sort PCI devices into breadth-first order.
        cbiosize=nn[KMG]
                        The fixed amount of bus space which is reserved
                        for the CardBus bridge's IO window. The default
                        value is 256 bytes.
        cbmemsize=nn[KMG]
                        The fixed amount of bus space which is reserved
                        for the CardBus bridge's memory window. The
                        default value is 64 megabytes.

Registering kernel options

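We do not need to understand how parse_args is implemented, but we do need to know how kernel options are registered: module parameters are registered with the module_param family of macros, and kernel options with the __setup macro.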

The __setup macro is defined in include/linux/init.h.

#define __setup(str, fn)					\
	__setup_param(str, fn, fn, 0)

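__setup takes two parameters: str is the name of the kernel option, and fn is the handler function associated with it. The macro tells the kernel to run fn if it detects the option str at boot. For an option that takes a value, str must contain the option name followed by the = character (for example "netdev="), so that the handler is given only the text after it.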

Different kernel options can be associated with the same handler. For example, the options netdev and ether are both tied to the netdev_boot_setup function.
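The registration idea can be modeled in userspace as a table of (str, fn) pairs plus a dispatcher that matches str as a prefix and hands the rest of the option to the handler. This is only a sketch under that assumption, not the kernel's code: struct setup_entry, setup_table, dispatch_option and the demo_* handlers are all hypothetical names, and the real macro places its records in a dedicated section of the kernel image rather than in an ordinary array.

```c
#include <assert.h>
#include <string.h>

/* Userspace stand-in for one __setup() registration (hypothetical type). */
struct setup_entry {
    const char *str;            /* option name, ending in '=' */
    int (*fn)(char *value);     /* handler, receives the text after str */
};

static int netdev_calls;        /* counts invocations, for demonstration */
static char last_pci_arg[32];   /* last value seen by demo_pci_setup */

static int demo_netdev_setup(char *value)
{
    (void)value;
    netdev_calls++;
    return 1;                   /* 1 means "handled" */
}

static int demo_pci_setup(char *value)
{
    strncpy(last_pci_arg, value, sizeof(last_pci_arg) - 1);
    last_pci_arg[sizeof(last_pci_arg) - 1] = '\0';
    return 1;
}

/* Two option names share one handler, as netdev= and ether= do. */
static const struct setup_entry setup_table[] = {
    { "netdev=", demo_netdev_setup },
    { "ether=",  demo_netdev_setup },
    { "pci=",    demo_pci_setup },
};

/* Run the handler whose str is a prefix of option; return 0 if none match. */
static int dispatch_option(char *option)
{
    size_t i;

    assert(option != NULL);

    for (i = 0; i < sizeof(setup_table) / sizeof(setup_table[0]); i++) {
        size_t n = strlen(setup_table[i].str);

        if (strncmp(option, setup_table[i].str, n) == 0)
            return setup_table[i].fn(option + n);
    }
    return 0;
}
```

Because the trailing = is part of str, dispatch_option("pci=noacpi") passes just "noacpi" to the handler. An option with no entry in the table, such as splash=silent, returns 0 and is left for someone else to deal with.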

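Besides __setup, kernel options can also be registered with the early_param macro. The two are used in the same way; the difference is that options registered with early_param must be processed before all other kernel options.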

Two parsing passes

Corresponding to the two registration forms, __setup and early_param, the kernel calls parse_args twice during initialization.

parse_early_param();
parse_args("Booting kernel", static_command_line, __start___param,
	   __stop___param - __start___param,
	   unknown_bootoption);

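The first call to parse_args happens inside parse_early_param. Why is parse_args called twice? Because kernel options come in two kinds, much like people in the real world: ordinary ones and privileged ones, and the privileged ones have to be handled before the ordinary ones.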

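What counts as privilege in real life seems rather fuzzy, and different people read it differently. For instance, when CCTV interviewed the discipline inspection secretary of the Second Affiliated Hospital of Harbin Medical University about the elderly patient whose hospital bill came to 5.5 million yuan, he said: "We are simply a people's hospital, a hospital for poor and lower-middle peasants. We have never used privilege to seek any benefit beyond what is ours. Not only did we not overcharge, we undercharged." Life is just that complicated and strange. Kernel options are much simpler by comparison. Their privilege is out in the open, with nothing hidden: it is declared directly with the early_param macro, so you can tell at a glance that an option is privileged. Options declared with early_param are parsed first, by parse_early_param.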
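The two-pass ordering can be sketched in userspace as follows: one walk over the arguments that handles only the early options, then a second walk that handles the rest. This is an illustrative model, not the kernel's implementation. The struct boot_option record, the registered table, and the functions are hypothetical, and which names are marked early here is an assumption chosen for the demonstration.

```c
#include <assert.h>
#include <string.h>

#define MAX_LOG 8

/* Hypothetical registration record: early marks early_param-style options. */
struct boot_option {
    const char *name;
    int early;
};

/* Which options are early is assumed here purely for illustration. */
static const struct boot_option registered[] = {
    { "root",        0 },
    { "mem",         1 },
    { "netdev",      0 },
    { "earlyprintk", 1 },
};

static const char *handled[MAX_LOG];   /* names in the order they were handled */
static int handled_count;

/* Return the registration whose name matches the part of arg before '='. */
static const struct boot_option *lookup(const char *arg)
{
    size_t len = strcspn(arg, "=");
    size_t i;

    for (i = 0; i < sizeof(registered) / sizeof(registered[0]); i++)
        if (strlen(registered[i].name) == len &&
            strncmp(arg, registered[i].name, len) == 0)
            return &registered[i];
    return NULL;
}

/* One pass over the arguments, handling only options of the given class. */
static void parse_pass(const char *const *args, int count, int early)
{
    int i;

    for (i = 0; i < count; i++) {
        const struct boot_option *opt = lookup(args[i]);

        if (opt != NULL && opt->early == early && handled_count < MAX_LOG)
            handled[handled_count++] = opt->name;
    }
}

/* Early options first, then everything else, mirroring the two calls. */
static void parse_boot_args(const char *const *args, int count)
{
    assert(args != NULL);
    handled_count = 0;
    parse_pass(args, count, 1);
    parse_pass(args, count, 0);
}
```

With the arguments root=/dev/sda1 mem=512M netdev=eth0 earlyprintk=vga, the handled array ends up in the order mem, earlyprintk, root, netdev: the early options jump the queue regardless of where they appear on the command line.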

Come and go through it together with the newcomers and the old hands!
