
Compressed 'head.S'

Why do we do this? Don’t ask me.. Incomprehensible are the ways of bootloaders.
                             — comments in arch/i386/boot/compressed/misc.c

There are two 'head.S' files in the Linux source tree. One is in $(linux-2.6.15.3_dir)/arch/i386/boot/compressed and the other is in $(linux-2.6.15.3_dir)/arch/i386/kernel. The first one is analyzed in this article. Before we go ahead, here is a piece of Linux news: 'Army leans toward Linux for FCS (Future Combat System)'.

The first 'head.S' is also called the 'compressed head', and it is used to decompress the kernel image. Unlike the code we have seen so far, we are now in 32-bit protected mode with paging disabled. The 'compressed head' starts at 'startup_32'.

.text /* ! here just '.text', without the '.code16' assembly directive */
.globl startup_32
 
startup_32:
 /*
  * ! clear direction flag
  * ! and clear interrupt flag
  */
 cld
 cli

 /*
  * ! all other segment registers are
  * ! reloaded after protected mode enabled
  * ! __BOOT_DS = 0x18
  */
 movl $(__BOOT_DS),%eax
 movl %eax, %ds
 movl %eax, %es
 movl %eax, %fs
 movl %eax, %gs

 /*
  * ! lss – load full pointer from memory
  * !       to register
  * ! and here ‘ss:esp = stack_start’
  */
 lss stack_start,%esp

 /*
  * ! EAX = 0;
  * ! do {
  * !     DS:[0] = ++EAX;
  * ! } while (DS:[0x100000] == EAX);
  */
 xorl %eax, %eax
1: incl %eax  # check that A20 really IS enabled
 movl %eax, 0x000000 # loop forever if it isn't
 cmpl %eax, 0x100000
 je 1b
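
The A20 check above can be tried out in C. This is only a sketch under stated assumptions: the `sim_memory` type and the `sim_cell` and `a20_probe` names are hypothetical, the "machine" holds just the two dwords the loop touches, and the probe gives up after `max_tries` rounds, whereas the real code loops forever. With A20 disabled, address bit 20 is forced to zero, so 0x100000 aliases 0x000000 and the two cells always compare equal.

```c
#include <stdint.h>

/* Hypothetical two-cell memory model: only the dwords at 0x000000 and
 * 0x100000 are simulated, which is all the A20 loop ever touches. */
typedef struct {
    uint32_t at_0;        /* dword at physical address 0x000000 */
    uint32_t at_1mb;      /* dword at physical address 0x100000 */
    int a20_enabled;      /* 0: address bit 20 is forced to zero */
} sim_memory;

/* Map an address (0x000000 or 0x100000) to its simulated cell. */
static uint32_t *sim_cell(sim_memory *m, uint32_t addr)
{
    if (!m->a20_enabled)
        addr &= ~(1u << 20);          /* wrap-around: 0x100000 becomes 0 */
    return (addr == 0x100000u) ? &m->at_1mb : &m->at_0;
}

/* Bounded version of the loop in startup_32. Returns the number of
 * rounds needed to see the two cells differ, or 0 if they never differ
 * within max_tries (the real code would spin forever in that case). */
unsigned a20_probe(sim_memory *m, unsigned max_tries)
{
    uint32_t eax = 0;
    unsigned i;

    for (i = 1; i <= max_tries; i++) {
        eax++;                                   /* incl %eax          */
        *sim_cell(m, 0x000000u) = eax;           /* movl %eax,0x000000 */
        if (*sim_cell(m, 0x100000u) != eax)      /* cmpl %eax,0x100000 */
            return i;                            /* je 1b not taken    */
    }
    return 0;
}
```

The value is incremented on every round so that a stale dword at 0x100000 that happens to match once cannot fool the test twice in a row.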

After reloading the segment registers, the 'compressed head' clears the 'eflags' register and fills the kernel bss (the area of uninitialized kernel data delimited by the _edata and _end symbols) with zeros. Then the decompression process begins.

 /*
  * ! %esi has been loaded in 'setup.S' with 'INITSEG << 4'
  * ! 'subl $16,%esp' reserves room for the first arg, that is
  * ! struct moveparams {
  * !     uch *low_buffer_start;
  * !     int lcount;
  * !     uch *high_buffer_start;
  * !     int hcount;
  * ! } mv;
  * ! the second arg is the %esi which indicates the position
  * ! of the real-mode data
  */
 subl $16,%esp # place for structure on the stack
 movl %esp,%eax
 pushl %esi # real mode pointer as second arg
 pushl %eax # address of structure as first arg

 /*
  * ! if (!decompress_kernel(&mv, esi)) {         // return value in EAX
  * !    restore esi from stack;
  * !    ebx = 0;
  * !    goto __BOOT_CS: $__PHYSICAL_START;
  * !    // see linux/arch/i386/kernel/head.S:startup_32
  * ! }
  * ! 'decompress_kernel' is coded in
  * ! $(linux-2.6.15.3_dir)/arch/i386/boot/compressed/misc.c
  */
 call decompress_kernel
 orl  %eax,%eax
 jnz  3f
 popl %esi # discard address
 popl %esi # real mode pointer
 xorl %ebx,%ebx
 ljmp $(__BOOT_CS), $__PHYSICAL_START

3:
 /*
  * ! copy move_routine_start..move_routine_end to 0x1000
  * ! both labels are defined at the tail of
  * ! this file
  */
 movl $move_routine_start,%esi
 movl $0x1000,%edi
 movl $move_routine_end,%ecx
 subl %esi,%ecx
 addl $3,%ecx
 shrl $2,%ecx
 cld
 rep
 movsl

 /*
  * ! Do preparation for ‘move_routine_start’:
  * ! set the parameters
  * ! ebx = real mode pointer
  * ! esi = mv.low_buffer_start
  * ! ecx = mv.lcount
  * ! edx = mv.high_buffer_start
  * ! eax = mv.hcount
  * ! edi = $__PHYSICAL_START
  */
 popl %esi # discard the address
 popl %ebx # real mode pointer
 popl %esi # low_buffer_start
 popl %ecx # lcount
 popl %edx # high_buffer_start
 popl %eax # hcount
 movl $__PHYSICAL_START,%edi
 cli  # make sure we don’t get interrupted

 /*
  * ! jump to physical address: __BOOT_CS:0x1000
  * ! where the move_routine_start function stays
  */
 ljmp $(__BOOT_CS), $0x1000 # and jump to the move routine

 /*
  * ! control has been transferred to 'move_routine_start'
  */
move_routine_start:
 movl %ecx,%ebp
 shrl $2,%ecx
 rep
 movsl
 movl %ebp,%ecx
 andl $3,%ecx
 rep
 movsb
 movl %edx,%esi
 movl %eax,%ecx # NOTE: rep movsb won’t move if %ecx == 0
 addl $3,%ecx
 shrl $2,%ecx
 rep
 movsl
 movl %ebx,%esi # Restore setup pointer
 xorl %ebx,%ebx
 ljmp $(__BOOT_CS), $__PHYSICAL_START
move_routine_end:

In 'move_routine_start', we perform the following operations:
(1) copy (mv.lcount >> 2) 32-bit double words from mv.low_buffer_start to $__PHYSICAL_START;
(2) append the remaining (mv.lcount & 3) bytes;
(3) append ((mv.hcount + 3) >> 2) 32-bit double words from mv.high_buffer_start.
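
To make the three steps concrete, here is a rough C rendering of the same copy. It is a sketch, not the kernel's code: `moveparams_sketch` and `move_routine_sketch` are hypothetical names, plain `memcpy` stands in for `rep movsl` / `rep movsb`, and the destination is an ordinary buffer instead of the physical address $__PHYSICAL_START.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of 'struct moveparams' from misc.c. */
struct moveparams_sketch {
    const uint8_t *low_buffer_start;
    int lcount;
    const uint8_t *high_buffer_start;
    int hcount;
};

/* Copy the low buffer, then the high buffer, to dst.
 * Returns the number of bytes written. Note that step (3) rounds hcount
 * up to a whole number of dwords, so up to 3 bytes past hcount are read
 * and written, exactly as 'rep movsl' does in the assembly. */
size_t move_routine_sketch(uint8_t *dst, const struct moveparams_sketch *mv)
{
    size_t low = (size_t)mv->lcount;
    size_t low_dwords = low >> 2;                         /* step (1) */
    size_t low_tail = low & 3;                            /* step (2) */
    size_t high_dwords = ((size_t)mv->hcount + 3) >> 2;   /* step (3) */
    size_t n = 0;

    memcpy(dst + n, mv->low_buffer_start, low_dwords * 4);
    n += low_dwords * 4;
    memcpy(dst + n, mv->low_buffer_start + low_dwords * 4, low_tail);
    n += low_tail;
    memcpy(dst + n, mv->high_buffer_start, high_dwords * 4);
    n += high_dwords * 4;
    return n;
}
```

With lcount = 6 and hcount = 5, for example, the routine writes 4 + 2 + 8 = 14 bytes, so the caller's high buffer has to be readable up to the next dword boundary.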

After the decompressed kernel image has been moved to its final place, control is transferred to the physical address '$(__BOOT_CS):$__PHYSICAL_START', where the second 'head.S' stays.

Transfer to '32-bit'

The phase we talked about before runs in 'Real-address Mode', which executes 16-bit program modules. At the tail of "Begin 'setup.S'", we moved to 'Protected Mode', which usually runs 32-bit program modules. So there are two big problems: how to transfer control between 16-bit code and 32-bit code, and how to transfer control from real mode to protected mode. They are what we want to talk about in this article.

The code that performs the transfer is mainly in 'setup.S' and 'head.S'. We have already covered, in a little detail, how 'setup.S' moves to protected mode. Here we are going to supplement that.

First of all, let us have a look at the characteristics of 16-bit and 32-bit program modules, quoted from the 'Intel Manual Vol3'.

Characteristic                            16-Bit Program Modules    32-Bit Program Modules
------------------------------------------------------------------------------------------
Segment Size                              0 to 64 KBytes            0 to 4 GBytes
Operand Sizes                             8 bits and 16 bits        8 bits and 32 bits
Pointer Offset Size (Address Size)        16 bits                   32 bits
Stack Pointer Size                        16 bits                   32 bits
Control Transfers Allowed to Code         16 bits                   32 bits
Segments of This Size

The 'Intel Manual Vol3' also lists the mechanisms used to distinguish between, and support, 16-bit and 32-bit segments and operations:
(1) The D (default operand and address size) flag in code-segment descriptors.
(2) The B (default stack size) flag in stack-segment descriptors.
(3) 16-bit and 32-bit call gates, interrupt gates, and trap gates.
(4) Operand-size and address-size instruction prefixes.
(5) 16-bit and 32-bit general-purpose registers.

Because of the way 'setup.S' does it, we are going to talk about item (4) in this article; you can dig into the other four items by reading the manual mentioned above. Before we say something about the instruction prefix, let us review 'setup.S'. As we know, before switching to protected mode, a minimum set of system data structures and code modules must be loaded into memory. The GDT (Global Descriptor Table) is one of them. The GDT consists of several 8-byte segment descriptors.

These segment descriptors describe the characteristics of each segment. Some of their important fields are listed below:
(1) 'base': contains the linear address of the first byte of the segment.
(2) 'G': the granularity flag. If it is cleared (equal to 0), the segment size is expressed in bytes; otherwise, it is expressed in multiples of 4096 bytes.
(3) 'limit': a 20-bit field that holds the offset of the last memory cell in the segment, thus bounding the segment length. When G is 0, the size of a segment may vary between 1 byte and 1 MB; otherwise, it may vary between 4 KB and 4 GB.
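
As a small illustration of these fields, the C sketch below pulls 'base', 'limit' and 'G' out of an 8-byte descriptor given as four 16-bit words, in the order an assembler `.word` list would store them, and computes the segment size. The type and function names (`seg_desc_fields`, `decode_descriptor`, `segment_size_bytes`) are hypothetical; the bit positions follow the descriptor layout in the Intel manual.

```c
#include <stdint.h>

/* Hypothetical decoded view of one segment descriptor. */
typedef struct {
    uint32_t base;      /* 32-bit linear base address              */
    uint32_t limit;     /* raw 20-bit limit field                  */
    uint8_t  access;    /* access byte (present, DPL, type)        */
    int      g;         /* granularity flag: 1 = 4096-byte units   */
    int      d;         /* D/B flag: 1 = 32-bit default size       */
} seg_desc_fields;

/* w[0] = limit 15..0, w[1] = base 15..0,
 * w[2] = base 23..16 (low byte) and access byte (high byte),
 * w[3] = limit 19..16 and flags (low byte), base 31..24 (high byte). */
seg_desc_fields decode_descriptor(const uint16_t w[4])
{
    seg_desc_fields f;

    f.limit  = (uint32_t)w[0] | ((uint32_t)(w[3] & 0x000Fu) << 16);
    f.base   = (uint32_t)w[1]
             | ((uint32_t)(w[2] & 0x00FFu) << 16)
             | ((uint32_t)(w[3] & 0xFF00u) << 16);
    f.access = (uint8_t)(w[2] >> 8);
    f.g      = (w[3] >> 7) & 1;
    f.d      = (w[3] >> 6) & 1;
    return f;
}

/* Segment size in bytes: (limit + 1) units of 1 byte or 4096 bytes.
 * A 64-bit result is needed because a full segment is exactly 4 GB. */
uint64_t segment_size_bytes(const seg_desc_fields *f)
{
    uint64_t units = (uint64_t)f->limit + 1;
    return f->g ? units * 4096u : units;
}
```

Fed the words 0xFFFF, 0, 0x9A00, 0x00CF, the decoder reports base 0, limit 0xFFFFF and G = 1, which gives a size of 0x100000 * 0x1000 bytes, that is 4 GB.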

Now let us see how 'setup.S' defines its GDT. Note that it is only a provisional GDT.

/*
 * ! $(linux-2.6.15.3_dir)/arch/i386/boot/setup.S
 */
# Descriptor tables
#
# NOTE: The intel manual says gdt should be sixteen bytes aligned for
# efficiency reasons.  However, there are machines which are known not
# to boot with misaligned GDTs, so alter this at your peril!  If you alter
# GDT_ENTRY_BOOT_CS (in asm/segment.h) remember to leave at least two
# empty GDT entries (one for NULL and one reserved).
#
# NOTE: On some CPUs, the GDT must be 8 byte aligned.  This is
# true for the Voyager Quad CPU card which will not boot without
# This directive.  16 byte aligment is recommended by intel.
#
 .align 16
gdt:
 /*
  * ! #define GDT_ENTRY_BOOT_CS 2
  * ! The first segment descriptor is the null descriptor, all zeros (required by Intel).
  * ! The second segment descriptor is reserved and is also set to zero.
  * ! The third segment descriptor (boot code segment):
  * !  base = 0; G = 1 (4096-byte units); limit = 0xFFFFF, size = 0x100000 * 0x1000 = 4Gb
  * ! The fourth segment descriptor (boot data segment):
  * !  base = 0; G = 1 (4096-byte units); limit = 0xFFFFF, size = 0x100000 * 0x1000 = 4Gb
  */
 .fill GDT_ENTRY_BOOT_CS,8,0

 .word 0xFFFF    # 4Gb - (0x100000*0x1000 = 4Gb)
 .word 0    # base address = 0
 .word 0x9A00    # code read/exec
 .word 0x00CF    # granularity = 4096, 386
      #  (+5th nibble of limit)

 .word 0xFFFF    # 4Gb - (0x100000*0x1000 = 4Gb)
 .word 0    # base address = 0
 .word 0x9200    # data read/write
 .word 0x00CF    # granularity = 4096, 386
      #  (+5th nibble of limit)
gdt_end:
 .align 4
 
 .word 0    # alignment byte
idt_48:
 .word 0    # idt limit = 0
 .word 0, 0    # idt base = 0L

 .word 0    # alignment byte
gdt_48:
 /*
  * ! Segment descriptors are always 8 bytes long, so
  * ! the GDT limit should always be one less than an integral
  * ! multiple of eight (that is, 8N - 1); here it is 4 * 8 - 1 = 31.
  * ! we can see that the gdt base will be reset later
  */
 .word gdt_end - gdt - 1  # gdt limit
 .word 0, 0    # gdt base (filled in later)

The following code loads a linear address into the GDTR (Global Descriptor Table Register). Do not confuse the GDTR with the GDT (Global Descriptor Table): the value stored in the GDTR tells the CPU where the GDT is. The GDTR is a key register in protected mode, so we must fill it before transferring control to protected mode. The GDTR is a 48-bit register that consists of a 16-bit 'limit' field and a 32-bit 'base' field, and the 'lgdt m16&32' instruction fills it from a six-byte operand in memory. If the operand size is 16 bits, the register is loaded with a 16-bit limit and a 24-bit base, and the high-order eight bits of the six-byte operand are not used. If the operand size is 32 bits, a 16-bit limit and a 32-bit base are loaded; the high-order eight bits of the six-byte operand become the high-order base address bits. The following code shows how 'setup.S' loads the GDTR.
 
 # set up gdt and idt
 lidt idt_48    # load idt with 0,0
 xorl %eax, %eax   # Compute gdt_base
 movw %ds, %ax   # (Convert %ds:gdt to a linear ptr)
 shll $4, %eax
 addl $gdt, %eax

  /*
   * ! reset the GDT base to %ds:gdt, which is mentioned above
   * ! now %ds = SETUPSEG = 0x9020
   * ! after 'lgdt', the 'base' field value in GDTR is ((%ds << 4) + $gdt)
   */
 movl %eax, (gdt_48+2)
 lgdt gdt_48    # load gdt with whatever is
                # appropriate
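
The arithmetic behind this GDTR setup can be sketched in C. All the names here (`real_mode_linear`, `gdtr_image`, `make_gdtr`, `lgdt_loaded_base`) are hypothetical, and the struct is only a convenient in-memory view: it is not packed, so it does not reproduce the six-byte layout that 'lgdt' actually reads.

```c
#include <stdint.h>

/* Real-mode segment:offset to linear address, as done with
 * 'shll $4, %eax' followed by 'addl $gdt, %eax'. */
uint32_t real_mode_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Hypothetical view of the two fields stored at gdt_48. */
typedef struct {
    uint16_t limit;   /* size of the table in bytes, minus one */
    uint32_t base;    /* linear address of the first descriptor */
} gdtr_image;

/* Build the limit/base pair for a GDT holding n_desc 8-byte descriptors
 * at ds:gdt_off. The limit has the form 8N - 1. */
gdtr_image make_gdtr(uint16_t ds, uint16_t gdt_off, unsigned n_desc)
{
    gdtr_image g;

    g.limit = (uint16_t)(n_desc * 8u - 1u);
    g.base  = real_mode_linear(ds, gdt_off);
    return g;
}

/* Base actually loaded by 'lgdt': with a 16-bit operand size only
 * 24 base bits are used, with a 32-bit operand size all 32 are. */
uint32_t lgdt_loaded_base(uint32_t base_in_memory, int operand_size_bits)
{
    return (operand_size_bits == 16) ? (base_in_memory & 0x00FFFFFFu)
                                     : base_in_memory;
}
```

For the four descriptors of the provisional GDT the limit comes out as 31. With %ds = 0x9020 and a made-up offset of 0x0A00 for the 'gdt' label, the base would be 0x90200 + 0x0A00 = 0x90C00, which fits in 24 bits, so the 16-bit form of 'lgdt' loses nothing here.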

Thus, the preparation for protected mode is over. What we do next is move to protected mode. As mentioned before, a far JMP instruction should be executed immediately after protected mode is enabled. Here 'setup.S' chooses a simpler way to transfer control to 32-bit protected mode.

/*
 * ! $(linux-2.6.15.3_dir)/include/asm-i386/segment.h
 * Simple and small GDT entries for booting only
 */

#define GDT_ENTRY_BOOT_CS  2
#define __BOOT_CS (GDT_ENTRY_BOOT_CS * 8)

#define GDT_ENTRY_BOOT_DS  (GDT_ENTRY_BOOT_CS + 1)
#define __BOOT_DS (GDT_ENTRY_BOOT_DS * 8)

/*
 * $(linux-2.6.15.3_dir)/arch/i386/boot/setup.S
 */
#
# jump to startup_32 in arch/i386/boot/compressed/head.S

# NOTE: For high loaded big kernels we need a
# jmpi    0x100000,__BOOT_CS
#
 .byte 0x66, 0xea   # prefix + jmpi-opcode
code32: .long 0x1000    # will be set to 0x100000
      # for big kernels
 .word __BOOT_CS

The jump is a hand-encoded instruction. '0xea' is the opcode of the 'jmpi' (far jump) instruction, which takes a four-byte (when the operand size is 16 bits) or six-byte (when the operand size is 32 bits) operand as a far pointer to the destination. We are still executing 16-bit code, so every operand, and in particular the target offset, is 16 bits by default. But we want to jump to a 32-bit program module whose offset does not fit in 16 bits, so we cannot jump there directly. The solution is to put the '0x66' operand-size prefix before 'jmpi'. This prefix overrides the default size selected by the D flag in the code-segment descriptor and guarantees that the CPU takes our 48-bit far pointer (also called a 'logical address' in protected mode; it consists of a 16-bit segment selector and a 32-bit offset). The 'jmpi' loads '__BOOT_CS' into %cs and treats 0x100000 (big kernel) as the offset.
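
To see what the `.byte`, `.long` and `.word` lines lay down in memory, the following C sketch assembles the same eight bytes by hand. The function name `encode_far_jmp32` is hypothetical; the byte order (prefix, opcode, little-endian 32-bit offset, little-endian 16-bit selector) is the one described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Emit '0x66 0xEA offset32 selector16' into out[].
 * Returns the number of bytes written (always 8). */
size_t encode_far_jmp32(uint8_t out[8], uint32_t offset, uint16_t selector)
{
    out[0] = 0x66;                       /* operand-size prefix        */
    out[1] = 0xEA;                       /* far jump opcode            */
    out[2] = (uint8_t)(offset);          /* offset, least significant  */
    out[3] = (uint8_t)(offset >> 8);
    out[4] = (uint8_t)(offset >> 16);
    out[5] = (uint8_t)(offset >> 24);    /* offset, most significant   */
    out[6] = (uint8_t)(selector);        /* selector low byte          */
    out[7] = (uint8_t)(selector >> 8);   /* selector high byte         */
    return 8;
}
```

For a big kernel, `encode_far_jmp32(buf, 0x100000, 0x10)` yields the bytes 66 EA 00 00 10 00 10 00. Without the prefix the CPU would read only a 16-bit offset followed by the selector, four bytes in total.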

Where do we arrive after the intersegment jump? Which instruction is the CPU going to execute? These are the two questions we want to answer. We are now in protected mode with paging disabled, and the memory addressing model has changed: it is the 'segmented memory model' of protected mode. In this model, to address a byte in a segment, a program must issue a logical address, which consists of a segment selector and an offset. Internally, the processor translates each logical address into a linear address to access a memory location. The segment selector decides which segment descriptor in the GDT is used, and the final linear address is calculated with the formula 'segment_descriptor.base + offset'.

There is a logical address available in 'setup.S', namely '__BOOT_CS(0000000000010000B) : 0x00100000'. The high-order 13 bits of the selector give the zero-based index of the segment descriptor to use in the GDT. Here the index is equal to 2. Looking back at the code above, that is the third descriptor defined at the label 'gdt', and its base is 0. So we can conclude that the linear address of the first instruction is '0 + 0x00100000', that is, 0x00100000. This is exactly where 'head.S' (the first part of the system) stays.
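
The same translation can be written out in C. This is a minimal sketch with hypothetical names (`selector_fields`, `decode_selector`, `logical_to_linear`); it ignores limit and privilege checks and assumes paging is off, so the linear address is also the physical one.

```c
#include <stdint.h>

/* Hypothetical split of a 16-bit segment selector. */
typedef struct {
    unsigned index;   /* bits 15..3: descriptor index in the table */
    unsigned ti;      /* bit 2: 0 = GDT, 1 = LDT                   */
    unsigned rpl;     /* bits 1..0: requested privilege level      */
} selector_fields;

selector_fields decode_selector(uint16_t sel)
{
    selector_fields s;

    s.index = sel >> 3;
    s.ti    = (sel >> 2) & 1u;
    s.rpl   = sel & 3u;
    return s;
}

/* linear address = segment_descriptor.base + offset */
uint32_t logical_to_linear(uint32_t descriptor_base, uint32_t offset)
{
    return descriptor_base + offset;
}
```

`decode_selector(0x10)` gives index 2 in the GDT with RPL 0, and `logical_to_linear(0, 0x00100000)` gives 0x00100000, matching the conclusion above. The data selector __BOOT_DS = 0x18 decodes to index 3 in the same way.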

/*
 * ! $(linux-2.6.15.3_dir)/arch/i386/boot/compressed/head.S
 */
 .globl startup_32
 
startup_32:
 cld
 cli
 movl $(__BOOT_DS),%eax
 movl %eax,%ds
 movl %eax,%es
 movl %eax,%fs
 movl %eax,%gs

One question remains: why do we not use 'jmpi startup_32, __BOOT_CS' instead of 'jmpi 0x100000, __BOOT_CS'? We know that Linux eventually enables paging and builds its own virtual memory management system. From then on, the Linux kernel has a 4 GB virtual address space, and the kernel itself occupies only the top 1 GB (from 0xC0000000 to 0xFFFFFFFF). But the physical address space always starts from 0x00000000, so there is an offset between the kernel's virtual address space and the physical address space. The offset is exactly 0xC0000000. So when the Linux kernel image is built, the offset is added to the addresses of all labels used in protected mode and later phases. The address of the label startup_32 is 0xC0100000, and it cannot be used until paging is enabled. The code in this 'head.S' also prepares for paging.
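
The offset arithmetic is simple enough to write down. The sketch below uses hypothetical names (`PAGE_OFFSET_SKETCH`, `virt_to_phys_sketch`, `phys_to_virt_sketch`) and assumes the plain 0xC0000000 split described above.

```c
#include <stdint.h>

/* Assumed offset between kernel virtual and physical addresses. */
#define PAGE_OFFSET_SKETCH 0xC0000000u

/* Kernel virtual address to physical address. */
uint32_t virt_to_phys_sketch(uint32_t vaddr)
{
    return vaddr - PAGE_OFFSET_SKETCH;
}

/* Physical address to kernel virtual address. */
uint32_t phys_to_virt_sketch(uint32_t paddr)
{
    return paddr + PAGE_OFFSET_SKETCH;
}
```

`virt_to_phys_sketch(0xC0100000)` is 0x00100000, the offset that 'setup.S' jumps to. Before paging is enabled only the physical form is usable, which is why the far jump hard-codes the number instead of naming the label.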
