Assembly - Tony Bai

Transfer to '32-bit'

二月 17, 2006

The phase we talked about before is in ‘Real-address Mode’, which runs 16-bit program modules. At the tail of "Begin ‘setup.S’", we had moved to ‘Protected Mode’, which usu runs 32-bit program modules. So there are two big problems which are ‘How to transfer control between 16-bit code and 32-bit code’ and how to transfer control from ‘real-mode’ to protected mode’. They are also what we wanna talk about in this artical.

The transfering codes are mainly in ‘setup.S and ‘head.S’. We have covered the ‘setup.S’ with a little detail about how to move to protected mode. Here we are going to make a supplementary.

First of all, let us have a look at the characteristics of 16-Bit and 32-Bit program modules, which quotes the ‘Intel Manual Vol3′.

Characteristic                         16-Bit Program Modules            32-Bit Program Modules
———————————————————————————————-
Segment Size                               0 to 64 KBytes                     0 to 4 GBytes
Operand Sizes                              8 bits and 16 bits              8 bits and 32 bits
Pointer Offset Size (Address Size)           16 bits                            32 bits
Stack Pointer Size                          16 Bits                             32 Bits
Control Transfers Allowed to Code           16 Bits                              32 Bits
Segments of This Size

The ‘Intel Manual Vol3′ also tells us how to distinguish between and support 16-bit and 32-bit segments and operations.

Details as follows:
(1) The D (default operand and address size) flag in code-segment descriptors.
(2) The B (default stack size) flag in stack-segment descriptors.
(3) 16-bit and 32-bit call gates, interrupt gates, and trap gates.
(4) Operand-size and address-size instruction prefixes.
(5) 16-bit and 32-bit general-purpose registers.

Due to the usage in ‘setup.S’, we are going to talk about item (4) in this artical and you can deep into the other four items by reading that bible book mentioned above. Before we say something about ‘instruction prefix’, we are going to do a review of ‘setup.S’. As we know, before switching to protected mode, a minimum set of system data structures and code modules must be loaded into memory. The GDT(Global Descriptor Table) is one of them. GDT consists of several 8-byte segment descriptors.

These segment descriptors describe the segment characteristics. They have several important fields. Some of the fields are listed below:
(1) ‘base’ – contains the linear address of the first byte of the segment.
(2) ‘G’ - granularity flag, if it is cleared (equal to 0), the segment size is expressed in bytes; otherwise, it is expressed in multiples of 4096 bytes.
(3) ‘limit’ – holds the offset of the last memory cell in the segment, thus binding the segment length. When G is set to 0, the size of a segment may vary between 1 byte and 1 MB; otherwise, it may vary between 4 KB and 4 GB.

Here we are going to learn how the ‘setup.S’ define its provisional GDT, yeah, it is just a provisional GDT.

/*
* ! $(linux-2.6.15.3_dir)/arch/i386/setup.S
*/
# Descriptor tables
#
# NOTE: The intel manual says gdt should be sixteen bytes aligned for
# efficiency reasons. However, there are machines which are known not
# to boot with misaligned GDTs, so alter this at your peril! If you alter
# GDT_ENTRY_BOOT_CS (in asm/segment.h) remember to leave at least two
# empty GDT entries (one for NULL and one reserved).
#
# NOTE: On some CPUs, the GDT must be 8 byte aligned. This is
# true for the Voyager Quad CPU card which will not boot without
# This directive. 16 byte aligment is recommended by intel.
#
.align 16
gdt:
/*
* ! #define GDT_ENTRY_BOOT_CS 2
* ! The first segment descripter is setted by zero(Requested by Intel).
* ! The second segment descripter is reserved and also setted by zero.
* ! The third segment descripter:
* ! base = 0; G flag = 4096(D) = 0×1000, limit = 0xFFFF * 0×1000 = 4Gb
* ! The fourth segment descripter:
* ! base = 0; G flag = 4096(D) = 0×1000, limit = 0xFFFF * 0×1000 = 4Gb
*/
.fill GDT_ENTRY_BOOT_CS,8,0

.word 0xFFFF    # 4Gb – (0×100000*0×1000 = 4Gb)
.word 0    # base address = 0
.word 0x9A00    # code read/exec
.word 0x00CF    # granularity = 4096, 386
      # (+5th nibble of limit)

.word 0xFFFF    # 4Gb – (0×100000*0×1000 = 4Gb)
.word 0    # base address = 0
.word 0×9200    # data read/write
.word 0x00CF    # granularity = 4096, 386
      # (+5th nibble of limit)
gdt_end:
.align 4

.word 0    # alignment byte
idt_48:
.word 0    # idt limit = 0
.word 0, 0    # idt base = 0L

.word 0    # alignment byte
gdt_48:
/*
* ! Segment descriptors are always 16 bytes long recommended by intel,
* ! the GDT limit should always be one less than an integral
* ! multiple of sixteen (that is, 16N – 1).
* ! we can see that the gdt base will be reset later
*/
.word gdt_end – gdt – 1  # gdt limit
.word 0, 0    # gdt base (filled in later)

The following code performs an operation to load a liner address to GDTR(Global Descriptor Table Register). You must have to distinguish between GDTR(Global Descriptor Table Register) and GDT(Global Descriptor Table). The value stored in GDTR indicates where the GDT is. The GDTR is a key register when we moved to protected mode. so we must fill it before transferring control to protected mode. GDTR is 48-bit register, which consises of ‘limit’ field and ‘base’ field. We can use ‘lgdt m16/32′ instruction to fill this register. The ‘lgdt’ instruction loads a linear base address and limit value from a six-byte data operand in memory into the GDTR, respectively. If a 16-bit operand is used with ‘lgdt’, the register is loaded with a 16-bit limit and a 24-bit base, and the high-order eight bits of the six-byte data operand are not used. If a 32-bit operand is used, a 16-bit limit and a 32-bit base is loaded; the high-order eight bits of the six-byte operand are used as high-order base address bits. The following code showes us how the ‘setup.S’ loads the GDTR.

# set up gdt and idt
lidt idt_48    # load idt with 0,0
xorl %eax, %eax   # Compute gdt_base
movw %ds, %ax   # (Convert %ds:gdt to a linear ptr)
shll $4, %eax
addl $gdt, %eax

/*
* ! reset the GDT base to %ds:gdt, which is mentioned above
* ! now %ds = SETUPSEG = 0×9020
* ! after ‘lgdt’, the ‘base’ field value in GDTR is ((%ds << 4) + $gdt)
movl %eax, (gdt_48+2)
lgdt gdt_48 # load gdt with whatever is
# appropriate

Thus, the preparation for
‘protected mode’ is over. What we want to do next is moving to the protected mode. We had mentioned that a far JMP instruction should be executed immediately after protected mode is enabled. Here ‘setup.S’ chooses a more simple way to transfer control to 32-bit protected mode.

/*
* ! $(linux-2.6.15.3_dir)/include/asm-i386/segment.h
* Simple and small GDT entries for booting only
*/

#define GDT_ENTRY_BOOT_CS 2
#define __BOOT_CS (GDT_ENTRY_BOOT_CS *

#define GDT_ENTRY_BOOT_DS (GDT_ENTRY_BOOT_CS + 1)
#define __BOOT_DS (GDT_ENTRY_BOOT_DS *

/*
* $(linux-2.6.15.3_dir)/arch/i386/setup.S
*/
#
# jump to startup_32 in arch/i386/boot/compressed/head.S
#
# NOTE: For high loaded big kernels we need a
# jmpi    0×100000,__BOOT_CS
#
.byte 0×66, 0xea   # prefix + jmpi-opcode
code32: .long 0×1000    # will be set to 0×100000
      # for big kernels
.word __BOOT_CS

There is a hard-coding instruction to do the jump. ’0xea’ is the binary coding form of ‘jmpi’ instruction. the ‘jmpi’ instruction uses a four-byte(when operand’s size is 16 bits) or six-byte(when operand’s size is 32 bits) operand as a long pointer to the destination. Now we are in 16-bit mode, all the operand’s size is 16 bits(mainly the target offset). But we want to jump to a 32-bit program module where instructions are executed in 32-bit mode. How can we deal with it, since we can not directly jump there. The solution is to add ’0×66′ instruction prefix before ‘jmpi’. This instruction prefix reverse the default size selected by the D flag in the code-segment descriptor and guarantees that the CPU will properly take our 48 bit far pointer(it is also called ‘logical address’ in protected mode and it consists of 16-bit segment selector and 32-bit offset). the ‘jmpi’ loads ‘__BOOT_CS’ to %cs and treats the 0×100000(big kernel) as an offset.

Where are we arrived after the intersegmental jump? Which instruction is the CPU going to execute? Both of these are what we want to solve. Now we are in protected mode with paging disabled and the memory addressing model mode has been changed. It is the ‘segmented memory model’ in protected mode. In this model, to address a byte in a segment, a program must issue a logical address, which consists of a segment selector and an offset. Internally, the processor translates each logical address into a linear address to access a memory location. the segment selector decides which segment descriptor to be used in GDT and the final liner address could be caculated by such a formula ‘segment_descriptor.base + offset’.

There is a logical address available in ‘setup.S’, that is ‘__BOOT_CS(0000000000010000B) : 0×0010000′. The first high-order 13 bits decide the index(based on zero) of the segment descriptor to use in GDT. Here the index is equal to ’2′. Just review the code above, the segment descriptor is the third defined in lable ‘gdt’ and its base is 0. Now we can make a conclusion that the first instruction’s liner address is ’0 + 0×00100000′, that is 0×001000000. It is just the location where ‘head.S’(the first part of the system) stays.

/*
* ! $(linux-2.6.15.3_dir)/arch/i386/boot/compressed/head.S
*/
.globl startup_32

startup_32:
cld
cli
movl $(__BOOT_DS),%eax
movl %eax,%ds
movl %eax,%es
movl %eax,%fs
movl %eax,%gs

Here there is still a question, that is why we do not use ‘jmpi startup_32, __BOOT_CS’ instead of ‘jmpi 0×100000, __BOOT_CS’? We know that linux finally makes paging enable and build its own virtual memory management system. At that time, the linux kernel will have 4G-byte virtual address space and it only runs over the high 1G-byte(from 0xC0000000 to 0xFFFFFFFF) space. But the physical address space always starts from 0×00000000. There is a offset between kernel’s virtual address space and the physical address space. The offset is just ’0xC0000000′. So when we build linux kernel image, all address of labels in protected mode and later phases are added the offset. The address of label startup_32 is 0xC0100000. It won’t be used unless the paging is enabled. The code in this ‘head.S’ is also to do preparation for paging.

Outline 'memory layout'

二月 15, 2006

0 条评论

So far we have arrived at the gate leading to the real kernel. And we’d better stop for a short break in order that we would have more energy to go ahead. Now let’s examine what we do to memory these days.

Virtually what we want to do is drawing some pictures to describe the layout of the memory in various phases. For the layout is related to the bootloader, we’d better make our work based on the following assumption:
The machine has two systems installed (Windows XP and Linux) and uses LILO as the bootloader. Let us look at the LILO configuration:

/* LILO Configuration – /etc/lilo.conf */
boot=/dev/hda
map=/boot/map
install=/boot/boot.b
prompt
timeout=100
compact
default=Linux
image=/boot/vmlinuz-2.6.15.3
         label=Linux
         root=/dev/hda2
         read-only
other=/dev/hda1
         label=WindowsXP

Here the ‘boot=/dev/hda’ indicates it installed the LILO on the MBR of first hard disk. ‘root=/dev/had2′ indicates it installs linux system on the second partition of the first disk and ‘other=/dev/hda1′ indicates it installs windows system on the first partition of the first disk. Since lilo.conf is not read at boot time, the MBR needs to be "refreshed" when this is changed. If you do not do this upon rebooting, none of your changes to lilo.conf will be reflected at startup. Like getting LILO into the MBR in the first place, you need to run: ‘$ /sbin/lilo -v -v’. The ‘-v -v’ flags give you very verbose output.

Now we could switch on our machine! (‘<->’ means ‘begin from … end before …’)
1. Power on <-> BIOS routine
Chaos, that is the character of memory at this time.

2. BIOS routine <–> Bootloader 1st stage(MBR)
BIOS routine runs over and prepares to execute the code loaded from MBR. MBR contains the 1st stage bootloader of the LILO.

       |                        |
0A0000 +————————+
       |                       |
010000 +————————+
       |         MBR           | <- MBR (07C00 ~ 07E00)
001000 +————————+
       |                        |
000600 +————————+
       |      BIOS use only     |
000000 +————————+

3. Bootloader 1st stage(MBR) <-> Bootloader 2nd stage
The bootloader 1st stage moves itself to 0×090000, sets up the Real Mode stack (ranging from 0x09b000 to 0x09a200) and loads the 2nd stage of the LILO from 0x09b000.

          |                        |
0A0000 +————————+
       |     2nd bootloader    |
09b000 +————————+
       |     Real mode stack    |
09A200 +————————+
       |     1st bootloader     |
09A000 +————————+
       |                       |
010000 +————————+
       |     MBR(useless)      | <- MBR (07C00 ~ 07E00)
001000 +————————+
       | Reserved for MBR/BIOS |
000800 +————————+
       | Typically used by MBR |
000600 +————————+
       |      BIOS use only     |
000000 +————————+

4. Bootloader 2nd stage <-> setup.S
The 2nd bootloader copies the integrated boot loader of the kernel image to address 0×090000, the setup() code to address 0×090200, and the rest of the kernel image to address 0×00010000(called ‘low address’ for small Kernel Images compiled with ‘make zImage’) or 0×00100000(‘high address’ for big Kernel Images compiled with ‘make bzImage’).

zImage:

bzImage:

5. setup.S <-> head.S
The setup() checks the position of the Kernel Image loaded in RAM. If loaded "low" in RAM (when using zImage, at physical address 0×00010000) it is moved to "high" in RAM (at physical address 0×00001000). But, if the Kernel image is a "bzImage" loaded in "high" of RAM already, then it’s NOT moved anywhere. It also move the system to its rightful place (0×00000 ~ [<0x090000]). Some system parameters were placed from 0×090000 to 0×090200, which stores the legecy boot sector.

           +————————+
       |         bzImage      |
0100000+————————+
       |                        |
098000 +————————+
>       |     Kernel setup     |
090200 +————————+
       |     System parameters | collected by setup()
090000 +————————+
       |                       |
       |                      |
       |         System        |
       |                       |
       |                        |
000000 +————————+

OK, it is much clear. and now we can walk through the door to the real kernel!