In a previous tutorial, we wrote a simple Hello, World program in assembly language. While very small, the final version was still dependent on the FreeBSD kernel to execute the equivalent of a write(2) system call.

But what if there’s no kernel running? How about writing a hello world on the “bare metal”, i.e. without the help of an operating system?

In the beginning…

In the beginning, Man created Hardware and Firmware.

And the RAM was empty and without program. And the spirit of the Firmware laid lurking in a Flash/ROM chip, sleeping in deep inactivity.

And Man said, Let there be light; and the CPU awoke with a start.

And, after shaking off the cobwebs of sleep, the CPU initialized itself to Real Mode, by setting its registers to well-defined values. And everything was alright and as it should be.

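Then, the CPU jumped to the reset vector, 16 bytes below the top of its address space (physical address 0xFFFF0 on the original 8086, 0xFFFFFFF0 on the 80386 and later), which was the entry point of the Firmware. And the BIOS awoke and took over from there, initializing the rest of the Hardware.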

When the Hardware was fully initialized, the BIOS looked at its list of boot devices, and (lo and behold!) it started loading the first 512-byte sector of the first boot device into RAM at address 0x0000:7C00.

And the world was not void and empty anymore, as it contained a lot of bytes for the CPU to start playing with.

Then, delighted that the loading “just worked”(tm), the mighty BIOS checked that the last word of that sector contained the magic bytes 0x55 and 0xAA. For if it didn’t contain them, the BIOS would have had to read the first sector of the next boot device, and eventually give up booting, if it didn’t find a sector with that magic string at the end.

And, having verified that the sector it loaded into RAM was valid and magic, the BIOS made the CPU jump (with joy) to this initial address.
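To make the signature check a bit more tangible, here is a minimal Python sketch of the test described above. The names is_bootable and first_bootable are made up for this illustration, and a real BIOS does considerably more (and differs from vendor to vendor); this only models "512 bytes, ending in 0x55, 0xAA".

```python
BLOCKSIZE = 512                   # size of one boot sector in bytes
SIGNATURE = bytes([0x55, 0xAA])   # magic bytes, in the order they sit on disk


def is_bootable(sector: bytes) -> bool:
    """Return True if the sector is 512 bytes long and ends in 0x55, 0xAA."""
    return len(sector) == BLOCKSIZE and sector[-2:] == SIGNATURE


def first_bootable(devices):
    """Mimic the boot-device loop: return the index of the first device
    whose first sector carries the signature, or None if none does."""
    for index, sector in enumerate(devices):
        if is_bootable(sector):
            return index
    return None


blank = bytes(BLOCKSIZE)                    # all zeros: no signature
valid = bytes(BLOCKSIZE - 2) + SIGNATURE    # zeros plus the magic bytes
print(first_bootable([blank, valid]))       # 1
```

The second "device" wins here because the first one lacks the magic bytes, which is the fall-through behaviour described above.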

Hello, Real Mode World!

So, our task as bare metal programmers is to provide up to 512 or 0x200 bytes that the CPU is supposed to execute. How the heck do we generate those bytes? And, more importantly, what do they look like?

The bytes will be, of course, a mix of opcodes and their operands, that the CPU will execute (starting in Real Mode), and the bytes that make up the string “Hello, World!”. If you’re interested, we’ll peek ahead and have a look:

% hd hello
00000000  31 c0 8e c0 8e d8 8e d0  bc 00 7c e8 0e 00 e8 1c  |1.........|.....|
00000010  00 e8 01 00 f4 be 4a 7c  e8 1c 00 c3 55 b4 06 b0  |......J|....U...|
00000020  00 b7 07 b9 00 00 ba 4f  18 cd 10 5d c3 b4 02 b7  |.......O...]....|
00000030  00 ba 00 00 cd 10 c3 ac  3c 00 74 05 e8 03 00 eb  |........<.t.....|
00000040  f6 c3 bb 07 00 b4 0e cd  10 c3 48 65 6c 6c 6f 2c  |..........Hello,|
00000050  20 57 6f 72 6c 64 21 0d  0a 00 00 00 00 00 00 00  | World!.........|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 55 aa  |..............U.|
00000200

This gibberish won’t mean a lot to you here, but disassemblers like objdump(1) can do wonders to improve readability. Here’s a small subset:

% objdump --disassemble-all --target=binary --architecture=i8086 hello
0000000000000000 <.data>:
   0:   31 c0                   xor    %ax,%ax
   2:   8e c0                   mov    %ax,%es
   4:   8e d8                   mov    %ax,%ds
   6:   8e d0                   mov    %ax,%ss
   8:   bc 00 7c                mov    $0x7c00,%sp
   b:   e8 0e 00                call   0x1c
   e:   e8 1c 00                call   0x2d
  11:   e8 01 00                call   0x15
  14:   f4                      hlt   
(...)

For all this to make sense, let’s proceed slowly and step by step.

The source code

To build those bytes, we’ll write three files:

  • Makefile, which contains the instructions on how to build and execute our program;
  • biosfunc.S, which will serve as an include file and which will contain multiple convenience functions used by the main program;
  • hello.S, which contains the assembly code for the main program.

The Makefile

So, let’s start with the Makefile:

# Makefile for baremetal utilities
 
PROGS     = hello
INCLIBS   = biosfunc.S
RUNTARGET = boot0
 
all:    $(PROGS) $(INCLIBS)
        @echo $(PROGS) built.
        @echo now mv SOMEPROG $(RUNTARGET)
        @echo then make run, make xrun or make disassemble...
 
hello: hello.o
        ld -N -e start -Ttext 0x7c00 --oformat binary -o hello hello.o
 
hello.o: hello.S $(INCLIBS)
        as -o hello.o hello.S
 
disassemble: $(RUNTARGET)
        objdump --disassemble-all --target=binary --architecture=i8086 $(RUNTARGET)
 
run: $(RUNTARGET)
        @echo qemu will start shortly. Kill it from another console...
        qemu -hda ./$(RUNTARGET) -curses
        echo back from qemu
 
xrun: $(RUNTARGET)
        qemu -hda ./$(RUNTARGET)
 
clean:
        rm -f *.o a.out $(PROGS) $(RUNTARGET) *~

So, what do we see here? The most important instructions are the assembly and linking steps. To assemble hello.S into an object file hello.o the Makefile invokes the GNU Assembler gas (as) like this:

as -o hello.o hello.S

This object file hello.o isn’t the end product yet, as it is not executable as-is on the bare metal. Indeed, it is an ELF object file:

% file hello.o
hello.o: ELF 64-bit LSB relocatable, x86-64, version 1 (FreeBSD), not stripped

We’ll look at this file later.

To get an executable file, this object file needs to be linked in a very particular way. Yes, we don’t want an ELF executable, because the BIOS can’t interpret ELF headers at all! That is why the Makefile invokes the GNU Linker ld with a series of flags to turn hello.o into hello:

ld -N -e start -Ttext 0x7c00 --oformat binary -o hello hello.o

Before we look at those flags in detail, let’s examine hello:

% ls -l hello
-rwxr-xr-x  1 farid  users  512 May 25 18:54 hello
 
% file hello
hello: x86 boot sector, code offset 0xc0

Don’t expect to run this file on FreeBSD!

To run this program, we have two options:

  • We could copy those 512 bytes to a USB key, to a diskette etc… (with something like dd if=hello of=/dev/floppy bs=512 count=1), and boot real hardware.
  • We could install an 8086 or 80386+ emulator like qemu or VirtualBox on the host machine, and run hello in the emulator.

The Makefile contains instructions for running a 512-byte bare metal program with qemu. If you’re under X, running hello is as simple as calling:

% qemu -hda ./hello

Alternatively, mv -i hello boot0; make xrun, if you prefer to use the Makefile. This will open an emulator window, and show the string “Hello, World!” in it.

If you’re not running X, all is not lost: just append -curses to the qemu line:

% qemu -hda ./hello -curses

You can turn off qemu by closing its window on X, or by issuing the command killall -9 qemu from another (virtual) console, if you started qemu with the -curses option.

The BIOS convenience functions

As already said, the include file biosfunc.S contains a couple of convenience functions (mostly I/O-related):

/* biosfunc.S -- real-mode BIOS and convenience functions. */
 
        .file        "biosfunc.S"
        .code16
 
        /*
         * The following convenience functions are only available
         * in real mode through BIOS:
         *
         * void clrscr()        # clear display
         * void curshome()      # move cursor home (0:0)
         * void puts(%si)       # display string
         * void putc(%al)       # display char
         *
         * use this library like this:
         *   .include "biosfunc.S"
         */
         
/* clrscr() -- clear display */
clrscr:
        /*
         * clrscr() clears the video buffer, using a special case in
         * the BIOS function "SCROLL UP WINDOW".  Note that this
         * function is only available in real mode, and that some
         * buggy BIOSes destroy the base pointer %bp, so we better
         * temporarily save it on the stack.
         */
        pushw %bp               # BIOS call below *can* destroy %BP
         
        movb  $0x06,   %ah      # BIOS function "SCROLL UP WINDOW"
        movb  $0x0,    %al      # nr. of lines to scroll (00=clear window)
        movb  $0x7,    %bh      # attr. to fill new lines at bottom
        movw  $0x0,    %cx      # CH,CL: row,column upper left corner  (00:00)
        movw  $0x184f, %dx      # DH,DL: row,column lower right corner (24:79)
        int   $0x10             # call BIOS
 
        popw  %bp
        retw
 
/* curshome() -- set cursor position to 0:0 */
curshome:
        /*
         * curshome() moves the cursor to position 0:0 (top:left),
         * using the BIOS function "SET CURSOR POSITION".  This
         * function is only available in real mode.
         */
        movb $0x02, %ah         # BIOS function "SET CURSOR POSITION"
        movb $0x0,  %bh         # page number 0
        movw $0x0,  %dx         # DH=0 row, DL=0 col
        int  $0x10              # call BIOS
        retw
 
/* puts(%si) -- display 0-terminated string via putc() */
puts:
        /*
         * puts() repeatedly loads a byte from the buffer pointed
         * to by %si into %al, and displays that byte by calling
         * putc(%al), until a \0-byte is encountered.  The buffer
         * should thus be \0-terminated, like a regular C-string.
         */
        lodsb                   # Load next byte from %si buffer into %al
        cmpb  $0x0, %al         # %al == 0?
        je    puts1             # Yes: end of string!
        callw putc              # No: Display current char
        jmp   puts              # Proceed next char
puts1:  retw
         
/* putc(%al) -- output char %al via BIOS call int 10h, func 0Eh */
putc:
        /*
         * putc(%al) displays the byte %al on the default video
         * buffer, using the BIOS function "TELETYPE OUTPUT".
         * This function interprets some but not all control
         * characters correctly, but it doesn't matter all too
         * much in this simple example.  This BIOS function is
         * only available in real mode.
         */
        movw  $0x7, %bx            # BH: page 0, BL: attribute 7 (normal white)
        movb  $0xe, %ah            # BIOS function "TELETYPE OUTPUT"
        int   $0x10                # call BIOS
        retw

You can see that we’re quite lucky here: in Real Mode, all the BIOS functions are available to us. This is not the case in Protected Mode, where we’ll have to provide replacements for those BIOS calls. So what do we have here?

  • The function clrscr clears the video screen by invoking the BIOS function “SCROLL UP WINDOW” (int 10h, %ah=6, %al=0). Some additional parameters are needed and provided as well.
  • The function curshome moves the cursor to the upper left corner of the video screen, by invoking the BIOS function “SET CURSOR POSITION” (int 10h, %ah=2) with appropriate parameters.
  • The function putc prints a single character on the video screen, using the BIOS function “TELETYPE OUTPUT” (int 10h, %ah=0eh).
  • The function puts prints all characters of a buffer pointed to by %si, by using putc repeatedly, until a \0-byte is reached.
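Several of those BIOS parameters are passed as two 8-bit halves of one 16-bit register (%dh/%dl, %bh/%bl), which is why values like 0x184f show up in the code. The following Python sketch shows the packing; the helper names pack_hi_lo and unpack_hi_lo are invented for this example.

```python
def pack_hi_lo(hi: int, lo: int) -> int:
    """Combine two 8-bit halves (e.g. %dh and %dl) into one 16-bit value."""
    if not (0 <= hi <= 0xFF and 0 <= lo <= 0xFF):
        raise ValueError("both halves must fit in 8 bits")
    return (hi << 8) | lo


def unpack_hi_lo(word: int):
    """Split a 16-bit register value into its (high, low) byte halves."""
    return (word >> 8) & 0xFF, word & 0xFF


# clrscr: the lower right corner is row 24, column 79
print(hex(pack_hi_lo(24, 79)))    # 0x184f
# putc: movw $0x7, %bx sets BH (page) to 0 and BL (attribute) to 7
print(unpack_hi_lo(0x0007))       # (0, 7)
```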

Caution: please note that we don’t push and pop used registers here. It’s up to the caller of those routines to take care of that!

The main program

Now we’re finally ready to show the main program hello.S:

/* hello.S -- Hello, World on bare metal, just after BIOS boot. x86 */
 
        .file "hello.S"
 
        /*
         * A couple of constants.
         *
         * These can't be changed, because they are set by the
         * firmware (BIOS).
         */
        .set LOAD,      0x7c00     # BIOS loads and jumps here
        .set MAGIC,     0xaa55     # Must be at the end of the 512-byte block
        .set BLOCKSIZE, 512        # Boot block is BLOCKSIZE bytes long
 
        /*
         * The .text section contains the opcodes (code) for our
         * program.
         */
        .section .text             # This is a code (text) section.
        .code16                    # Boot code runs in 16-bit real mode
        .globl start               # Entry point is public, for the linker.
start:
        /*
         * The processor starts in real mode and executes the first
         * instruction at the reset vector, 16 bytes below the top of
         * its address space.  System designers usually map BIOS at
         * this address, so the CPU starts running BIOS code.  The
         * BIOS initializes RAM and other components.  Then, it loads
         * $BLOCKSIZE bytes from the first boot device into RAM,
         * starting at address $0x0:$LOAD.
         *
         * If that block finishes with the $MAGIC sequence 0x55, 0xaa
         * (it is reversed, because IA-32 arch is little endian), BIOS
         * considers this block a valid boot block, and jumps right here.
         */
 
        /*
         * Initialize segment registers %ds, %es, and %ss to 0x0.
         * %cs:%ip is usually set by the BIOS to 0x0:$LOAD (some
         * BIOSes use 0x7c0:0x0 instead, the same physical address).
         */
        xorw %ax, %ax
        movw %ax, %es
        movw %ax, %ds
        movw %ax, %ss
 
        /*
         * Initialize the stack.
         *
         * Since the stack on x86 grows towards *lower* addresses,
         * we anchor it at $LOAD.  Note that we don't collide with
         * the code because the stack will always remain below
         * (i.e. less than) $LOAD and grows downwards from there.
         */
        movw $LOAD, %sp
 
        /*
         * This is the "main" program:
         *
         * Clear screen, move cursor to the top:left,
         * and display a friendly greeting.
         */
        callw clrscr                  # clear screen
        callw curshome                # move cursor home - top:left
        callw greeting                # display a greeting string
         
        /*
         * That's all, folks!
         *
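         * We could run a tight loop here, but it's better to halt
         * the processor.  When run on bare metal, a halted processor
         * consumes less power (especially useful if run on battery).
         * When run under an emulator, the emulator doesn't consume
         * further CPU cycles.  Note that hlt only sleeps until the
         * next interrupt; a more robust version would disable
         * interrupts first (cli) or jump back to the hlt.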
         */
        hlt
         
/* greeting() -- display a little message. */       
greeting:
        /*
         * greeting displays the string located at label msg,
         * using the convenience function puts() defined below.
         * We pass the *address* of that string (thus $msg instead
         * of msg) in the %si register.
         */
        movw  $msg, %si
        callw puts
        retw
 
        /*
         * Finally, include the BIOS convenience functions used above.
         */
         
        .include "biosfunc.S"             # BIOS convenience functions.
        .file    "hello.S"
         
/* msg: the string buffer to be displayed. */
msg:
        .asciz "Hello, World!\r\n"        # must be \0-terminated!
         
        /*
         * The boot block MUST end with a MAGIC sequence.
         *
         * The BIOS checks this, and would refuse to boot unless
         * MAGIC is there.  The last two bytes of the BLOCKSIZE
         * long block must contain the magic sequence 0x55, 0xaa.
         * We move the assembler pointer .org there, and emit the
         * word MAGIC.  Note that MAGIC is set to 0xaa55, and not
         * 0x55aa, because the IA-32 platform is little endian.
         */
        .org BLOCKSIZE - 2
        .word MAGIC
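
What the last two directives (.org BLOCKSIZE - 2 and .word MAGIC) achieve can be mimicked in a few lines of Python: pad with zero bytes up to offset 510, then emit 0xaa55 as a little-endian 16-bit word. This is only a sketch; make_boot_sector is a hypothetical helper, not part of any toolchain.

```python
import struct

BLOCKSIZE = 512     # same constants as in hello.S
MAGIC = 0xAA55


def make_boot_sector(code: bytes) -> bytes:
    """Pad code with zero bytes up to offset 510, then append MAGIC
    as a little-endian 16-bit word (so 0x55 comes first on disk)."""
    if len(code) > BLOCKSIZE - 2:
        raise ValueError("code does not fit in front of the signature")
    padding = bytes(BLOCKSIZE - 2 - len(code))
    return code + padding + struct.pack("<H", MAGIC)


sector = make_boot_sector(bytes([0xF4]))    # a lone hlt instruction
print(len(sector), sector[-2:].hex())       # 512 55aa
```

The "<H" format is what turns the word 0xaa55 into the byte sequence 0x55, 0xaa.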

The most important lesson to be learned here is that before we can call the routines:

callw clrscr                  # clear screen
callw curshome                # move cursor home - top:left
callw greeting                # display a greeting string

we need a properly initialized stack. Why? Well, every callw instruction here pushes the return address on the stack, and every retw instruction in the called routines pops the return address back from the stack. Furthermore, the BIOS calls that we invoke in those biosfunc.S routines probably need stack space as well, so we had better provide a properly initialized stack.

But where the heck do we put this stack? Remember that the BIOS loaded our 512-byte program at address 0000:7C00? This program thus occupies RAM from 0000:7C00 to 0000:7DFF. This leaves a lot of addresses in RAM that could be used for the stack. But remember that we run in Real Mode at this point. So everything above 1 MB isn’t accessible anyway, and even in Real Mode, not every address below 1 MB is freely accessible. Indeed, look at this:

  • 00000 – 003FF, RAM, Real Mode Interrupt Vector Table (IVT)
  • 00400 – 004FF, RAM, BIOS Data Area (BDA)
  • 00500 – 9FBFF, RAM, Free Memory (below 1MB), about 638K
  • 9FC00 – 9FFFF, RAM, Extended BIOS Data Area (EBDA)
  • A0000 – BFFFF, VRAM, VGA Frame Buffer
  • C0000 – C7FFF, ROM, Video BIOS 32K
  • C8000 – EFFFF, Nothing (hole)
  • F0000 – FFFFF, ROM, Motherboard BIOS 64K

We’re currently loaded in the 00500 – 9FBFF area, and our stack had better be in this area as well, or all hell will break loose!
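If you want to play with this table, the following Python sketch turns a real-mode segment:offset pair into a physical address (segment times 16 plus offset) and looks up the region it falls into. The table is copied from the list above; the function names physical and region are made up for this example.

```python
# (start, end, description), copied from the memory map above
REAL_MODE_MAP = [
    (0x00000, 0x003FF, "Interrupt Vector Table (IVT)"),
    (0x00400, 0x004FF, "BIOS Data Area (BDA)"),
    (0x00500, 0x9FBFF, "Free Memory"),
    (0x9FC00, 0x9FFFF, "Extended BIOS Data Area (EBDA)"),
    (0xA0000, 0xBFFFF, "VGA Frame Buffer"),
    (0xC0000, 0xC7FFF, "Video BIOS"),
    (0xC8000, 0xEFFFF, "Nothing (hole)"),
    (0xF0000, 0xFFFFF, "Motherboard BIOS"),
]


def physical(segment: int, offset: int) -> int:
    """Real-mode address translation: segment * 16 + offset."""
    return (segment << 4) + offset


def region(address: int) -> str:
    """Return the description of the region containing address."""
    for start, end, name in REAL_MODE_MAP:
        if start <= address <= end:
            return name
    raise ValueError("address is outside the first megabyte")


load = physical(0x0000, 0x7C00)
print(hex(load), region(load))               # 0x7c00 Free Memory
print(region(physical(0x0000, 0x7DFF)))      # Free Memory
print(physical(0x07C0, 0x0000) == load)      # True
```

The last line shows that 07C0:0000 and 0000:7C00 name the same physical byte, one reason why it pays to set the segment registers explicitly.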

In this example, we’ve decided to put the stack at 0000:7C00. “Wait a moment”, you might say, “isn’t that the same as the address where our program starts?” Well, congratulations: you weren’t asleep, and you’re right! “But, wouldn’t that collide with our code? Wouldn’t the stack overwrite this code?” Well, it would… if the stack grew towards higher addresses. But it doesn’t! On the IA-32 platform, the stack grows towards lower addresses. In other words, if %sp points to 0x7c00, a pushw would decrement %sp by two, so it would then point to 0x7bfe, i.e. below our code. Each additional push would decrement the stack pointer further, and the stack will “grow” towards the bottom, until it reaches 0x0500, the lower end of the free memory area. Fortunately, we don’t need such a deep stack for such a simple program.
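Here is a toy Python model of that behaviour, just to watch %sp move. The class Stack16 is invented for this illustration and ignores segments entirely; it only shows that pushes land below 0x7c00 and never touch the bytes at or above it.

```python
class Stack16:
    """Toy model of a 16-bit stack living in a flat 64K memory array."""

    def __init__(self, sp: int):
        self.sp = sp
        self.memory = bytearray(0x10000)

    def pushw(self, value: int) -> None:
        """Decrement sp by two, then store the word little-endian."""
        self.sp = (self.sp - 2) & 0xFFFF
        self.memory[self.sp] = value & 0xFF
        self.memory[(self.sp + 1) & 0xFFFF] = (value >> 8) & 0xFF

    def popw(self) -> int:
        """Load the word at sp, then increment sp by two."""
        low = self.memory[self.sp]
        high = self.memory[(self.sp + 1) & 0xFFFF]
        self.sp = (self.sp + 2) & 0xFFFF
        return low | (high << 8)


stack = Stack16(sp=0x7C00)
stack.pushw(0x7C0E)           # callw clrscr pushes the return address
print(hex(stack.sp))          # 0x7bfe
stack.pushw(0x0000)           # pushw %bp inside clrscr
print(hex(stack.sp))          # 0x7bfc
stack.popw()                  # popw %bp
print(hex(stack.popw()))      # retw pops the return address: 0x7c0e
print(hex(stack.sp))          # 0x7c00
```

The return address 0x7c0e is simply the instruction after the first callw (offset 0xe) at its runtime location.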

So, how is the stack (re)located? We simply load the address 0x7c00 into %sp, and all is well:

movw $LOAD, %sp

The gory details

hello.o

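As said, the object file hello.o isn’t an executable program. It won’t run, neither under an operating system, nor on the bare hardware or in an emulator. Why not? Because it’s a relocatable ELF object file: it starts with ELF headers, and its addresses haven’t been fixed up by the linker yet.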

Being in ELF format does have its advantages though: it could e.g. be manipulated by the binutils tools, like, say, objdump:

% objdump --all-headers --architecture=i8086 hello.o
 
hello.o:     file format elf64-x86-64
hello.o
architecture: i386:x86-64, flags 0x00000011:
HAS_RELOC, HAS_SYMS
start address 0x0000000000000000
 
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000200  0000000000000000  0000000000000000  00000040  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data         00000000  0000000000000000  0000000000000000  00000240  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  0000000000000000  0000000000000000  00000240  2**2
                  ALLOC
SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 hello.S
0000000000000000 l    df *ABS*  0000000000000000 biosfunc.S
0000000000000000 l    df *ABS*  0000000000000000 hello.S
0000000000000000 l    d  .text  0000000000000000
0000000000000000 l    d  .data  0000000000000000
0000000000000000 l    d  .bss   0000000000000000
0000000000007c00 l       *ABS*  0000000000000000 LOAD
000000000000aa55 l       *ABS*  0000000000000000 MAGIC
0000000000000200 l       *ABS*  0000000000000000 BLOCKSIZE
000000000000001c l       .text  0000000000000000 clrscr
000000000000002d l       .text  0000000000000000 curshome
0000000000000015 l       .text  0000000000000000 greeting
000000000000004a l       .text  0000000000000000 msg
0000000000000037 l       .text  0000000000000000 puts
0000000000000041 l       .text  0000000000000000 puts1
0000000000000042 l       .text  0000000000000000 putc
0000000000000000 g       .text  0000000000000000 start
 
 
RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE
0000000000000016 R_X86_64_16       .text+0x000000000000004a

If you love details, this is very interesting:

  • Look at the section table, at index 0: the .text section is indeed 0x200 (512) bytes long, but in this context, it starts at offset 0x40. That’s why the file can’t be run on the bare metal: the instructions in .text don’t start at offset 0! At the beginning of hello.o, there’s an ELF header, that will need to be stripped away.
  • If you look at the symbol table, you’ll notice the addresses of the functions (clrscr, curshome, etc.), and the address of the buffer msg, which is also in the .text section.
  • Interestingly, there’s a relocation record for the symbol msg (offset 0x4a) at address 0x16 (at the end of the output). What does that tell us? Hold on, we’ll come to this soon!

Besides showing ELF headers, objdump can also disassemble the code:

% objdump --disassemble --architecture=i8086 hello.o
 
hello.o:     file format elf64-x86-64
 
Disassembly of section .text:
 
0000000000000000 <start>:
   0:   31 c0                   xor    %ax,%ax
   2:   8e c0                   mov    %ax,%es
   4:   8e d8                   mov    %ax,%ds
   6:   8e d0                   mov    %ax,%ss
   8:   bc 00 7c                mov    $0x7c00,%sp
   b:   e8 0e 00                call   1c <clrscr>
   e:   e8 1c 00                call   2d <curshome>
  11:   e8 01 00                call   15 <greeting>
  14:   f4                      hlt   
 
0000000000000015 <greeting>:
  15:   be 00 00                mov    $0x0,%si
  18:   e8 1c 00                call   37 <puts>
  1b:   c3                      ret   
 
000000000000001c <clrscr>:
  1c:   55                      push   %bp
  1d:   b4 06                   mov    $0x6,%ah
  1f:   b0 00                   mov    $0x0,%al
  21:   b7 07                   mov    $0x7,%bh
  23:   b9 00 00                mov    $0x0,%cx
  26:   ba 4f 18                mov    $0x184f,%dx
  29:   cd 10                   int    $0x10
  2b:   5d                      pop    %bp
  2c:   c3                      ret   
 
000000000000002d <curshome>:
  2d:   b4 02                   mov    $0x2,%ah
  2f:   b7 00                   mov    $0x0,%bh
  31:   ba 00 00                mov    $0x0,%dx
  34:   cd 10                   int    $0x10
  36:   c3                      ret   
 
0000000000000037 <puts>:
  37:   ac                      lods   %ds:(%si),%al
  38:   3c 00                   cmp    $0x0,%al
  3a:   74 05                   je     41 <puts1>
  3c:   e8 03 00                call   42 <putc>
  3f:   eb f6                   jmp    37 <puts>
 
0000000000000041 <puts1>:
  41:   c3                      ret   
 
0000000000000042 <putc>:
  42:   bb 07 00                mov    $0x7,%bx
  45:   b4 0e                   mov    $0xe,%ah
  47:   cd 10                   int    $0x10
  49:   c3                      ret   
 
000000000000004a <msg>:
  4a:   48                      dec    %ax
  4b:   65                      gs
  4c:   6c                      insb   (%dx),%es:(%di)
  4d:   6c                      insb   (%dx),%es:(%di)
  4e:   6f                      outsw  %ds:(%si),(%dx)
  4f:   2c 20                   sub    $0x20,%al
  51:   57                      push   %di
  52:   6f                      outsw  %ds:(%si),(%dx)
  53:   72 6c                   jb     c1 <msg+0x77>
  55:   64 21 0d                and    %cx,%fs:(%di)
  58:   0a 00                   or     (%bx,%si),%al
        ...
 1fe:   55                      push   %bp
 1ff:   aa                      stos   %al,%es:(%di)

If you pay close attention, you’ll notice a couple of important things:

  • The functions are only disassembled correctly if we tell objdump that the target is running in Real Mode, i.e. that the processor is an i8086. That’s what the --architecture=i8086 flag is for. Had we omitted this flag, objdump would have mistakenly disassembled all those opcodes as if they belonged to 64-bit mode (or 32-bit mode, if I had used objdump on FreeBSD/i386 instead of FreeBSD/amd64).
  • The disassembly of the label msg and of the magic bytes at the end is nonsensical. Ideally, a disassembler would have output the “Hello, World!\r\n” string, and the 0x55, 0xAA bytes as-is, uninterpreted. But that’s not the disassembler’s fault: msg and the magic bytes were put in the .text section after all, so we can’t blame objdump here.
  • The disassembled code for the other functions is remarkably clear. For example, try to follow the code for callw clrscr in the main program at offset 0xb: we have call 1c, and if we look at offset 0x1c, that’s really the beginning of the function clrscr. If you look at the other function calls, it’s the same.

Relocation records

Remember the following strange-looking relocation record for msg (a.k.a. offset 0x4a)?

OFFSET           TYPE              VALUE
0000000000000016 R_X86_64_16       .text+0x000000000000004a

What the heck is a relocation record?! That’s a hint for the linker to adjust (to patch) some bytes in the object file when writing the executable file. This example is particularly clear cut, and serves as an excellent illustration.

So, what does this relocation record tell us (and the linker)? It says that at address 0x16 (that’s the OFFSET, relative to the beginning of the .text section), two bytes (16 bits, that’s the TYPE of R_X86_64_16) have to be patched by the linker, to the VALUE .text+0x4a. Come again: What?!?!

To understand the reason for this strange relocation record, let’s look at address 0x16 where two bytes allegedly need to be patched:

0000000000000015 <greeting>:
  15:   be 00 00                mov    $0x0,%si
  18:   e8 1c 00                call   37 <puts>
  1b:   c3                      ret

At address 0x16 (and 0x17), we have two zero-bytes 00 00, following the opcode be. In other words, the code as-is would move $0x0 into %si. But that’s not what we wanted! In the original source code, we wanted to put the address of msg into %si, and not $0x0. Remember?

movw  $msg, %si
callw puts
retw

So, something clearly went wrong when assembling the source code. Or so it seems at first sight. What we really want the CPU to execute is this:

15:   be 4a 7c                mov    $0x7c4a,%si
18:   e8 1c 00                call   0x37
1b:   c3                      ret

In other words, the address of msg (0x4a relative to .text) at runtime must be 0x7c00 + 0x4a, and not 0x0 as in the object file.

So, the relocation record above tells the linker to patch the two 00 bytes at offset 0x16, and replace them with VALUE, i.e. with .text+0x4a. And since .text will be linked to the address 0x7C00 (see below), the ultimate value to put there will be 0x7c00 + 0x4a == 0x7c4a. Or, because we’re little endian on IA-32, the bytes will be 0x4a and 0x7c (they are reversed).
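The patching step itself is small enough to sketch in Python. This is not how ld is implemented; apply_reloc16 is a hypothetical helper that does for one 16-bit absolute relocation what the paragraph above describes: compute section base plus addend, and write it little-endian at the given offset.

```python
import struct


def apply_reloc16(image: bytes, offset: int, section_base: int, addend: int) -> bytes:
    """Patch two bytes at offset with (section_base + addend), little-endian."""
    value = (section_base + addend) & 0xFFFF
    patched = bytearray(image)
    patched[offset:offset + 2] = struct.pack("<H", value)
    return bytes(patched)


# The greeting function as found in hello.o, placed at its offset 0x15
text = bytearray(0x1C)
text[0x15:0x1C] = bytes.fromhex("be0000e81c00c3")

linked = apply_reloc16(bytes(text), offset=0x16, section_base=0x7C00, addend=0x4A)
print(linked[0x15:0x18].hex())    # be4a7c
```

Only the two operand bytes change; the opcode be and the call that follows are left alone.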

The lesson to remember here is that the assembler creates one relocation record per location in the code that needs to be patched by the linker before the code can be executed. For example, every time the code references msg, the assembler will create an additional relocation record. In this case, we referenced msg only once in our program, so we had only one relocation record.

Other types of relocation records are possible too, for example when linking in functions from libraries. We won’t go into the gory details of linking and loading here though; just keep that in mind.

Just one more word on relocation records: look at the bytes at offset 0x18:

18:   e8 1c 00                call   0x37

How come there’s no mention of 0x37 in the operand bytes (0x1c, 0x00) of the opcode 0xe8? Because the operand is relative: here, we have so-called PIC, position-independent code. Do the math (here, in a Python shell):

>>> hex(0x1c+0x18+3)
'0x37'

In other words, the target of the call is specified in the operand relative to the address of the next instruction (i.e. 3 bytes after the start of the call): 0x1c bytes further on. We can observe the same here:

 b:   e8 0e 00                call   1c <clrscr>
 e:   e8 1c 00                call   2d <curshome>
11:   e8 01 00                call   15 <greeting>

Here too, we have:

>>> hex(0x0e+0xb+3)
'0x1c'
>>> hex(0x1c+0xe+3)
'0x2d'
>>> hex(0x1+0x11+3)
'0x15'

That’s why there are no relocation records for those calls: no matter where the linker relocates this code, no operands need to be changed, because they are position independent. Had we used absolute jumps, the assembler would have created additional relocation records for the linker to patch.
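The same arithmetic works for the short jumps in puts, with one twist: their operand is a single signed byte, so it can point backwards. A small Python sketch (the helper names are made up; it covers only the e8 near call and the two-byte eb/74 short jumps that occur in this program):

```python
def call_target(address: int, operand: bytes) -> int:
    """Target of a near call (opcode e8): the operand is a signed 16-bit
    little-endian displacement from the next instruction (address + 3)."""
    disp = int.from_bytes(operand, "little", signed=True)
    return (address + 3 + disp) & 0xFFFF


def short_jump_target(address: int, operand: int) -> int:
    """Target of a short jump (eb, 74, ...): the operand is a signed 8-bit
    displacement from the next instruction (address + 2)."""
    disp = operand - 0x100 if operand >= 0x80 else operand
    return (address + 2 + disp) & 0xFFFF


print(hex(call_target(0x18, bytes([0x1C, 0x00]))))      # 0x37
print(hex(short_jump_target(0x3F, 0xF6)))               # 0x37 (jmp puts, backwards)
print(hex(short_jump_target(0x3A, 0x05)))               # 0x41 (je puts1)
print(hex(call_target(0x7C18, bytes([0x1C, 0x00]))))    # 0x7c37
```

The last line uses the runtime address of the call: the operand bytes are identical, and the target moves along with the code.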

hello

Now it is time to look at how the linker translates hello.o into the executable file hello. Remember that the linker has, among other things, to:

  • Read, interpret, and strip the ELF headers from hello.o.
  • Apply all relocation records, by patching some bytes in the output file.

In this simple example, that’s all the linker has to do. In more complex cases, it may also have to do some symbol management, like loading code from libraries (we don’t need that here, because we included our biosfunc.S verbatim into hello.S, so for the linker, all this is just one big file / assembly unit), etc.

Invoking the linker ld(1) is usually very easy. Here, however, we need some special flags, due to the nature of the desired executable.

  • The linker must assume that the section .text will be loaded (by the BIOS) at the unusual address 0x7c00. Normally, in a hosted environment like Unix, the linker picks a default (virtual) start address for .text, and the operating system loads the program there. But here on the bare metal, the BIOS decides: it always loads the boot sector at 0x7c00, and a simpler address like 0x0 is out of the question because of the way real-mode memory is laid out (see above). Remember also that the relocation record relies on the address of .text being accurate at runtime. Changing the address of .text to 0x7c00 is done with the -Ttext 0x7c00 flag.
  • The linker also needs to know the start address, i.e. the first address to be executed. We provide this with -e start, because execution has to start at the label start.
  • The output format is significantly different from the one used on the host machine (i.e. it’s not ELF x86-64 or something like this): it is binary. In other words, don’t include an ELF header for the executable, as this header would confuse the BIOS which has no notion of ELF at all. To specify a binary (bare) output format, we use --oformat binary.
  • The option -N (or --omagic) sets the text and data sections to be readable and writable (duh… we couldn’t care less). More importantly, it disables linking against shared libraries (we don’t need FreeBSD’s libc here!), which is the reason we need this flag.

So, to summarize, we link hello.o into hello like this:

% ld -N -e start -Ttext 0x7c00 --oformat binary -o hello hello.o

Finally, we can disassemble this file (remember that starting at address 0x4a, the disassembly becomes nonsensical: that’s where our msg and later the magic bytes are located).

% objdump --disassemble-all --target=binary --architecture=i8086 hello
 
hello:     file format binary
 
Disassembly of section .data:
 
0000000000000000 <.data>:
   0:   31 c0                   xor    %ax,%ax
   2:   8e c0                   mov    %ax,%es
   4:   8e d8                   mov    %ax,%ds
   6:   8e d0                   mov    %ax,%ss
   8:   bc 00 7c                mov    $0x7c00,%sp
   b:   e8 0e 00                call   0x1c
   e:   e8 1c 00                call   0x2d
  11:   e8 01 00                call   0x15
  14:   f4                      hlt   
  15:   be 4a 7c                mov    $0x7c4a,%si
  18:   e8 1c 00                call   0x37
  1b:   c3                      ret   
  1c:   55                      push   %bp
  1d:   b4 06                   mov    $0x6,%ah
  1f:   b0 00                   mov    $0x0,%al
  21:   b7 07                   mov    $0x7,%bh
  23:   b9 00 00                mov    $0x0,%cx
  26:   ba 4f 18                mov    $0x184f,%dx
  29:   cd 10                   int    $0x10
  2b:   5d                      pop    %bp
  2c:   c3                      ret   
  2d:   b4 02                   mov    $0x2,%ah
  2f:   b7 00                   mov    $0x0,%bh
  31:   ba 00 00                mov    $0x0,%dx
  34:   cd 10                   int    $0x10
  36:   c3                      ret   
  37:   ac                      lods   %ds:(%si),%al
  38:   3c 00                   cmp    $0x0,%al
  3a:   74 05                   je     0x41
  3c:   e8 03 00                call   0x42
  3f:   eb f6                   jmp    0x37
  41:   c3                      ret   
  42:   bb 07 00                mov    $0x7,%bx
  45:   b4 0e                   mov    $0xe,%ah
  47:   cd 10                   int    $0x10
  49:   c3                      ret   
  4a:   48                      dec    %ax
  4b:   65                      gs
  4c:   6c                      insb   (%dx),%es:(%di)
  4d:   6c                      insb   (%dx),%es:(%di)
  4e:   6f                      outsw  %ds:(%si),(%dx)
  4f:   2c 20                   sub    $0x20,%al
  51:   57                      push   %di
  52:   6f                      outsw  %ds:(%si),(%dx)
  53:   72 6c                   jb     0xc1
  55:   64 21 0d                and    %cx,%fs:(%di)
  58:   0a 00                   or     (%bx,%si),%al
        ...
 1fe:   55                      push   %bp
 1ff:   aa                      stos   %al,%es:(%di)
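
To convince yourself that the tail of this listing (from 0x4a on) is nothing but our message, you can print the bytes of the string in Python and compare them with the middle column above. A quick sketch:

```python
# What .asciz "Hello, World!\r\n" emits: the characters plus a trailing \0
msg = b"Hello, World!\r\n\x00"

print(" ".join(f"{byte:02x}" for byte in msg))
# 48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 21 0d 0a 00

# msg starts at offset 0x4a, so the first byte after it is at:
print(hex(0x4A + len(msg)))     # 0x5a
```

The bytes 48, 65, 6c, 6c, 6f… are exactly what objdump tried to read as dec, gs, insb and friends.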

One final word: try to run this program under qemu, and observe qemu in top(1) in another window. You’ll notice that qemu doesn’t consume CPU cycles once “Hello, World!” has been displayed. Had we replaced hlt with a tight loop, qemu would have continued to use CPU cycles.

Conclusion

Writing programs on the (IA-32) bare metal isn’t as easy as it might seem. There are a lot of limitations:

  • The CPU starts in 16-bit Real Mode
  • RAM is tight, and has a strange layout
  • Space for the (bootstrap) program is particularly tight: 510 bytes at most.
  • BIOS functions aren’t as versatile as the services provided by an OS Kernel (be it DOS, or Unix).

For all those reasons, the first program loaded by the BIOS almost invariably acts as a primary bootloader, which loads more sectors from the disk into memory and jumps there. Furthermore, operating systems quickly switch to Protected Mode, and re-implement BIOS services from scratch, because they aren’t available anymore there.

If you want to experiment with bare metal (Real Mode) programming, here’s an exercise for you: try to expand hello.S and biosfunc.S, so that after displaying an appropriate prompt, you start reading one key at a time, and echo it on the screen. If the user hits ‘q’, halt the CPU. Hint: write a function getc that will read a key. Use BIOS function “GET KEYSTROKE” (int 16h, ah=0) for this. Check out the RBIL (Ralf Brown’s Interrupt List) to learn about this BIOS function.

Of course, I’ll provide a solution to this exercise in another post (or not). Happy hacking. ;)