In a previous tutorial, we wrote a simple Hello, World program in assembly language. While very small, the final version was still dependent on the FreeBSD kernel to execute the equivalent of a write(2) system call.
But what if there’s no kernel running? How about writing a hello world on the “bare metal”, i.e. without the help of an operating system?
In the beginning…
In the beginning, Man created Hardware and Firmware.
And the RAM was empty and without program. And the spirit of the Firmware laid lurking in a Flash/ROM chip, sleeping in deep inactivity.
And Man said, Let there be light; and the CPU awoke with a start.
And, after shaking off the cobwebs of sleep, the CPU initialized itself to Real Mode, by setting its registers to well-defined values. And everything was alright and as it should be.
Then, the CPU jumped to the address 0xF000:FFF0, sixteen bytes below the 1 MB mark, which was the entry point of the Firmware. And the BIOS awoke and took over from there, initializing the rest of the Hardware.
When the Hardware was fully initialized, the BIOS looked at its list of boot devices, and (lo and behold!) it started loading the first 512-byte sector of the first boot device into the RAM at address 0x0000:7C00.
And the world was not void and empty anymore, as it contained a lot of bytes for the CPU to start playing with.
Then, delighted that the loading “just worked”(tm), the mighty BIOS checked that the last word of that sector contained the magic bytes 0x55 and 0xAA. For if it didn’t contain them, the BIOS would have had to read the first sector of the next boot device, and eventually give up booting, if it didn’t find a sector with that magic string at the end.
And, having verified that the sector it loaded into RAM was valid and magic, the BIOS made the CPU jump (with joy) to this initial address.
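To make the story a bit more tangible, here is a small Python sketch of the two facts the BIOS relies on: how a Real Mode segment:offset pair maps to a linear address, and how the magic bytes at the end of a sector are checked. The names linear_address and is_bootable are made up for this illustration; a real BIOS does this in firmware, of course, and may apply further checks.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # 0x55 at offset 510, 0xAA at offset 511


def linear_address(segment: int, offset: int) -> int:
    """Real Mode address translation: segment * 16 + offset."""
    return (segment << 4) + offset


def is_bootable(sector: bytes) -> bool:
    """Model of the BIOS check: one full sector that ends in 0x55 0xAA."""
    return len(sector) == SECTOR_SIZE and sector[-2:] == BOOT_SIGNATURE


# 0x0000:7C00 and 0x07C0:0000 name the same byte in RAM.
print(hex(linear_address(0x0000, 0x7C00)))
print(hex(linear_address(0x07C0, 0x0000)))

blank = bytes(SECTOR_SIZE)                       # all zeroes: no magic bytes
print(is_bootable(blank))
print(is_bootable(blank[:-2] + BOOT_SIGNATURE))  # same sector, now "magic"
```

Note that the same linear address can be written with many segment:offset pairs, which is why boot code usually sets its segment registers explicitly instead of trusting what the BIOS left behind.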
Hello, Real Mode World!
So, our task as bare metal programmers is to provide up to 512 or 0x200 bytes that the CPU is supposed to execute. How the heck do we generate those bytes? And, more importantly, what do they look like?
The bytes will be, of course, a mix of opcodes and their operands, that the CPU will execute (starting in Real Mode), and the bytes that make up the string “Hello, World!”. If you’re interested, we’ll peek ahead and have a look:
This gibberish won’t mean a lot to you here, but disassemblers like objdump(1) can do wonders to improve readability. Here’s a small subset:
For all this to make sense, let’s proceed slowly and step by step.
The source code
To build those bytes, we’ll write three files:
- Makefile, which contains the instructions on how to build and execute our program;
- biosfunc.S, which will serve as an include file and which will contain multiple convenience functions used by the main program;
- hello.S, which contains the assembly code for the main program.
So, let’s start with the Makefile:
So, what do we see here? The most important instructions are the assembly and linking steps. To assemble hello.S into an object file hello.o the Makefile invokes the GNU Assembler gas (as) like this:
This object file hello.o isn’t the end product yet, as it is not executable as-is on the bare metal. Indeed, it is an ELF object file:
We’ll look at this file later.
To get an executable file, this object file needs to be linked in a very particular way. Yes, we don’t want an ELF executable, because the BIOS can’t interpret ELF headers at all! That is why the Makefile invokes the GNU Linker ld with a series of flags to turn hello.o into hello:
Before we look at those flags in detail, let’s examine hello:
Don’t expect to run this file on FreeBSD!
To run this program, we have two options:
- We could copy those 512 bytes to a USB key, to a diskette etc… (with something like dd if=hello of=/dev/floppy bs=512 count=1), and boot real hardware.
- We could install an 8086 or 80386+ emulator like qemu or VirtualBox on the host machine, and run hello in the emulator.
The Makefile contains instructions for running a 512-byte bare metal program with qemu. If you’re under X, running hello is as simple as calling:
mv -i hello boot0; make xrun, if you prefer to use the Makefile. This will open an emulator window, and show the string “Hello, World!” in it.
If you’re not running X, all is not lost: just append -curses to the qemu line:
You can turn off qemu by closing its window on X, or by issuing the command killall -9 qemu from another (virtual) console, if you started qemu with the
The BIOS convenience functions
As already said, the include file biosfunc.S contains a couple of convenience functions (mostly I/O-related):
You can see that we’re quite lucky here: in Real Mode, all the BIOS functions are available to us. This is not the case in Protected Mode, where we’ll have to provide replacements for those BIOS calls. So what do we have here?
- The function clrscr clears the video screen by invoking the BIOS function “SCROLL UP WINDOW” (int 10h, %ah=6, %al=0). Some additional parameters are needed and provided as well.
- The function curshome moves the cursor to the upper left corner of the video screen, by invoking the BIOS function “SET CURSOR POSITION” (int 10h, %ah=2) with appropriate parameters.
- The function putc prints a single character on the video screen, using the BIOS function “TELETYPE OUTPUT” (int 10h, %ah=0eh).
- The function puts prints all characters of a buffer pointed to by %si, by using putc repeatedly, until a \0-byte is reached.
Caution: please note that we don’t push and pop used registers here. It’s up to the caller of those routines to take care of that!
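The control flow of puts is easy to get wrong in assembly, so here is the same loop modelled in Python. This is only a sketch of the logic described above, not the assembly routine itself: memory stands in for RAM, si for the %si register, and the putc callback for the BIOS teletype call. It also assumes that msg is terminated by a \0-byte.

```python
def puts(memory: bytes, si: int, putc) -> int:
    """Call putc for each byte from memory[si] up to (not including) a NUL byte.

    Returns the index of the terminating NUL, i.e. where %si would end up.
    """
    while memory[si] != 0:
        putc(memory[si])
        si += 1
    return si


screen = []                          # stand-in for the video screen
msg = b"Hello, World!\r\n\x00"       # assumed layout of the msg buffer
end = puts(msg, 0, screen.append)
print(bytes(screen), end)
```

If the terminating \0-byte were missing, the loop would happily run off the end of the buffer, which on the bare metal means printing whatever garbage follows in RAM.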
The main program
Now we’re finally ready to show the main program hello.S:
The most important lesson to be learned here is that before we can call the routines:
we need a properly initialized stack. Why? Well, every callw instruction here pushes the return address on the stack, and every retw instruction in the called routines pops the return address back from the stack. Furthermore, the BIOS calls that we invoke in those biosfunc.S routines probably need stack space as well, so we had better provide a properly initialized stack.
But where the heck do we put this stack? Remember that the BIOS loaded our 512-byte program at address 0000:7C00? This program thus occupies RAM from 0000:7C00 to 0000:7DFF. This leaves a lot of addresses in RAM that could be used for the stack. But remember that we run in Real Mode at this point. So everything above 1 MB isn’t accessible anyway, and even below 1 MB, not every address is freely accessible. Indeed, look at this:
- 00000 – 003FF, RAM, Real Mode Interrupt Vector Table (IVT)
- 00400 – 004FF, RAM, BIOS Data Area (BDA)
- 00500 – 9FBFF, RAM, Free Memory (below 1MB), about 637K
- 9FC00 – 9FFFF, RAM, Extended BIOS Data Area (EBDA)
- A0000 – BFFFF, VRAM, VGA Frame Buffer
- C0000 – C7FFF, ROM, Video BIOS 32K
- C8000 – EFFFF, Nothing (hole)
- F0000 – FFFFF, ROM, Motherboard BIOS 64K
We’re currently loaded in the 00500 – 9FBFF area, and our stack had better be in this area as well, or all hell will break loose!
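If you want to double-check where a given address lands, the table above translates directly into a lookup. The following Python sketch simply encodes the ranges listed here (real machines differ in the details, especially around the EBDA and the hole); MEMORY_MAP and region_of are names invented for this example.

```python
# (first address, last address, description), copied from the table above
MEMORY_MAP = [
    (0x00000, 0x003FF, "Real Mode Interrupt Vector Table (IVT)"),
    (0x00400, 0x004FF, "BIOS Data Area (BDA)"),
    (0x00500, 0x9FBFF, "Free Memory"),
    (0x9FC00, 0x9FFFF, "Extended BIOS Data Area (EBDA)"),
    (0xA0000, 0xBFFFF, "VGA Frame Buffer"),
    (0xC0000, 0xC7FFF, "Video BIOS"),
    (0xC8000, 0xEFFFF, "Nothing (hole)"),
    (0xF0000, 0xFFFFF, "Motherboard BIOS"),
]


def region_of(address: int) -> str:
    """Return the name of the region that contains a linear address below 1 MB."""
    for first, last, name in MEMORY_MAP:
        if first <= address <= last:
            return name
    raise ValueError(f"{address:#x} is not below 1 MB")


# Our code (0x7C00..0x7DFF) and the first stack slot just below it
for addr in (0x7C00, 0x7DFF, 0x7BFE):
    print(hex(addr), region_of(addr))
```

Both the program and the area right below it fall into the free memory region, which is what we need for the stack.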
In this example, we’ve decided to put the stack at 0000:7C00. “Wait a moment”, would you say, “isn’t that the same as the address where our program starts?” Well, congratulations: you weren’t asleep, and you’re right! “But, wouldn’t that collide with our code? Wouldn’t the stack overwrite this code?” Well, it would… if the stack grew towards higher addresses. But it doesn’t! On the IA-32 platform, the stack grows towards lower addresses. In other words, if %sp points to 0x7c00, a pushw would decrement %sp by two, so it would then point to 0x7bfe, i.e. below our code. Each additional push would decrement the stack pointer more, and the stack will “grow” towards the bottom, until it reaches 0x0500, the lower end of the free memory area. Fortunately, we don’t need such a deep stack for such a simple program.
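If the direction of growth still feels backwards, a tiny simulation may help. The Stack16 class below is a toy model written for this post (it is not how the CPU is implemented, and it ignores the %ss segment register): it keeps a 64 KB bytearray as "RAM" and mimics what pushw and popw do to %sp.

```python
class Stack16:
    """Toy model of a 16-bit stack that grows towards lower addresses."""

    def __init__(self, sp: int = 0x7C00):
        self.sp = sp
        self.memory = bytearray(0x10000)   # one 64 KB segment of "RAM"

    def push(self, word: int) -> None:
        # pushw: first decrement %sp by two, then store the word (little endian)
        self.sp = (self.sp - 2) & 0xFFFF
        self.memory[self.sp:self.sp + 2] = word.to_bytes(2, "little")

    def pop(self) -> int:
        # popw: first load the word, then increment %sp by two
        word = int.from_bytes(self.memory[self.sp:self.sp + 2], "little")
        self.sp = (self.sp + 2) & 0xFFFF
        return word


stack = Stack16()
stack.push(0x000E)      # e.g. a return address pushed by callw
print(hex(stack.sp))    # the stack pointer is now below 0x7c00
print(hex(stack.pop()), hex(stack.sp))
```

The word ends up at 0x7bfe and 0x7bff; the bytes at 0x7c00 and above, where our code lives, are never touched.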
So, how is the stack (re)located? We simply load the address 0x7c00 into %sp, and all is well:
The gory details
As said, the object file hello.o isn’t an executable program. It won’t run under an operating system, on the bare hardware, or in an emulator. Why not? Because it’s in ELF format.
Being in ELF format does have its advantages though: it could e.g. be manipulated by the binutils tools, like, say, objdump:
If you love details, this is very interesting:
- Look at the section table, at index 0: the .text section is indeed 0x200 (512) bytes long, but in this context, it starts at offset 0x40. That’s why the file can’t be run on the bare metal: the instructions in .text don’t start at offset 0! At the beginning of hello.o, there’s an ELF header, that will need to be stripped away.
- If you look at the symbol table, you’ll notice the addresses of the functions (clrscr, curshome, etc.), and the address of the buffer msg, which is also in the .text section.
- Interestingly, there’s a relocation record for the symbol msg (offset 0x4a) at address 0x16 (at the end of the output). What does that tell us? Hold on, we’ll come to this soon!
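The offset 0x40 is no coincidence: assuming hello.o is a 64-bit ELF object (which the R_X86_64_16 relocation type suggests), the ELF header alone is 64 bytes long. The Python sketch below parses that header layout with the struct module. To stay self-contained it works on a synthetic header built on the spot, not on the real hello.o, and elf_header_size is a helper invented for this example.

```python
import struct

# Layout of the ELF64 file header (little endian): e_ident, e_type, e_machine,
# e_version, e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize,
# e_phnum, e_shentsize, e_shnum, e_shstrndx
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")


def elf_header_size(data: bytes):
    """Return e_ehsize if data starts with an ELF64 header, else None."""
    if data[:4] != b"\x7fELF" or data[4] != 2:   # magic, then EI_CLASS == ELFCLASS64
        return None
    return ELF64_HEADER.unpack_from(data)[8]     # field 8 is e_ehsize


# Synthetic relocatable x86-64 object header (only for illustration)
ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
fake_object = ELF64_HEADER.pack(ident, 1, 62, 1, 0, 0, 0, 0, 64, 0, 0, 64, 0, 0)

print(ELF64_HEADER.size)                 # size of the header structure itself
print(elf_header_size(fake_object))      # what the header says about its own size
print(elf_header_size(bytes(512)))       # a flat image has no such header
```

A flat boot sector has no header to describe itself, so the very first byte must already be an instruction.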
Besides showing ELF headers, objdump can also disassemble the code:
If you pay close attention, you’ll notice a couple of important things:
- The functions are only disassembled correctly if we tell objdump that the target is running in Real Mode, i.e. that the processor is an i8086. That’s what the --architecture=i8086 flag is for. Had we omitted this flag, objdump would have mistakenly disassembled all those opcodes as if they belonged to 64-bit mode (or 32-bit mode, if I had used objdump on FreeBSD/i386 instead of FreeBSD/amd64).
- The disassembly of the label msg and of the magic bytes at the end is nonsensical. Actually, a well-behaved disassembler should have output the “Hello, World!\r\n” string, and the 0x55, 0xAA bytes as-is, uninterpreted. But that’s not the disassembler’s fault: msg and the magic string were actually put in the .text section after all, so we can’t blame objdump here.
- The disassembled code for the other functions is remarkably clear. For example, try to follow the code for callw clrscr in the main program at offset 0xb: we have call 1c, and if we look at offset 0x1c, that’s really the beginning of the function clrscr. If you look at the other function calls, it’s the same.
Remember the following strange looking relocation record for msg (a.k.a offset 0x4a)?
What the heck is a relocation record?! That’s a hint for the linker to adjust (to patch) some bytes in the object file when writing the executable file. This example is particularly clear cut, and serves as an excellent illustration.
So, what does this relocation record tell us (and the linker)? It says that at address 0x16 (that’s the OFFSET, relative to the beginning of the .text section), two bytes (16 bits, that’s the TYPE of R_X86_64_16) have to be patched by the linker, to the VALUE .text+0x4a. Come again: What?!?!
To understand the reason for this strange relocation record, let’s look at address 0x16 where two bytes allegedly need to be patched:
At address 0x16 (and 0x17), we have two zero-bytes 00 00, following the opcode be. In other words, the code as-is would move $0x0 into %si. But that’s not what we wanted! In the original source code, we wanted to put the address of msg into %si, and not $0x0. Remember?
So, something clearly went wrong when assembling the source code. Or so it seems at first sight. What we really want the CPU to execute is this:
In other words, the address of msg (0x4a relative to .text) at runtime must be 0x7c00 + 0x4a, and not 0x0 as in the object file.
So, the relocation record above tells the linker to patch the two 00 bytes at offset 0x16, and replace them with VALUE, i.e. with .text+0x4a. And since .text will be linked to the address 0x7C00 (see below), the ultimate value to put there will be 0x7c00 + 0x4a == 0x7c4a. Or, because we’re little endian on IA-32, the bytes will be 0x4a and 0x7c (they are reversed).
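What the linker does with this record can be mimicked in a few lines of Python. This is a sketch of the idea only, not of ld’s actual implementation: apply_reloc16 is a made-up helper, and the .text section is represented by a zero-filled bytearray into which we place just the three bytes of the instruction at offset 0x15.

```python
import struct


def apply_reloc16(image: bytearray, offset: int, section_base: int, addend: int) -> None:
    """Patch a 16-bit little-endian value (section_base + addend) at image[offset]."""
    value = (section_base + addend) & 0xFFFF
    struct.pack_into("<H", image, offset, value)


text = bytearray(0x200)              # stand-in for the 512-byte .text section
text[0x15:0x18] = b"\xbe\x00\x00"    # mov $0x0,%si, as left by the assembler

# The relocation record: OFFSET 0x16, VALUE .text+0x4a, with .text at 0x7c00
apply_reloc16(text, offset=0x16, section_base=0x7C00, addend=0x4A)

print(text[0x15:0x18].hex())         # opcode be, then 4a 7c in little-endian order
```

The opcode byte stays untouched; only the two operand bytes change.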
The lesson to remember here is that the assembler creates one relocation record per location in the code that needs to be patched by the linker before the code can be executed. For example, every time the code references msg, the assembler will create an additional relocation record. In this case, we referenced msg only once in our program, so we had only one relocation record.
Other types of relocation records are possible too. For example, when linking in functions from other libraries, etc… We won’t go into the gory details of linking and loading here though; just keep that in mind.
Just one more word on relocation records: look at the bytes at offset 18:
How come there’s no mention of 0x37 in the operand bytes (0x1c, 0x00) of the opcode 0xe8? Here, we have so-called PIC, position independent code. If you do the math (here, in a Python shell):
In other words, the target of the call is specified in the operand relative to the position of the next opcode (i.e. 3 bytes away): 0x1c bytes further away. We can observe the same thing here:
Here too, we have:
That’s why there are no relocation records for those calls: no matter where the linker relocates this code, no operands need to be changed, because they are position independent. Had we used absolute jumps, the assembler would have created additional relocation records for the linker to patch.
Now is the time to look at how the linker translates hello.o into the executable file hello. Remember that the linker has, among others, to:
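The arithmetic behind those relative calls fits into a small Python function. call_target is a name chosen for this sketch; the operand 0x001c at offset 0x18 is taken from the text above, while the displacement 0x000e for the call at offset 0xb is derived here from its target 0x1c (0x1c minus the address 0xe of the next instruction), so treat that one as an assumption.

```python
def call_target(call_offset: int, rel16: int, insn_len: int = 3) -> int:
    """Target of a near relative call (opcode e8 plus a 16-bit displacement).

    The displacement is signed and counted from the instruction after the call.
    """
    if rel16 >= 0x8000:          # interpret the operand as a signed 16-bit value
        rel16 -= 0x10000
    return (call_offset + insn_len + rel16) & 0xFFFF


print(hex(call_target(0x18, 0x001C)))     # call at 0x18 with operand 1c 00
print(hex(call_target(0x0B, 0x000E)))     # call at 0xb (derived displacement)

# Same operand bytes after loading at 0x7c00: the target simply moves along
print(hex(call_target(0x7C00 + 0x18, 0x001C)))
```

The last line shows the point of position independence: the operand is identical whether the code sits at offset 0 or at 0x7c00, and the call still lands on the same instruction.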
- Read, interpret, and strip the ELF headers from hello.o.
- Apply all relocation records, by patching some bytes in the output file.
In this simple example, that’s all the linker has to do. In more complex cases, it may also have to do some symbol management, like loading code from libraries (we don’t need that here, because we included our biosfunc.S verbatim into hello.S, so for the linker, all this is just one big file / assembly unit), etc.
Invoking the linker ld(1) is usually very easy. Here, however, we need some special flags, due to the nature of the desired executable.
- The linker must assume that the section .text will be loaded (by the BIOS) at the unusual address 0x7c00. Normally, in a hosted environment like Unix, the start of .text is at (virtual) address 0x0. But here on the bare metal, the BIOS didn’t do us the favor of loading the program at such a simple start address: in fact, it couldn’t do us this favor, because of the way real memory is laid out (see above). Remember also that the relocation record relies on .text to be accurate at runtime. Changing .text’s address to 0x7c00 is done with the
- The linker also needs to know the start address, i.e. the first address to be executed. We provide this with -e start, because execution has to start at the label start.
- The output format is significantly different from the one used on the host machine (i.e. it’s not ELF x86-64 or something like this): it is binary. In other words, don’t include an ELF header for the executable, as this header would confuse the BIOS which has no notion of ELF at all. To specify a binary (bare) output format, we use
- The option --omagic sets the text and data sections to be readable and writable (duh… we couldn’t care less). More importantly, it disables linking against shared libraries (we don’t need FreeBSD’s libc here!), which is the reason we need this flag.
So, to summarize, we link hello.o into hello like this:
Finally, we can disassemble this file (remember that starting at address 0x4a, the disassembly becomes nonsensical: that’s where our msg and later the magic bytes are located).
One final word: try to run this program under qemu, and observe qemu in top(1) in another window. You’ll notice that qemu doesn’t consume CPU cycles once “Hello, World!” has been displayed. Had we replaced hlt with a tight loop, qemu would have continued to use CPU cycles.
Writing programs on the (IA-32) bare metal isn’t as easy as it might seem. There are a lot of limitations:
- The CPU starts in 16-bit Real Mode
- RAM is tight, and has a strange layout
- Space for the (bootstrap) program is particularly tight: 510 bytes at most.
- BIOS functions aren’t as versatile as the services provided by an OS Kernel (be it DOS, or Unix).
For all those reasons, the first program loaded by the BIOS almost invariably acts as a primary bootloader, which loads more sectors from the disk into memory and jumps there. Furthermore, operating systems quickly switch to Protected Mode, and re-implement BIOS services from scratch, because they aren’t available anymore there.
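The 510-byte limit is easy to enforce mechanically. As a sketch (in this tutorial the assembler and linker produce the 512 bytes for us, so this helper is not part of the actual build), here is a Python function that pads arbitrary code bytes and appends the magic bytes. build_boot_sector is a hypothetical name, and the two code bytes in the example (cli, hlt) are just placeholders.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"
MAX_CODE = SECTOR_SIZE - len(BOOT_SIGNATURE)    # 510 bytes for code and data


def build_boot_sector(code: bytes) -> bytes:
    """Pad code with zero bytes to 510 bytes and append the 0x55 0xAA signature."""
    if len(code) > MAX_CODE:
        raise ValueError(f"code is {len(code)} bytes, at most {MAX_CODE} fit")
    return code.ljust(MAX_CODE, b"\x00") + BOOT_SIGNATURE


sector = build_boot_sector(b"\xfa\xf4")         # cli; hlt
print(len(sector), sector[-2:].hex())
```

Anything larger than 510 bytes is rejected outright, which is exactly the situation where a second-stage loader becomes necessary.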
If you want to experiment with bare metal (Real Mode) programming, here’s an exercise for you: try to expand hello.S and biosfunc.S, so that after displaying an appropriate prompt, you start reading one key at a time, and echo it on the screen. If the user hits ‘q’, halt the CPU. Hint: write a function getc that will read a key. Use BIOS function “GET KEYSTROKE” (int 16h, %ah=0) for this. Check out the RBIL (Ralf Brown’s Interrupt List) to learn about this BIOS function.
Of course, I’ll provide a solution to this exercise in another post (or not). Happy hacking. ;)