AsmSchool: Getting down to the bare metal

Part 3: It’s time to say goodbye to the operating system, and boot your PC from your own code.


  • Learn what compilers do behind the scenes
  • Understand the language of CPUs
  • Fine-tune your code for better performance

In the last two issues we’ve gone through the basics of assembly language, looking at registers, loops, conditionals, the stack and other topics. You now have enough knowledge to write simple assembly programs on Linux – but we’re going to get even more low-level this issue. Yes, we’re going to jettison the operating system and get down to the bare metal of your PC. You’ll write code that executes directly on the CPU and has full control of the machine, without the operating system interfering in any way. Exciting times!

To do this, you need to understand how the PC boot process works, so we’ll go through that step by step. Then we’ll create a simple bootloader that outputs a message to the screen, and show you how to run it in an emulator. We’ll also make it write to removable media such as a USB key, so you can try it on real machines and win an insane number of geek points.

The x86 PC boot process

When you hit the power button on your PC, a bunch of things happen before the Linux kernel is loaded into your RAM banks and executed. Indeed, the PC is just a bunch of chips and has no idea of what a kernel is, or where to find it, or how to even read the filesystem on the disk. A PC on its own would be useless, but fortunately almost every PC includes a BIOS – a “basic input/output system”. (Some very recent PCs include an emulated BIOS, or have deprecated it in favour of the alternative UEFI method – so if you only have UEFI-equipped PCs, you’ll need to use a PC emulator for this tutorial, as explained later.)

The BIOS is simply some firmware provided in the PC, and contains software that the CPU executes as soon as the PC is turned on. Typically the BIOS will perform a bunch of checks to make sure that the PC is in a sane state – for instance, to check that RAM banks are present, and to produce the classic “Press F1 to continue” message when you don’t have a keyboard plugged in.

The BIOS will then attempt to load a chunk of data from some form of media. Most BIOSes know how to access floppy disks, hard drives and CD/DVD-ROM drives, and sometimes USB keys as well. But BIOSes are small, and don’t have space for lots of filesystem drivers. So the BIOS doesn’t understand the ext4 or Btrfs filesystems as used on Linux, and therefore can’t navigate a partition to find the Linux kernel. What it can do is grab the first 512 bytes from the drive, load them into memory and execute them.


Here’s our code, running in a PC emulator – no operating system required!

Multi-stage to orbit

You can’t do much in 512 bytes, but this chunk of code (known as the first stage bootloader) typically has enough logic to load more data from the disk, this time several kilobytes, which can provide a more fully-featured bootloader with menus and options. Alternatively, this code may go on to load more data from the disk and present an even more advanced bootloader with graphics and wider filesystem support. So in the PC boot process, the computer “pulls itself up by its bootstraps” (which is where the term “booting” comes from).

Now, we can write our own code to fit into these 512 bytes and have full control over the machine. But you may be wondering: without an operating system, how are we going to make a message appear on the screen? Won’t we have to write a complicated video driver, with pixel-plotting routines and font definitions, which will surely be much larger than 512 bytes?

Well, yes – if we didn’t have the BIOS. Along with system health-check and data loading facilities, the BIOS also includes a small set of routines for basic input and output (hence the name). We can ask the BIOS to print a letter to the screen, or check the keyboard for input, without having to write specialised drivers which could require thousands of lines of code. So the BIOS acts as a very rudimentary hardware abstraction layer, letting us do a handful of jobs quickly and easily.


See for a full list of routines provided by the BIOS.

Writing the bare-metal code

So, let’s write some code that fits into this 512-byte space. The following is a short program that prints coloured messages on the screen for infinity – well, until you power off the computer. Type it in and save it in your home directory as boot.asm, or grab it online from

    BITS 16

        mov ax, 07C0h       ; Where we're loaded
        mov ds, ax          ; Data segment

        mov ax, 9000h       ; Set up stack
        mov ss, ax
        mov sp, 0FFFFh      ; Grows downwards!

        mov ah, 0           ; Set video mode routine
        mov al, 0Dh         ; 320x200x16 colours
        int 10h             ; Call BIOS

    loop:
        mov si, text_string
        call print_string
        inc bl              ; Change colour
        jmp loop

        text_string db 'Bare metal rules! ', 0

    print_string:
        mov ah, 0Eh         ; Print char routine
    .repeat:
        lodsb
        cmp al, 0
        je .done
        int 10h             ; Call BIOS
        jmp .repeat
    .done:
        ret

        times 510-($-$$) db 0
        dw 0AA55h           ; Boot signature

If you followed the last two assembly language tutorials, some of this will be familiar to you, but a lot of it is new as well. This is largely because we no longer have access to the Linux kernel to handle various tasks – but we can talk to the BIOS. The first line, BITS 16, is a directive that tells NASM (the program that converts assembly language code into binary for the CPU to execute) that our code is 16-bit. When you switch on an x86 PC, it initially operates in 16-bit mode, like PCs of the early 1980s, for backwards compatibility reasons. Modern operating systems like Linux and Windows use various instructions to switch the CPU into 32-bit (or 64-bit) mode, but we don’t need that here – we just want to print some text.

Now, the BIOS loads our 512-byte program at address 7C00h (hexadecimal) in RAM, which is 31744 in decimal. (It doesn’t load it at address 0, as that’s taken up with some important system data.) In the first two mov instructions in our code, we set the data segment register (DS) to 07C0h. Why 07C0h rather than 7C00h? Because the CPU multiplies a segment value by 16 before adding an offset to it, and 07C0h × 16 = 7C00h. Segments are ugly old remnants of 16-bit code, and we won’t deal with them extensively here, but in a nutshell: in a 16-bit register you can store numbers from 0 to 65535. So when using 16-bit memory addresses, you can only access 65536 memory locations – that is, 64KB. This is much too small for many tasks, so before 32-bit processors became the norm, 16-bit CPUs combined each address with a “segment” register to reach more RAM.

They made 16-bit programming a mighty pain in the rear, and everyone was happy to move to 32-bit and have easy access to 4GB of RAM. Because our program is tiny, we don’t even need to do any complicated operations with segments, and the chances are that you will never have to again in the future – unless you want to write a 16-bit program larger than 64k.
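If the segment arithmetic feels abstract, here’s a minimal Python sketch of how an x86 CPU in 16-bit real mode combines a segment and an offset: it multiplies the segment by 16 and adds the offset. The function name physical_address is our own invention for illustration, and isn’t part of NASM or any other tool.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode address calculation: segment * 16 + offset.

    physical_address is a hypothetical helper name used only for this sketch.
    """
    return (segment << 4) + offset  # shifting left by 4 bits multiplies by 16


# DS = 07C0h with offset 0 points at the start of our boot sector in RAM
print(hex(physical_address(0x07C0, 0x0000)))  # 0x7c00

# SS = 9000h with SP = FFFFh is where our stack begins
print(hex(physical_address(0x9000, 0xFFFF)))  # 0x9ffff
```

Note that many different segment:offset pairs reach the same byte: 0000h:7C00h and 07C0h:0000h both land on 7C00h.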

Anyway, we then have three more mov instructions which set up the stack. We place the stack in a certain segment using the SS (stack segment) register, and then put SP (the stack pointer) at position FFFFh. If you’ve been brushing up on your hexadecimal knowledge since last month, you’ll know that FFFFh = 65535 in decimal. So why are we putting the stack pointer at the very final position in a segment? If we push something onto the stack, won’t it overflow and cause problems in the program?

Well, no. You see, on x86 PCs the stack grows downwards, so when we push a 16-bit (two byte) number onto it, the stack pointer is actually decremented by two bytes. When you pop a number off, it goes back up. (If you keep popping off more than you’ve pushed on, it will go up over 65535 and you’ll have lots of fun and games in your debugging…)
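To watch the stack pointer move, here’s a toy Python model of a 16-bit stack living in a single 64KB segment. It’s only a sketch: the Stack16 class is a made-up name, and a real CPU does all of this in hardware.

```python
class Stack16:
    """Toy model of a 16-bit stack inside one 64KB segment (hypothetical class)."""

    def __init__(self, sp: int = 0xFFFF):
        self.sp = sp                    # stack pointer, as set by "mov sp, 0FFFFh"
        self.mem = bytearray(0x10000)   # the 64KB stack segment

    def push(self, value: int) -> None:
        self.sp = (self.sp - 2) & 0xFFFF                         # SP goes DOWN by two
        self.mem[self.sp] = value & 0xFF                         # low byte first
        self.mem[(self.sp + 1) & 0xFFFF] = (value >> 8) & 0xFF   # then high byte

    def pop(self) -> int:
        lo = self.mem[self.sp]
        hi = self.mem[(self.sp + 1) & 0xFFFF]
        self.sp = (self.sp + 2) & 0xFFFF                         # SP goes back UP by two
        return (hi << 8) | lo


stack = Stack16()
stack.push(0x1234)
print(hex(stack.sp))     # 0xfffd
print(hex(stack.pop()))  # 0x1234
print(hex(stack.sp))     # 0xffff
```

Pop once more than you pushed and, in this model, SP wraps around from FFFFh to 0001h, which is exactly the kind of fun and games mentioned above.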


And here’s the code running off a USB key (emulating a floppy drive) on an Asus laptop. This is the real deal.

Taste the rainbow

So, we’ve done the ugly segment-related work, and now we can get our hands dirty with some actual code that does interesting stuff. First up, we want to switch to a graphics video mode so that we can easily print coloured messages. That’s what these three lines do:

    mov ah, 0           ; Set video mode routine
    mov al, 0Dh         ; 320x200x16 colours
    int 10h             ; Call BIOS

Do you remember from the previous tutorials that we called the Linux kernel using int 80h? Well, to call the BIOS’s video routines we use int 10h, and the BIOS also needs various parameters supplied in registers. Normally you place the number of the BIOS routine you want to use in the AH register, and then extra parameters in the other registers. For instance, to change the video mode we need to place zero in AH – and how do we know that? In the olden days we’d have a thick book detailing the BIOS’s inner workings, but today we can find a list of BIOS routines on the web, eg

You’ll see there that there are routines for “Set video mode”, “Write graphics pixel”, “Teletype output” (which we’ll use in a moment) and so forth. If you click on the Int 10/AH=00h link you’ll see a list of video modes underneath, and here we’re using 0Dh, which is 320×200 pixels in 16-colour mode. That’s ridiculously low-res by today’s standards, but ensures that the code will work almost everywhere, including on that old late 80s box gathering dust in your attic.
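AH and AL are simply the high and low bytes of the 16-bit AX register, so the two mov instructions before int 10h are filling in one register between them. As a rough illustration in Python (make_ax is a made-up helper name, not something from the BIOS or NASM):

```python
def make_ax(ah: int, al: int) -> int:
    """Combine the high byte (AH) and low byte (AL) into the 16-bit AX value.

    make_ax is a hypothetical helper for this sketch only.
    """
    return ((ah & 0xFF) << 8) | (al & 0xFF)


# Set video mode: AH = 00h (routine number), AL = 0Dh (the mode we want)
print(f"{make_ax(0x00, 0x0D):04X}h")  # 000Dh

# Teletype output: AH = 0Eh (routine number), AL = the character to print
print(f"{make_ax(0x0E, ord('B')):04X}h")  # 0E42h
```

So the pair of instructions that set AH to 0 and AL to 0Dh leaves AX holding 000Dh.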

Then we have a loop:

    loop:
        mov si, text_string
        call print_string
        inc bl              ; Change colour
        jmp loop

This calls a print_string routine (which we define underneath). The routine takes the location of a zero-terminated string in the SI register, and a colour in BL. This loop goes on forever, and in each iteration we increment the BL register. BL is an 8-bit register, so it counts from 0 up to 255 and then wraps around to 0. This gives us a constant cycle of colours for the message text.

Underneath, you can see that the print_string routine is somewhat similar to the one we implemented last month, albeit simpler as we don’t have to work out the string length. This time we use the BIOS’s teletype routine, 0Eh, which prints a character to the screen and moves the cursor onwards. The specific character is provided in the AL register (which we retrieve via the lodsb instruction) and the colour is set in BL as mentioned. So in this subroutine we keep retrieving characters from the string and printing them via the BIOS (int 10h) until we hit a zero, and then we ret (return) to the calling code.
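If the flow of print_string isn’t obvious from the assembly, this Python sketch follows the same steps: fetch a byte, stop if it’s zero, otherwise “print” it and move on. It’s a model rather than a translation; collecting characters in a list stands in for the BIOS call, and we assume the CPU’s direction flag is clear so that lodsb moves SI forwards.

```python
def print_string(memory: bytes, si: int) -> str:
    """Walk a zero-terminated string the way the assembly routine does."""
    printed = []
    while True:
        al = memory[si]          # lodsb: load the byte at SI into AL...
        si += 1                  # ...and advance SI (direction flag assumed clear)
        if al == 0:              # cmp al, 0 / je .done
            break
        printed.append(chr(al))  # stands in for int 10h with AH = 0Eh
    return "".join(printed)


# Anything after the terminating zero byte is never reached
data = b"Bare metal rules! \x00ignored"
print(print_string(data, 0))
```

The zero terminator is why the routine doesn’t need to know the string length in advance.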

One thing to note here is the labels with periods in front of them, eg:

    .repeat:

The period denotes that this is a local label, and NASM extends it by prefixing it with the nearest full (non-period) label above. So NASM turns this into print_string.repeat when it works through the code. Why is this useful, you may ask? Well, it means you can use the same local label name multiple times in your code. In a big source file, you may want to use lots of labels like loop, repeat or finish. With local labels, each routine can have its own versions of these – you don’t need to come up with unique names every single time.
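Here’s a small Python sketch of that renaming rule. It’s a simplified model of the idea, not NASM’s actual implementation, and the read_key label is invented purely to show the same local name being reused in a second routine.

```python
def expand_local_labels(labels):
    """Prefix each local (dot) label with the nearest full label above it."""
    current = None
    expanded = []
    for name in labels:
        if name.startswith("."):
            if current is None:
                raise ValueError(f"local label {name} has no full label above it")
            expanded.append(current + name)
        else:
            current = name       # a full label starts a new scope for local labels
            expanded.append(name)
    return expanded


# read_key is a hypothetical second routine, not part of our bootloader
labels = ["print_string", ".repeat", ".done", "read_key", ".repeat"]
print(expand_local_labels(labels))
# ['print_string', 'print_string.repeat', 'print_string.done', 'read_key', 'read_key.repeat']
```

Both routines use .repeat, but after expansion the two names no longer clash.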

The final two lines in our code aren’t instructions, but directives for NASM:

    times 510-($-$$) db 0
    dw 0AA55h           ; Boot signature

For the BIOS to recognise and load our program, it has to be exactly 512 bytes in size and end with the number AA55h. So the first line here pads out our program with zero bytes until it reaches 510 bytes in size, and then we define a “word” (a 16-bit or two-byte value) of 0AA55h to put at the end.
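The same padding-and-signature logic can be sketched in Python, which is handy if you ever want to check a boot sector by hand. The function names make_boot_sector and looks_bootable are our own, for illustration only. Note that x86 stores 16-bit values low byte first, so the word AA55h ends up on disk as the bytes 55h then AAh.

```python
SECTOR_SIZE = 512


def make_boot_sector(code: bytes) -> bytes:
    """Pad code with zeros to 510 bytes, then append the boot signature.

    Hypothetical helper mirroring 'times 510-($-$$) db 0' and 'dw 0AA55h'.
    """
    if len(code) > SECTOR_SIZE - 2:
        raise ValueError("code is too big to fit in a boot sector")
    padding = bytes(SECTOR_SIZE - 2 - len(code))    # zero bytes up to offset 510
    signature = (0xAA55).to_bytes(2, "little")      # stored as 55h, AAh
    return code + padding + signature


def looks_bootable(sector: bytes) -> bool:
    """Check the two things described above: 512 bytes long, ending in AA55h."""
    return len(sector) == SECTOR_SIZE and sector[510:] == b"\x55\xAA"


# EBh FEh is the machine code for "jmp $", a two-byte infinite loop
sector = make_boot_sector(b"\xEB\xFE")
print(len(sector))             # 512
print(sector[510:].hex())      # 55aa
print(looks_bootable(sector))  # True
```

To check a file you’ve assembled, you could read it in binary mode and pass the bytes to looks_bootable.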

And that’s it! We haven’t done a huge amount here, but you can add more of your own code to this bootloader, providing that the resulting binary doesn’t grow any larger than 512 bytes. (If your code becomes too big, NASM will complain when you try to assemble it.) 512 bytes may seem tiny, but cunning coders can eke quite a bit of functionality out of this limited space, as shown in the “512-byte OS contest”. Here, programmers were challenged to make something impressive in 512 bytes, and they certainly succeeded: one developer wrote a pseudo-3D car-racing screen saver, while another implemented the Game of Life.

Another project you may find useful, either as a source of code snippets you can nab, or just general inspiration, is Tetranglix. This is basically Tetris implemented in a bootloader – so inside 512 bytes – and while it’s not much of a looker, it maintains the core gameplay elements of the timeless classic. Then there’s BootChess, which proudly proclaims that it’s the “smallest computer implementation of chess on any platform”, weighing in at just 487 bytes.

Running on real hardware

If your PC happens to have an inbuilt floppy drive, you can write the virtual disk image (floppy.img, which we create in the “Running the code” section) to a real disk using this:

dd if=floppy.img of=/dev/fd0 bs=1024

You may need to do this as root, and if it’s a USB floppy drive, change the device to /dev/sdb or similar (the whole device, not a partition such as /dev/sdb1) – use dmesg after plugging in the drive to see its device name. Then you can boot your PC from the floppy disk and see your code running natively on your PC.

Chances are that you haven’t used floppy disks in many years, but there’s another option: USB keys. Many BIOSes have the facility to load a floppy disk image from a USB key and execute it like a real floppy. Note that this will overwrite the contents of the USB key, so you’ll need to reformat it before using it for normal storage again! Plug in the key and then enter dmesg in a terminal. In the most recent output at the end, you’ll see various messages like this:

sd 2:0:0:0: [sdc] 501760 512-byte logical blocks

This tells us that the drive we plugged in has the device name sdc – it may be different in your case. Unmount/eject the drive using your file manager (or the umount command at the command line), and then write the floppy drive image to the key as follows:

dd if=floppy.img of=/dev/sdc bs=1024

Be sure to get this exactly right, and replace /dev/sdc with whatever you saw from the dmesg output. Ask on our forums if you get stuck.

Once the data has been written and you’re returned to the prompt, restart your PC and in the BIOS boot menu, choose to boot from the USB key. All being well, you’ll see the colourful messages again, but this time running on your very own hardware. How cool is that? The answer is: very cool.


For more examples of 16-bit bootloader and simple operating system source code, see (MS-DOS 1.1 and 2.0)

Running the code

To see our code in action, we can boot it in a PC emulator, but first it needs to be on some kind of media. The simplest way to do this is to create a virtual floppy disk – ie a disk image – so install the dosfstools package from your distro’s repositories and enter this command:

mkdosfs -C floppy.img 1440

This creates a new DOS-formatted disk image called floppy.img that’s 1.4MB in size. Next, assemble the code:

nasm -f bin -o boot.bin boot.asm

The -f bin is important here, as we want a plain binary file – we don’t need a complicated Linux executable with all its extra bits and bobs. This creates a 512-byte file called boot.bin, and we inject it into the start of the floppy disk image like so:

dd conv=notrunc if=boot.bin of=floppy.img
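The conv=notrunc option matters: without it, dd would cut floppy.img down to just 512 bytes. If you’re curious what the command is doing, this Python sketch performs the same job on throwaway files in a temporary directory. The inject_boot_sector name is made up for this example, and the demonstration files are dummies that stand in for your real boot.bin and floppy.img.

```python
import os
import tempfile

SECTOR_SIZE = 512


def inject_boot_sector(boot_path: str, image_path: str) -> None:
    """Copy a 512-byte boot sector over the start of an existing disk image."""
    with open(boot_path, "rb") as f:
        boot = f.read()
    if len(boot) != SECTOR_SIZE:
        raise ValueError(f"expected {SECTOR_SIZE} bytes, got {len(boot)}")
    # "r+b" opens the image for in-place update, so everything after the
    # first 512 bytes is kept, which is the job conv=notrunc does for dd.
    with open(image_path, "r+b") as img:
        img.seek(0)
        img.write(boot)


# Demonstration with dummy files standing in for boot.bin and floppy.img
with tempfile.TemporaryDirectory() as tmp:
    boot_path = os.path.join(tmp, "boot.bin")
    image_path = os.path.join(tmp, "floppy.img")
    with open(boot_path, "wb") as f:
        f.write(bytes(510) + b"\x55\xAA")   # empty sector plus boot signature
    with open(image_path, "wb") as f:
        f.write(b"\xFF" * (1440 * 1024))    # dummy 1440KB image
    inject_boot_sector(boot_path, image_path)
    print(os.path.getsize(image_path))      # 1474560: the image keeps its full size
```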

Now install a PC emulator such as DOSBox or QEMU from your distro’s repositories, and boot your virtual floppy disk in them using one of these commands:

dosbox floppy.img
qemu-system-i386 floppy.img

And voilà: the coloured messages zoom by, produced by the bare-metal code that you’ve just written. Not bad, eh? If you want to try it on real hardware, see the “Running on real hardware” section above – and next month we’ll expand this bootloader considerably so that it can execute more programs from the disk, including programs you’ve written yourself. Yes, we’ll turn it into a rudimentary, but functioning, operating system!