A long time ago, when I got my first computer, I accidentally opened a 32-bit demo with a nice chiptune inside MS-DOS, and it worked. I was surprised, and curious to find out how this worked behind the scenes. Back then I was a little kid and had no clue about programming. This curiosity led me to discover amazing things I never imagined.
First, let us have a look at the PE header. A PE file starts with the MS-DOS header and contains a 16-bit MS-DOS executable (the stub program).
(source: https://commons.wikimedia.org/wiki/File:Portable_Executable_32_bit_Structure.png)
This is the MS-DOS header in detail.
```c
#define IMAGE_DOS_SIGNATURE 0x5A4D      // MZ
#define IMAGE_NT_SIGNATURE  0x00004550  // PE00

typedef struct _IMAGE_DOS_HEADER {      // DOS .EXE header
    WORD   e_magic;                     // Magic number
    WORD   e_cblp;                      // Bytes on last page of file
    WORD   e_cp;                        // Pages in file
    WORD   e_crlc;                      // Relocations
    WORD   e_cparhdr;                   // Size of header in paragraphs
    WORD   e_minalloc;                  // Minimum extra paragraphs needed
    WORD   e_maxalloc;                  // Maximum extra paragraphs needed
    WORD   e_ss;                        // Initial (relative) SS value
    WORD   e_sp;                        // Initial SP value
    WORD   e_csum;                      // Checksum
    WORD   e_ip;                        // Initial IP value
    WORD   e_cs;                        // Initial (relative) CS value
    WORD   e_lfarlc;                    // File address of relocation table
    WORD   e_ovno;                      // Overlay number
    WORD   e_res[4];                    // Reserved words
    WORD   e_oemid;                     // OEM identifier (for e_oeminfo)
    WORD   e_oeminfo;                   // OEM information; e_oemid specific
    WORD   e_res2[10];                  // Reserved words
    LONG   e_lfanew;                    // File address of new exe header
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
```
Every PE file starts with an MS-DOS executable, which begins with the IMAGE_DOS_SIGNATURE. Its value is 0x5A4D. Stored in little-endian byte order as 4D 5A, it reads as the ASCII characters MZ. The letters “MZ” stand for Mark Zbikowski, one of the original architects of MS-DOS and the designer of the MS-DOS executable format. The first member, _IMAGE_DOS_HEADER.e_magic, contains the signature.
In the above image, the e_lfanew member of _IMAGE_DOS_HEADER sits at offset 0x3C. It holds the file offset of the new EXE header, which is _IMAGE_NT_HEADERS.
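In this case, e_lfanew contains 0x00000120, which points to the beginning of _IMAGE_NT_HEADERS, that is, to its Signature member. Signature holds the IMAGE_NT_SIGNATURE, whose value is 0x00004550. In little-endian byte order it is stored as 50 45 00 00: the ASCII characters PE followed by two null bytes (“PE\0\0”, written as PE00 in the header comment).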
After experimenting, I found that the Windows loader only cares about the e_magic and e_lfanew members of _IMAGE_DOS_HEADER. The rest of the DOS header's members are used by MS-DOS to execute the stub program.
In a typical 32-bit or 64-bit PE file, you will see a tiny MS-DOS stub program. Here it occupies the 64 bytes from offset 0x40 to 0x7F.
Let us extract the MS-DOS header together with the stub program and disassemble the code.
It is a simple 16-bit assembly program that prints “This program cannot be run in DOS mode.” and exits.
If we run this 64-bit PE inside DOS, the stub will execute, and we get that message.
I wanted to debug this 64-bit PE inside DOS and see for myself how things work. The values from the DOS header can be seen inside the debugger, which shows that the MS-DOS header is only needed by MS-DOS to execute its stub. But, as I mentioned earlier, the e_magic and e_lfanew members are important for the Windows loader.
From the above image, the values of e_sp, the initial stack pointer (0x00B8), and e_ip, the initial instruction pointer (0x0000), can be seen in the image below when we begin debugging. We can also see the same 16-bit assembly code we disassembled, which will execute and print that text.
I wanted to write my own stub program. Let us try to write some 16-bit asm code to print my name 5 times in green colour.
The equivalent C program (without the colour) looks like this.
```c
#include <stdio.h>

int main()
{
    size_t i = 0;
    while (i++ < 5)
        puts("@OsandaMalith");
    return 0;
}
```
The following code is in 16-bit MASM. This will be our MS-DOS stub program.
```asm
dosseg
ifndef ??Version
option m510
endif
.model small
.8086
.stack 0400h

.data
message db '@OsandaMalith',0Dh,0dh,0Ah
len = $ - message           ; String length in bytes

.code
Main proc
    mov ax, @data
    mov ds, ax
    mov es, ax              ; ES:BP must point to the string for Int 10h/AH=13h
    lea bp, message
    mov cx, len
    mov bl, 2               ; Text attribute: green on black
    xor dx, dx
    push dx                 ; Loop counter lives on the stack (DX is clobbered below)
    .repeat
        push cx             ; Save CX (needed for Int 10h/AH=13h below)
        mov ah, 03h         ; VIDEO - GET CURSOR POSITION AND SIZE
        xor bh, bh          ; Page 0
        int 10h             ; Call Video-BIOS => DX is current cursor position
        pop cx
        mov ah, 13h         ; VIDEO - WRITE STRING (AT and later,EGA)
        mov al, 1           ; Mode 1: move cursor
        xor bh, bh          ; Page 0
        int 10h
        pop dx              ; Increment the loop counter
        inc dx
        push dx
    .until dx == 5
    mov ax, 4C01h           ; DOS - TERMINATE with return code 1
    int 21h
Main endp
end Main
```
After assembling and linking, we get the 16-bit DOS executable.

```
ml /c /Fo hello.obj hello.asm
seglink hello.obj, stub.exe,nul,,nul
```
One way is to patch the MS-DOS stub in the PE file and inject our code, but patching leaves us only limited space. Since we have the source code, we can instead tell the linker to use our stub program by passing the /STUB option when linking:

```
link hello.obj /stub:stub.exe
```
If you check the PE headers, you can see our newly created stub program in the region labelled IMAGE_DOS_STUB, between the DOS header and the NT headers.
Finally, if we run the 64-bit PE inside MS-DOS, we can see our newly created stub program executing.
Programmers can use this technique to embed a separate program that runs under MS-DOS. Such tricks were used in demos in the past, and you can also use them when creating crackmes and CTF challenges.
This is awesome !!!!
Thank you!
Yeah, reverse engineering is a very rewarding experience with an endless supply of new information if you know where to dig for it. I started disassembling PE binaries just recently, using the freeware version of IDA, and I’ve written about it on my own blog. How did you get the source code for the declarations in the DOS stub? I understand you could have disassembled it, but the fact that you got the original names of the variables leads me to believe you got it from some documentation somewhere.
I guess IDA PRO does it for you. The stub is nothing much to document. It just prints text.
Well, I just looked at the actual web page for this article and am pleased to see that someone else has figured out how to import syntax highlighting into WordPress. Do you use TOhtml too, or do you use something different?
I use Crayons editor.
Yeah, I open the source file in Vim and then use the TOhtml extension to create an HTML file with the syntax-highlighted code, and then I copy and paste the HTML code into WordPress. I don’t know what Crayons is, but I’m guessing it’s some sort of WordPress extension that allows you to do this directly in your blog.
this is awesome