Introduction
A PE (Portable Executable) file is the format Windows uses for executables (.exe), DLLs (.dll), and other binary files. If you want to understand how Windows works under the hood, whether for malware analysis, reverse engineering, or offensive security, understanding PE structure is essential.
In this post, I’ll walk through building a PE parser from scratch in C, explaining each component along the way.
What is a PE File?
Every time you run a program on Windows, the OS loader reads the PE file and maps it into memory. The PE format tells Windows:
- Where the code is
- Where the data is
- What DLLs are needed
- Where to start execution
PE Structure Overview
A PE file is organized into headers and sections:
| Component | Purpose |
|---|---|
| DOS Header | Legacy DOS compatibility, pointer to NT headers |
| DOS Stub | “This program cannot be run in DOS mode” |
| NT Headers | PE signature, file info, memory layout |
| Section Headers | Describes each section (.text, .data, etc.) |
| Sections | Actual code and data |
Setting Up the Project
First, let’s set up our includes and create a function to read a file into memory:
#include <Windows.h>
#include <stdio.h>
Why these includes?
Windows.h— Gives us PE structures (IMAGE_DOS_HEADER,IMAGE_NT_HEADERS) and Windows API functionsstdio.h— Forprintfoutput
Reading the File Into Memory
Before we can parse anything, we need to read the PE file into a buffer:
BOOL
ReadFileIntoBuffer(
_In_ LPCWSTR FilePath,
_Out_ PBYTE *Buffer,
_Out_ PDWORD FileSize
)
{
HANDLE hFile = INVALID_HANDLE_VALUE;
DWORD dwBytesRead = 0;
PBYTE pBuffer = NULL;
DWORD dwFileSize = 0;
hFile = CreateFileW(
FilePath,
GENERIC_READ,
FILE_SHARE_READ,
NULL,
OPEN_EXISTING,
FILE_ATTRIBUTE_NORMAL,
NULL
);
if (hFile == INVALID_HANDLE_VALUE)
{
printf("[-] CreateFileW failed. Error: %d\n", GetLastError());
return FALSE;
}
dwFileSize = GetFileSize(hFile, NULL);
pBuffer = (PBYTE)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, dwFileSize);
if (pBuffer == NULL)
{
printf("[-] HeapAlloc failed. Error: %d\n", GetLastError());
CloseHandle(hFile);
return FALSE;
}
if (!ReadFile(hFile, pBuffer, dwFileSize, &dwBytesRead, NULL))
{
printf("[-] ReadFile failed. Error: %d\n", GetLastError());
HeapFree(GetProcessHeap(), 0, pBuffer);
CloseHandle(hFile);
return FALSE;
}
*Buffer = pBuffer;
*FileSize = dwFileSize;
CloseHandle(hFile);
return TRUE;
}
Key points:
CreateFileWopens the file for readingGetFileSizetells us how much memory to allocateHeapAllocallocates memory on the process heapReadFilecopies file contents into our buffer- Always check return values and clean up on failure
Parsing the DOS Header
The DOS header is always at offset 0 of a PE file. It starts with the magic bytes “MZ” (0x5A4D).
BOOL
ParseDosHeader(
_In_ PBYTE Buffer
)
{
PIMAGE_DOS_HEADER pDosHeader = (PIMAGE_DOS_HEADER)Buffer;
if (pDosHeader->e_magic != IMAGE_DOS_SIGNATURE)
{
printf("[-] Invalid DOS signature. Not a PE file.\n");
return FALSE;
}
printf("[+] DOS Header:\n");
printf(" e_magic: 0x%X (MZ)\n", pDosHeader->e_magic);
printf(" e_lfanew: 0x%X (Offset to NT Headers)\n", pDosHeader->e_lfanew);
return TRUE;
}
What’s happening:
- We cast the buffer to
PIMAGE_DOS_HEADERbecause the DOS header is at offset 0 e_magicmust be 0x5A4D (“MZ”) — named after Mark Zbikowski who designed the formate_lfanewis the most important field — it tells us where the NT headers are
Parsing the NT Headers
The NT headers contain the PE signature and two important structures: the File Header and Optional Header.
BOOL
ParseNtHeaders(
_In_ PBYTE Buffer
)
{
PIMAGE_DOS_HEADER pDosHeader = (PIMAGE_DOS_HEADER)Buffer;
PIMAGE_NT_HEADERS pNtHeaders = (PIMAGE_NT_HEADERS)(Buffer + pDosHeader->e_lfanew);
if (pNtHeaders->Signature != IMAGE_NT_SIGNATURE)
{
printf("[-] Invalid NT signature. Not a PE file.\n");
return FALSE;
}
printf("[+] NT Headers:\n");
printf(" Signature: 0x%X (PE)\n", pNtHeaders->Signature);
// File Header
printf("[+] File Header:\n");
printf(" Machine: 0x%X\n", pNtHeaders->FileHeader.Machine);
printf(" NumberOfSections: %d\n", pNtHeaders->FileHeader.NumberOfSections);
printf(" TimeDateStamp: 0x%X\n", pNtHeaders->FileHeader.TimeDateStamp);
printf(" Characteristics: 0x%X\n", pNtHeaders->FileHeader.Characteristics);
// Optional Header
printf("[+] Optional Header:\n");
printf(" Magic: 0x%X (%s)\n",
pNtHeaders->OptionalHeader.Magic,
pNtHeaders->OptionalHeader.Magic == 0x20B ? "64-bit" : "32-bit"
);
printf(" AddressOfEntryPoint: 0x%X\n", pNtHeaders->OptionalHeader.AddressOfEntryPoint);
printf(" ImageBase: 0x%llX\n", pNtHeaders->OptionalHeader.ImageBase);
printf(" SizeOfImage: 0x%X\n", pNtHeaders->OptionalHeader.SizeOfImage);
return TRUE;
}
Key fields explained:
| Field | What It Tells Us |
|---|---|
Machine |
Target CPU (0x8664 = x64, 0x14C = x86) |
NumberOfSections |
How many sections to parse |
AddressOfEntryPoint |
Where execution begins (RVA) |
ImageBase |
Preferred load address in memory |
SizeOfImage |
Total size when loaded |
Finding the NT headers:
PIMAGE_NT_HEADERS pNtHeaders = (PIMAGE_NT_HEADERS)(Buffer + pDosHeader->e_lfanew);
This is pointer arithmetic: Base address + offset = NT headers address.
Parsing Section Headers
Sections contain the actual code and data. Common sections include:
| Section | Purpose |
|---|---|
.text |
Executable code |
.data |
Initialized global data |
.rdata |
Read-only data (strings, constants) |
.rsrc |
Resources (icons, dialogs) |
.reloc |
Relocation information |
VOID
ParseSectionHeaders(
_In_ PBYTE Buffer
)
{
PIMAGE_DOS_HEADER pDosHeader = (PIMAGE_DOS_HEADER)Buffer;
PIMAGE_NT_HEADERS pNtHeaders = (PIMAGE_NT_HEADERS)(Buffer + pDosHeader->e_lfanew);
PIMAGE_SECTION_HEADER pSectionHeader = IMAGE_FIRST_SECTION(pNtHeaders);
WORD wNumSections = pNtHeaders->FileHeader.NumberOfSections;
printf("[+] Sections (%d):\n", wNumSections);
printf(" %-8s %-10s %-10s %-10s %-10s\n",
"Name", "VirtAddr", "VirtSize", "RawAddr", "RawSize"
);
for (WORD i = 0; i < wNumSections; i++)
{
printf(" %-8.8s 0x%-8X 0x%-8X 0x%-8X 0x%-8X\n",
pSectionHeader[i].Name,
pSectionHeader[i].VirtualAddress,
pSectionHeader[i].Misc.VirtualSize,
pSectionHeader[i].PointerToRawData,
pSectionHeader[i].SizeOfRawData
);
}
}
Understanding section fields:
VirtualAddress— Where the section is loaded in memory (RVA)VirtualSize— Size in memoryPointerToRawData— Where the section is in the fileSizeOfRawData— Size on disk
Putting It All Together
INT
wmain(
INT argc,
PWCHAR argv[]
)
{
PBYTE pFileBuffer = NULL;
DWORD dwFileSize = 0;
if (argc != 2)
{
wprintf(L"[!] Usage: %s <path to PE file>\n", argv[0]);
return -1;
}
wprintf(L"[+] Parsing: %s\n\n", argv[1]);
if (!ReadFileIntoBuffer(argv[1], &pFileBuffer, &dwFileSize))
{
return -1;
}
printf("[+] File size: %d bytes\n\n", dwFileSize);
if (!ParseDosHeader(pFileBuffer))
{
goto Cleanup;
}
if (!ParseNtHeaders(pFileBuffer))
{
goto Cleanup;
}
ParseSectionHeaders(pFileBuffer);
Cleanup:
if (pFileBuffer != NULL)
{
HeapFree(GetProcessHeap(), 0, pFileBuffer);
}
return 0;
}
Example Output
Running the parser against notepad.exe:
[+] Parsing: C:\Windows\System32\notepad.exe
[+] File size: 201216 bytes
[+] DOS Header:
e_magic: 0x5A4D (MZ)
e_lfanew: 0xF0 (Offset to NT Headers)
[+] NT Headers:
Signature: 0x4550 (PE)
[+] File Header:
Machine: 0x8664
NumberOfSections: 7
TimeDateStamp: 0x5E2C4A4B
Characteristics: 0x22
[+] Optional Header:
Magic: 0x20B (64-bit)
AddressOfEntryPoint: 0x1A4B0
ImageBase: 0x140000000
SizeOfImage: 0x39000
[+] Sections (7):
Name VirtAddr VirtSize RawAddr RawSize
.text 0x1000 0x1A3E4 0x400 0x1A400
.rdata 0x1C000 0xCE2C 0x1A800 0xD000
.data 0x29000 0x1690 0x27800 0x600
.pdata 0x2B000 0x1B54 0x27E00 0x1C00
.didat 0x2D000 0x130 0x29A00 0x200
.rsrc 0x2E000 0x9CE8 0x29C00 0x9E00
.reloc 0x38000 0x298 0x33A00 0x400
What I Learned
Building this parser taught me:
- PE structure is logical — Each header points to the next, like a chain
- Pointer arithmetic is essential — Base + offset = target address
- Always validate — Check magic numbers before parsing further
- Windows API patterns — Error checking, cleanup with goto, handle management
- Why this matters for security — Malware analysis, packing, injection all require PE knowledge
What’s Next
Future improvements I’m planning:
- Parse Import Table (see what DLLs a PE uses)
- Parse Export Table (see what functions a DLL exports)
- Detect packed executables
- Add entropy analysis
Source Code
Full source code available on GitHub.
References
- Microsoft PE Format Documentation
- PE Format - Wikipedia
- Windows Internals by Mark Russinovich