Headers, Symbols, and other things you won't find
Eight years ago I wrote ELF’s Linker’s and other magical creatures – a walkthrough of the ELF binary format, relocations, segments, and even live code injection through /proc/pid/mem. That post ended in a comfortable place: .text, .data, .bss; ld resolving symbols; the kernel’s ELF loader mapping segments into memory; gdb poking at a running process. The civilized world of userspace binaries.
Then someone hands you a 1.5 MB file. No extension, no headers, no documentation. file says “data.” readelf says “not an ELF file.” objdump refuses to open it. Welcome to firmware reverse engineering.
This is Part 1 of a five-part series that picks up where the ELF post left off, but for bare-metal firmware on MIPS32 embedded devices. Everything you learned about ELF structure, symbol tables, and linking is still useful – mostly as a reference for what isn’t in front of you anymore.
The techniques come from a real project analyzing production firmware on an embedded network device. I can’t name the vendor (lawyers), but every pattern, every tool, every trick here is something we actually used in anger. To keep it reproducible, we’ll build a sample MIPS32 firmware – a small network gateway with a UART serial console – that recreates the same structures.
Bonus – the sample firmware, bundled: if you’d rather skip the build and start poking at bytes, the same sources, the resulting firmware.bin and firmware.elf, the multi-partition image, and the small tools/ directory used in Part 2 are all bundled in sample_firmware.tar.gz. Grab it and follow along on your own copy:
curl -L -O https://res.cloudinary.com/gotocco/raw/upload/v1777182332/sample_firmware.tar.gz
tar xf sample_firmware.tar.gz
cd sample_firmware
What’s Missing: ELF vs Flat Binary
The quickest way to understand flat firmware is to compare it with what you already know. Here’s what a standard ELF binary looks like at byte zero:
$ xxd -l 64 firmware.elf
00000000: 7f45 4c46 0101 0100 0500 0000 0000 0000 .ELF............
00000010: 0200 0800 0100 0000 0000 0080 3400 0000 ............4...
00000020: a431 0100 0110 0070 3400 2000 0500 2800 .1.....p4. ...(.
00000030: 0d00 0c00 0300 0070 0821 0100 0821 0080 .......p.!...!..
You can spot the 7F 45 4C 46 magic immediately – .ELF. After that come the class (32-bit), the endianness (little), the machine type (MIPS), the entry point, the program header offset, and the section header offset. Everything the kernel’s loader needs to map this binary into memory.
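To make those fields concrete, here is a minimal Python sketch that unpacks the 52-byte ELF32 header straight from the bytes in the dump above. The field names follow the standard Elf32_Ehdr layout; the helper name parse_elf32_header is mine and is not part of the sample firmware.

```python
import struct

# First 52 bytes of firmware.elf, copied from the xxd dump above.
ELF_HEADER = bytes.fromhex(
    "7f454c46010101000500000000000000"
    "02000800010000000000008034000000"
    "a4310100011000703400200005002800"
    "0d000c00"
)

# Elf32_Ehdr fields that follow the 16-byte e_ident array.
FIELDS = ("e_type", "e_machine", "e_version", "e_entry", "e_phoff", "e_shoff",
          "e_flags", "e_ehsize", "e_phentsize", "e_phnum", "e_shentsize",
          "e_shnum", "e_shstrndx")


def parse_elf32_header(data):
    """Unpack an Elf32_Ehdr into a dict (hypothetical helper name)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 1:
        raise ValueError("only ELFCLASS32 is handled in this sketch")
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little, 2 = big
    values = struct.unpack(endian + "HHIIIIIHHHHHH", data[16:52])
    return dict(zip(FIELDS, values))


header = parse_elf32_header(ELF_HEADER)
print(f"machine={header['e_machine']} entry=0x{header['e_entry']:08x} "
      f"phoff=0x{header['e_phoff']:x} shoff=0x{header['e_shoff']:x}")
# prints: machine=8 entry=0x80000000 phoff=0x34 shoff=0x131a4
```

Feed the same function the first bytes of firmware.bin and it fails at the very first check, which is the whole story of this post in one exception.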
Now here’s what a firmware binary looks like at byte zero:
$ xxd -l 64 firmware.bin
00000000: 0580 1d3c 0000 bd37 0480 1c3c 0000 9c37 ...<...7...<...7
00000010: 8000 0008 0000 0000 0000 0000 0000 0000 ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
No magic bytes. No headers. The first four bytes of this file are a machine instruction. If you happen to read MIPS, you can decode it on sight: lui sp, 0x8005 – loading the upper half of the stack pointer, the canonical first move of a freshly-booted CPU. Once the boot ROM (or first-stage bootloader) copies the payload to RAM and jumps to its entry address, this is where execution begins. (Register conventions come in Part 2. For now, sp, ra, a0..a3, and s0..s7 are the names you’ll keep seeing.)
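If you don't read MIPS on sight yet, the decoding is mechanical. The sketch below takes the first twelve bytes from the dump above, reads them as little-endian 32-bit words, and splits out the opcode, register, and immediate fields. It only knows two I-type opcodes (lui and ori), which is all these three words need; the function name decode_i_type is hypothetical.

```python
import struct

# First 12 bytes of firmware.bin, copied from the xxd dump above.
FIRMWARE_START = bytes.fromhex("05801d3c0000bd3704801c3c")

# MIPS o32 register names, indexed by register number.
REG_NAMES = ("zero at v0 v1 a0 a1 a2 a3 t0 t1 t2 t3 t4 t5 t6 t7 "
             "s0 s1 s2 s3 s4 s5 s6 s7 t8 t9 k0 k1 gp sp fp ra").split()

OPCODE_ORI = 0x0D
OPCODE_LUI = 0x0F


def decode_i_type(word):
    """Decode a lui or ori instruction word (sketch: two opcodes only)."""
    opcode = word >> 26           # bits 31..26
    rs = (word >> 21) & 0x1F      # bits 25..21
    rt = (word >> 16) & 0x1F      # bits 20..16
    imm = word & 0xFFFF           # bits 15..0
    if opcode == OPCODE_LUI:
        return f"lui {REG_NAMES[rt]},0x{imm:x}"
    if opcode == OPCODE_ORI:
        return f"ori {REG_NAMES[rt]},{REG_NAMES[rs]},0x{imm:x}"
    raise ValueError(f"opcode 0x{opcode:02x} not handled in this sketch")


# "<I" = little-endian unsigned 32-bit, matching the -EL build.
decoded = [decode_i_type(word)
           for (word,) in struct.iter_unpack("<I", FIRMWARE_START)]
for line in decoded:
    print(line)
# prints:
# lui sp,0x8005
# ori sp,sp,0x0
# lui gp,0x8004
```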
Here’s what each format gives you:
| Feature | ELF | Flat firmware |
|---|---|---|
| Magic bytes | 7F 45 4C 46 | None (first instruction) |
| Section headers | .text, .data, .bss, … | No |
| Symbol table | Function names, variables | No |
| Relocation entries | Linker fixups | No |
| Entry point | e_entry field | Byte 0 (or known fixed address) |
| Load method | Kernel parses headers, maps segments | Bootloader copies to RAM, jumps |
Remember readelf -S showing section headers? nm listing every function by name? objdump -tT dumping the symbol table? None of that works here. The flat binary has no metadata. The hex dump is the documentation.
Why Firmware Is Flat
This isn’t an accident or a limitation – it’s a deliberate engineering choice. Here’s why embedded firmware ships as raw binary blobs:
Bootloader simplicity. The first code that runs when a MIPS chip powers on is a tiny bootrom, often a few hundred bytes burned into mask ROM. It does not have an ELF parser. It loads a fixed payload from flash into RAM and jumps to a known entry address – in our sample, 0x80000000. Any complexity in the binary format becomes complexity in silicon, and silicon has to be right on the first wafer.
Speed. A router that takes 30 seconds to come up after a power cycle is a router nobody deploys. Flat binaries boot in one memcpy – no parsing, no relocations, no dynamic linking. Copy the bytes, jump to the entry point, you’re running.
Deterministic layout. The developer controls exactly where every byte lands in memory. The linker script says “put the exception vectors at 0x80000000, put .text right after, put .bss at 0x80040000,” and that is exactly what happens. No ASLR, no PIE, no loader surprises. (You appreciate this guarantee the first time you have to work without it.)
The objcopy pipeline. Here’s how a flat binary is produced from source:
# Step 1: Compile to object files
mipsel-linux-gnu-gcc-14 -mips32r2 -EL -c -o startup.o startup.S
mipsel-linux-gnu-gcc-14 -mips32r2 -EL -O1 -ffreestanding -nostdlib -c -o main.o main.c
# Step 2: Link into ELF (intermediate)
mipsel-linux-gnu-ld -T linker.ld -o firmware.elf startup.o main.o
# Step 3: Strip to flat binary -- THIS is the key step
mipsel-linux-gnu-objcopy -O binary firmware.elf firmware.bin
Step 3 is where everything disappears. objcopy -O binary takes the ELF, extracts only the contents of the loadable sections (the raw bytes that need to be in memory at runtime), and writes them out in address order, starting at the lowest load address. Section headers? Gone. Symbol table? Gone. String table? Gone. Every piece of metadata that made the ELF navigable goes in the bin.
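Conceptually, that last step is little more than a memory dump. The sketch below illustrates the idea only and is not how binutils implements it: take each section that has file contents, place it at its load address relative to the lowest one, and leave zeros in any gap. The function name and the toy sections are hypothetical stand-ins.

```python
def flatten_sections(sections):
    """Lay out (load_address, data) pairs into one flat image.

    Returns (base_address, image). Gaps between sections are zero-filled.
    Sections with no file contents (think .bss) are skipped.
    """
    loaded = sorted((addr, data) for addr, data in sections if data)
    if not loaded:
        raise ValueError("nothing to load")
    base = loaded[0][0]
    end = max(addr + len(data) for addr, data in loaded)
    image = bytearray(end - base)  # starts out as all zeros
    for addr, data in loaded:
        image[addr - base:addr - base + len(data)] = data
    return base, bytes(image)


# Hypothetical toy input, deliberately out of order.
toy_sections = [
    (0x80000010, bytes.fromhex("80000008")),  # stand-in for later code
    (0x80000000, bytes.fromhex("05801d3c")),  # stand-in for startup code
    (0x80040000, b""),                        # stand-in for .bss: no bytes
]
base, image = flatten_sections(toy_sections)
print(f"base=0x{base:08x} size={len(image)}")
# prints: base=0x80000000 size=20
```

Note what the return value does not carry: no names, no section boundaries, no record of which bytes were code and which were data. The base address itself survives only because the sketch returns it separately; in a real flat binary you have to know it or recover it.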
The developer still has the ELF – they need it to debug – but what ships to the customer is the flat binary. And when you are the one doing the reverse engineering, the flat binary is usually all you get.
Memory Layout: Where Everything Lives
Every flat firmware binary has a base address – the memory address where the bootloader copies it. For MIPS32 devices, this is typically in KSEG0 (0x80000000 - 0x9FFFFFFF), which is cached, unmapped kernel space. Our sample firmware uses 0x80000000:
Virtual Memory Map
==================
0x80000000 +---------------------------+
| Exception vectors (640 B) | Startup + exception handlers
0x80000280 +---------------------------+
| .text (5,392 bytes) | All firmware functions
0x80001790 +---------------------------+
| .rodata (2,396 bytes) | Strings, CLI tables, constants
0x800020EC +---------------------------+
| (end of flash content) |
...
0x80040000 +---------------------------+
| .bss (1,024 bytes) | Globals (zeroed at boot)
0x80040400 +---------------------------+
| (free RAM) |
...
0x80050000 +---------------------------+
| Stack top | Stack grows downward
+---------------------------+
MMIO Registers (not in binary):
0xB8000000 UART (TX, RX, status, control)
0xB8010000 GPIO (data, direction)
0xB8020000 Timer (count, compare, control)
0xB8030000 System (reset, clock, version)
For this sample, the address translation is trivial: once you’ve extracted the raw payload, any byte at file offset N lives at virtual address 0x80000000 + N. No page tables, no ASLR, no segment mapping – one addition. If you find an interesting string at file offset 0x18CC, you know it sits at 0x800018CC in the running firmware.
Compare with ELF, where the loader reads program headers to figure out which file offset maps to which virtual address, with each segment at its own alignment. Here, you can do the math in your head.
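You will be doing this conversion constantly, so it is worth wrapping in two small helpers. This is a minimal sketch for the sample only: the base address comes from the memory map above, the 8,496-byte size is the one the sample build reports, and the function names are my own.

```python
BASE_ADDRESS = 0x80000000  # load address of the sample firmware
IMAGE_SIZE = 8496          # size of firmware.bin in bytes


def offset_to_vaddr(offset):
    """File offset in firmware.bin -> virtual address at runtime."""
    if not 0 <= offset < IMAGE_SIZE:
        raise ValueError(f"offset 0x{offset:x} is outside the image")
    return BASE_ADDRESS + offset


def vaddr_to_offset(vaddr):
    """Virtual address -> file offset, if the address is backed by the file."""
    offset = vaddr - BASE_ADDRESS
    if not 0 <= offset < IMAGE_SIZE:
        raise ValueError(f"0x{vaddr:08x} is not backed by the file")
    return offset


print(hex(offset_to_vaddr(0x18CC)))      # prints: 0x800018cc
print(hex(vaddr_to_offset(0x800018CC)))  # prints: 0x18cc
```

The range check earns its keep: an address in .bss (0x80040000 and up) or in the MMIO window (0xB8000000 and up) is perfectly valid at runtime but has no bytes in the file, and it is better to get an exception than a bogus offset.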
Build Your Own: The Sample Firmware
Before reverse engineering bytes, it helps to watch where the bytes come from. The build is short, and once you’ve seen a flat binary fall out the end of objcopy, the rest of the series stops feeling like archaeology and starts feeling like accounting.
Our sample firmware simulates a small MIPS32 network gateway – think of an embedded router or bridge device managed over a UART serial console. This is a common pattern: millions of MIPS-based routers and switches have shipped with firmware built along these lines (many Broadcom, MediaTek, and Qualcomm Atheros networking SoCs are built around MIPS32 cores).
The firmware has:
- UART serial console with a command-line interface
- 8-port forwarding engine with route table management
- Public commands: help, version, uptime, status, portdump, routedump, reset, loopback, memtest, log
- Hidden commands: peek, poke, flashid, crashme, regdump, dbglvl (more on these later)
- MMIO register access for UART, GPIO, Timer, and System registers
- Assert/logging framework with source filename references
You’ll need the MIPS cross-compilation toolchain:
# Debian/Ubuntu
sudo apt install gcc-mipsel-linux-gnu binutils-mipsel-linux-gnu
Build and inspect:
$ make clean && make
rm -f *.o *.elf firmware.bin
mipsel-linux-gnu-gcc-14 -mips32r2 -EL -fno-pic -mno-abicalls -c -o startup.o startup.S
mipsel-linux-gnu-gcc-14 -mips32r2 -EL -O1 -std=gnu89 -fno-builtin -ffreestanding \
-nostdlib -mno-abicalls -fno-pic -G0 -mno-gpopt -Wall -c -o main.o main.c
mipsel-linux-gnu-ld -T linker.ld --no-warn-rwx-segments -o firmware.elf startup.o main.o
mipsel-linux-gnu-objcopy -O binary firmware.elf firmware.bin
========================================
firmware.bin: 8496 bytes
Format: Raw MIPS32 LE flat binary
Base: 0x80000000
========================================
8,496 bytes of raw MIPS32 instructions and data. No headers, no symbols, no sections. Just bytes.
Try running file on it:
$ file firmware.bin
firmware.bin: data
$ file firmware.elf
firmware.elf: ELF 32-bit LSB executable, MIPS, MIPS32 rel2 version 1 (SYSV),
statically linked, not stripped
file does not even know what firmware.bin is. It just says “data.” That is what you are up against in firmware RE. Meanwhile, the intermediate ELF is recognized on sight – and that ELF is the one thing that never ships to customers.
The firmware has a UART interface that prints boot messages. With a serial cable, you would watch it initialize ports, set up routing tables, and present a gw> prompt. For this series, we don’t need the hardware – the binary is enough.
If you do want to see it run, there are two options. You can emulate it under QEMU with the Malta MIPS board (qemu-system-mipsel -M malta -kernel firmware.elf -nographic) – the build includes a QEMU-compatible UART variant, and using the ELF here is just a convenience for emulation and debugging. The shipped firmware artifact is still the flat firmware.bin payload inside the image. Or, if you have real hardware, a Microchip PIC32MX470 Curiosity board (DM320103) has a MIPS32 M4K core with UART-over-USB and can run similar bare-metal firmware. Neither is required to follow along – everything in this series works purely through static analysis of the binary.
In Part 2, we’ll learn how to extract information from those bytes without ever running the firmware.
Firmware Is More Than One File
When you download a firmware update from a vendor, you don’t get a bare .bin. You get a firmware image – a container wrapping one or more payloads with headers, checksums, and a partition table. The flat binary is the payload we care about, but vendors rarely distribute it naked.
Our sample includes a packager that produces exactly this kind of image:
$ python3 fwpack.py firmware.bin firmware.img
Packed firmware image: firmware.img
Total size: 9152 bytes
Image CRC-32: 0x0FF01071
Partitions: 3
[bootloader]
Offset: 0x0120
Size: 32 bytes
CRC-32: 0x4F7A6ACA
[main_fw]
Offset: 0x0160
Size: 8496 bytes
CRC-32: 0x9F25E0DC
[config]
Offset: 0x22B0
Size: 256 bytes
CRC-32: 0x2683AC5B
Notice the two magic strings (FWPK for the image header, FWND at the trailer), the per-partition CRCs, and the global image CRC – those are the three things every custom container format reinvents in some form.
The structure looks like this:
Firmware Image Layout
=====================
0x0000 +---------------------------+
| Image Header (256 bytes) | Magic "FWPK", version, CRC-32,
| | firmware name, build date, board ID,
| | partition count, entry point
0x0100 +---------------------------+
| Partition 0: Bootloader | 32B header + 32B payload
| (tiny MIPS stub) | Sets SP, jumps to main FW
0x0140 +---------------------------+
| Partition 1: Main FW | 32B header + 8,496B payload
| (our firmware.bin) | The actual firmware code
0x2290 +---------------------------+
| Partition 2: Config | 32B header + 256B payload
| (default settings) | Hostname, IP, routes, serial config
0x23B0 +---------------------------+
| Trailer (16 bytes) | End magic "FWND", size, CRC-32
0x23C0 +---------------------------+
Each partition has its own CRC-32, and the image has a global CRC in the header. Two reasons this matters: the device’s bootloader verifies them before flashing (a corrupted update bricks the device), and if you ever binary-patch the firmware, you have to recompute every CRC the bootloader checks – or the device rejects your modified image and refuses to boot.
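The offsets in that diagram are plain arithmetic over three fixed sizes, and it is worth checking that they add up before trusting any container layout you have inferred. The sketch below recomputes the payload offsets and the total size from the numbers in the diagram; it assumes nothing about the fields inside the headers, and the function name is hypothetical.

```python
IMAGE_HEADER_SIZE = 256      # image header, from the layout diagram
PARTITION_HEADER_SIZE = 32   # per-partition header
TRAILER_SIZE = 16            # end-of-image trailer


def compute_layout(payloads):
    """Return ([(name, payload_offset, size), ...], total_image_size)."""
    offset = IMAGE_HEADER_SIZE
    partitions = []
    for name, size in payloads:
        payload_offset = offset + PARTITION_HEADER_SIZE
        partitions.append((name, payload_offset, size))
        offset = payload_offset + size
    return partitions, offset + TRAILER_SIZE


partitions, total_size = compute_layout(
    [("bootloader", 32), ("main_fw", 8496), ("config", 256)]
)
for name, payload_offset, size in partitions:
    print(f"{name:<10} offset=0x{payload_offset:04X} size={size}")
print(f"total={total_size} (0x{total_size:X})")
# prints:
# bootloader offset=0x0120 size=32
# main_fw    offset=0x0160 size=8496
# config     offset=0x22B0 size=256
# total=9152 (0x23C0)
```

When you are working on an unknown vendor format, this kind of check runs in the other direction: you guess the header sizes, and the guess is right when the computed offsets land exactly on the data you can see in the hex dump.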
Every vendor invents their own container format. Some build on well-known structures (U-Boot’s uImage). Others ship custom headers with proprietary magic numbers, encryption layers, or RSA signature verification. Your first RE task is always defeating the container before you can read the code inside it.
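Defeating a container usually starts with finding out which container it is. Here is a minimal signature scan, a rough miniature of what dedicated firmware tools do: search a blob for a handful of known magic values and report where they occur. The ELF and uImage magics are the standard ones; FWPK and FWND are the sample's own. The demo blob is synthetic and does not reproduce the real field layout of firmware.img.

```python
SIGNATURES = {
    b"\x7fELF": "ELF header",
    b"\x27\x05\x19\x56": "U-Boot uImage header",
    b"FWPK": "sample image header",
    b"FWND": "sample image trailer",
}


def scan_signatures(blob):
    """Return a sorted list of (offset, label) for every magic found."""
    hits = []
    for magic, label in SIGNATURES.items():
        start = 0
        while (pos := blob.find(magic, start)) != -1:
            hits.append((pos, label))
            start = pos + 1
    return sorted(hits)


# Synthetic demo blob: magics separated by zero padding.
demo = b"FWPK" + bytes(60) + b"\x27\x05\x19\x56" + bytes(28) + b"FWND"
for offset, label in scan_signatures(demo):
    print(f"0x{offset:04X}  {label}")
# prints:
# 0x0000  sample image header
# 0x0040  U-Boot uImage header
# 0x0060  sample image trailer
```

Short magics produce false positives in large files, so treat every hit as a lead to confirm in the hex dump rather than as an answer.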
We can verify our image and extract the partitions back out:
$ python3 fwpack.py --info firmware.img
Firmware Image: firmware.img
Magic: FWPK
Version: 1.0.0.0
Total size: 9152 bytes (0x23c0)
Partitions: 3
Entry point: 0x80000000
Image CRC: 0x0FF01071
FW name: NetGW-MIPS32 v2.4.1-rc3
Build date: Apr 2026
Board ID: MIPS32-GW-DEV
CRC verified OK
Partition Table:
# Type Offset Size CRC-32 Load Addr
--------------------------------------------------------------------
0 bootloader 0x0120 32 0x4F7A6ACA 0x80000000 [OK]
1 main_fw 0x0160 8496 0x9F25E0DC 0x80000000 [OK]
2 config 0x22B0 256 0x2683AC5B 0x80000000 [OK]
Remember: ELF is a standard. Documented, parsed by every tool you own. Here? Custom format, custom magic, custom checksums. file won’t help. readelf won’t help. You and a hex editor, on a Friday night.
Debug Symbols: The Cheat Code
Before we dive deeper into the raw binary (that’s Part 2), one more comparison to make the stakes concrete. Same function – firmware_main, the entry point of our gateway – once without symbols and once with.
Without symbols (the flat binary, as shipped to customers):
$ mipsel-linux-gnu-objdump -D -b binary -m mips firmware.bin
...
16b0: e8ffbd27 addiu sp,sp,-24
16b4: 1400bfaf sw ra,20(sp)
16b8: 1000b0af sw s0,16(sp)
16bc: 0080053c lui a1,0x8000
16c0: e01ea524 addiu a1,a1,7904
16c4: dd02000c jal 0xb74
16c8: 01000424 li a0,1
...
16f0: f502000c jal 0xbd4
16f4: 00000000 nop
16f8: ad03000c jal 0xeb4
16fc: 00000000 nop
1700: 03000624 li a2,3
1704: 01000524 li a1,1
1708: fb03000c jal 0xfec
170c: 010a0424 li a0,2561
...
1728: fb03000c jal 0xfec
172c: a8c00434 li a0,0xc0a8
...
1784: 7205000c jal 0x15c8
What can you read here? “Call 0xb74. Call 0xbd4. Call 0xeb4. Call 0xfec four times with different arguments. Call 0x15c8.” That is the entirety of the information. Just addresses. You have no idea what any of those functions do, or whether two of them do the same thing.
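Those call targets are not stored as addresses in the file; the disassembler computes them from each jal word. A jal carries a 26-bit word index, and the target is that index shifted left by two, combined with the top four bits of the address of the delay slot. A small sketch, using the raw bytes from the listing above (the function name is mine):

```python
OPCODE_JAL = 0x03


def jal_target(raw_bytes, pc):
    """Compute the call target of a little-endian jal located at pc."""
    word = int.from_bytes(raw_bytes, "little")
    if word >> 26 != OPCODE_JAL:
        raise ValueError("not a jal instruction")
    index = word & 0x03FFFFFF
    # Upper 4 bits come from the delay-slot address (pc + 4).
    return ((pc + 4) & 0xF0000000) | (index << 2)


first_call = bytes.fromhex("dd02000c")  # the word at offset 0x16c4

# Disassembled as a flat file starting at address 0:
print(hex(jal_target(first_call, 0x16C4)))      # prints: 0xb74
# Disassembled at the real load address:
print(hex(jal_target(first_call, 0x800016C4)))  # prints: 0x80000b74
```

The two results differ only in the upper bits, which is why the flat-binary listing shows 0xb74 where the ELF listing shows 80000b74. Telling objdump the load address with --adjust-vma=0x80000000 makes the flat listing use runtime addresses.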
With symbols (the ELF, kept for debugging):
$ mipsel-linux-gnu-objdump -d firmware.elf
...
800016b0 <firmware_main>:
800016b0: 27bdffe8 addiu sp,sp,-24
800016b4: afbf0014 sw ra,20(sp)
800016b8: afb00010 sw s0,16(sp)
800016bc: 3c058000 lui a1,0x8000
800016c0: 24a51ee0 addiu a1,a1,7904
800016c4: 0c0002dd jal 80000b74 <log_msg>
800016c8: 24040001 li a0,1
...
800016f0: 0c0002f5 jal 80000bd4 <port_init>
800016f4: 00000000 nop
800016f8: 0c0003ad jal 80000eb4 <route_init>
800016fc: 00000000 nop
80001700: 24060003 li a2,3
80001704: 24050001 li a1,1
80001708: 0c0003fb jal 80000fec <route_add>
8000170c: 24040a01 li a0,2561
...
80001728: 0c0003fb jal 80000fec <route_add>
8000172c: 3404c0a8 li a0,0xc0a8
...
80001784: 0c000572 jal 800015c8 <cli_main_loop>
Now you can read the whole boot sequence: log “system starting”, initialize ports, initialize the route table, add four routes (10.1.x.x, 10.2.x.x, 192.168.x.x, 172.16.x.x), log “system ready”, print banner, enter the CLI main loop.
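Where do the route prefixes come from? Once you know the call is route_add, the 16-bit immediates loaded into a0 read naturally as the top two octets of an IPv4 network. That interpretation is an inference from the function name and the values, not something the disassembly states, and the helper below is a hypothetical one-liner to do the conversion.

```python
def prefix_from_immediate(imm):
    """Read a 16-bit immediate as the first two octets of an IPv4 address."""
    if not 0 <= imm <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    return f"{imm >> 8}.{imm & 0xFF}.x.x"


print(prefix_from_immediate(2561))    # li a0,2561   -> prints: 10.1.x.x
print(prefix_from_immediate(0xC0A8))  # li a0,0xc0a8 -> prints: 192.168.x.x
```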
Same machine code, same addresses, same behavior. The only difference is that the ELF carries a symbol table mapping addresses to names:
$ mipsel-linux-gnu-nm firmware.elf | head -20
80000000 T _start
80000248 T _exception_handler
800002e4 T uart_puts
80000b74 T log_msg
80000bd4 T port_init
80000eb4 T route_init
80000efc T iterate_active_routes
80000fec T route_add
800013d0 T cli_process_command
800015c8 T cli_main_loop
800016b0 T firmware_main
...
93 symbols in our toy firmware. The vendor firmware we worked on had over 4,400 functions across 1.5 MB of code. Picture 4,400 nameless functions – just addresses and raw MIPS instructions, with no obvious starting point and no obvious end. That is why recovering function names and code structure takes weeks of manual work.
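The symbol table is, in the end, just a lookup from address to name. As a rough sketch of what the disassembler does with it, the code below parses the nm lines shown above into a dict and uses it to label the call targets from the flat listing. It covers only the symbols printed in this post, matches exact addresses only, and falls back to a sub_ placeholder for anything unknown; both function names are hypothetical.

```python
BASE_ADDRESS = 0x80000000

# The nm output lines shown above (a subset of the 93 symbols).
NM_OUTPUT = """\
80000000 T _start
80000248 T _exception_handler
800002e4 T uart_puts
80000b74 T log_msg
80000bd4 T port_init
80000eb4 T route_init
80000efc T iterate_active_routes
80000fec T route_add
800013d0 T cli_process_command
800015c8 T cli_main_loop
800016b0 T firmware_main
"""


def parse_nm(text):
    """Turn 'address type name' lines into an {address: name} dict."""
    table = {}
    for line in text.splitlines():
        address, _kind, name = line.split()
        table[int(address, 16)] = name
    return table


def symbolize(vaddr, table):
    """Return the symbol at vaddr, or a placeholder if there is none."""
    return table.get(vaddr, f"sub_{vaddr:08x}")


symbols = parse_nm(NM_OUTPUT)
# Call targets as they appear in the flat-binary listing.
for flat_target in (0xB74, 0xBD4, 0xEB4, 0xFEC, 0x15C8):
    print(f"jal 0x{flat_target:x} -> {symbolize(BASE_ADDRESS + flat_target, symbols)}")
# prints:
# jal 0xb74 -> log_msg
# jal 0xbd4 -> port_init
# jal 0xeb4 -> route_init
# jal 0xfec -> route_add
# jal 0x15c8 -> cli_main_loop
```

Without the ELF, every entry in that dict starts out as a sub_ placeholder, and filling in the names is the manual work.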
Symbols turn weeks into hours. Vendor firmware almost never ships with them. The rest of this series is about getting them back.
What We’re Up Against
Let me summarize where we stand. We have:
- A flat binary – raw MIPS32 instructions with no headers, no sections, no symbols
- A firmware image container – custom format with magic numbers, CRC checksums, and multiple partitions
- A trivial address mapping – file offset + base address = virtual address
- 109 embedded strings – the main human-readable foothold in this sample binary
And we know what we’re missing:
- No function boundaries (where does one function end and another begin?)
- No function names (what does the code at 0xbd4 do?)
- No variable names (what’s stored at 0x80040340?)
- No type information (is that 32-bit value an integer, a pointer, or a bitfield?)
- No cross-references (who calls this function? Where is this string used?)
The tools from our ELF toolkit – readelf, nm, objdump -tT – are useless against a flat binary. We need a different approach.
Here’s what we’re going to do across the rest of the series:
Part 2: “Opcodes, Prologues, and other hidden patterns” – We’ll learn to read MIPS32 assembly just enough to find function boundaries, trace calls, and discover a hidden CLI with commands that don’t appear in any help output. Our entry point? Those 109 strings.
Part 3: “Decompilers, Annotations, and other ways to read the unreadable” – We’ll throw the binary at Ghidra to decompile all 8,496 bytes into readable C, then follow the execution path through the route engine to discover a priority inversion bug that silently drops traffic. Then we’ll binary-patch the fix – and recalculate the CRC.
Part 4: “Symbols, Scripts, and other linking nightmares” – We’ll try to go from decompiled C back to a working binary. We’ll discover why that’s far harder than it sounds, build custom linker scripts, and learn why firmware linking is fundamentally different from anything in userspace.
Part 5: “Versions, Callgraphs, and other ways to compare what changed” – We’ll compare two firmware versions side by side, decompile the same functions from each, and use diffs and callgraph patterns to see how a vendor fix evolves between releases.
Every technique here comes from a real project. Every pattern is something we hit in production firmware. The sample recreates those patterns so you can follow along hands-on, without anyone’s lawyer getting involved.
So far we’ve established what isn’t in the binary. In Part 2, we start with the only thing that is readable – the strings – and from there pull function boundaries, a call graph, and a hidden CLI out of raw bytes.
References
- ELF’s Linker’s and other magical creatures – The 2018 predecessor to this series
- MIPS32 Architecture For Programmers – MIPS instruction set reference
- GNU binutils documentation – objcopy, objdump, readelf, nm
- Ghidra – NSA’s reverse engineering framework
- Capstone Engine – Lightweight disassembly framework
- MIPS Calling Conventions – Register usage and stack frame layout