Eight years ago I wrote ELF’s Linker’s and other magical creatures – a walkthrough of the ELF binary format, relocations, segments, and even live code injection through /proc/pid/mem. That post ended in a comfortable place: .text, .data, .bss; ld resolving symbols; the kernel’s ELF loader mapping segments into memory; gdb poking at a running process. The civilized world of userspace binaries.

Then someone hands you a 1.5 MB file. No extension, no headers, no documentation. file says “data.” readelf says “not an ELF file.” objdump refuses to open it. Welcome to firmware reverse engineering.

This is Part 1 of a five-part series that picks up where the ELF post left off, but for bare-metal firmware on MIPS32 embedded devices. Everything you learned about ELF structure, symbol tables, and linking is still useful – mostly as a reference for what isn’t in front of you anymore.

The techniques come from a real project analyzing production firmware on an embedded network device. I can’t name the vendor (lawyers), but every pattern, every tool, every trick here is something we actually used in anger. To keep it reproducible, we’ll build a sample MIPS32 firmware – a small network gateway with a UART serial console – that recreates the same structures.

Bonus – the sample firmware bundled: if you’d rather skip the build and start poking at bytes, the same sources, the resulting firmware.bin and firmware.elf, the multi-partition image, and the small tools/ directory used in Part 2 are all bundled in sample_firmware.tar.gz. Grab it and follow along on your own copy:

curl -L -O https://res.cloudinary.com/gotocco/raw/upload/v1777182332/sample_firmware.tar.gz
tar xf sample_firmware.tar.gz
cd sample_firmware

What’s Missing: ELF vs Flat Binary

The quickest way to understand flat firmware is to compare it with what you already know. Here’s what a standard ELF binary looks like at byte zero:

$ xxd -l 64 firmware.elf
00000000: 7f45 4c46 0101 0100 0500 0000 0000 0000  .ELF............
00000010: 0200 0800 0100 0000 0000 0080 3400 0000  ............4...
00000020: a431 0100 0110 0070 3400 2000 0500 2800  .1.....p4. ...(.
00000030: 0d00 0c00 0300 0070 0821 0100 0821 0080  .......p.!...!..

You can spot the 7F 45 4C 46 magic immediately – .ELF. After that comes the class (32-bit), endianness (little), the machine type (MIPS), the entry point, program header offset, section header offset. Everything the kernel’s loader needs to map this binary into memory.
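To make those fields concrete, here is a small sketch that unpacks the same 64 bytes with Python's struct module. The byte string is copied from the xxd dump above; the field names follow the standard ELF32 header layout, and the little-endian format string is justified by the 01 in the sixth byte.

```python
import struct

# First 64 bytes of firmware.elf, copied from the xxd dump above.
dump = bytes.fromhex(
    "7f454c46010101000500000000000000"
    "02000800010000000000008034000000"
    "a4310100011000703400200005002800"
    "0d000c00030000700821010008210080"
)

# An ELF32 header is 52 bytes: a 16-byte e_ident, then 13 fixed-width fields.
ELF32_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")
(e_ident, e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
 e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
 e_shstrndx) = ELF32_HEADER.unpack_from(dump)

print("magic      :", e_ident[:4])          # b'\x7fELF'
print("class/data :", e_ident[4], e_ident[5])  # 1 = 32-bit, 1 = little-endian
print("machine    :", e_machine)            # 8 = EM_MIPS
print("entry      :", hex(e_entry))
print("phoff/shoff:", hex(e_phoff), hex(e_shoff))
print("ph/sh count:", e_phnum, e_shnum)
```

Every one of those values is something the flat binary will not tell you.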

Now here’s what a firmware binary looks like at byte zero:

$ xxd -l 64 firmware.bin
00000000: 0580 1d3c 0000 bd37 0480 1c3c 0000 9c37  ...<...7...<...7
00000010: 8000 0008 0000 0000 0000 0000 0000 0000  ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................

No magic bytes. No headers. The very first four bytes of this file are a machine instruction. If you happen to read MIPS, you can decode it on sight: the little-endian word 0x3c1d8005 is lui sp, 0x8005 – loading the upper half of the stack pointer, the canonical first move of a freshly-booted CPU. Once the boot ROM (or first-stage bootloader) copies the payload to RAM and jumps to its entry address, this is where execution begins. (Register conventions come in Part 2. For now, sp, ra, a0..a3, and s0..s7 are the names you’ll keep seeing.)
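If you don't read MIPS on sight yet, the decoding is mechanical. The sketch below handles only the three opcodes that appear in the first 20 bytes of the dump (lui, ori, j) and names only the two registers it meets; it is nowhere near a disassembler, just enough to show how the bit fields fall out of a little-endian word.

```python
import struct

# First 20 bytes of firmware.bin, copied from the xxd dump above.
head = bytes.fromhex("05801d3c0000bd3704801c3c00009c3780000008")

BASE = 0x80000000                  # load address of the sample firmware
REG = {28: "gp", 29: "sp"}         # only the registers this snippet meets
I_TYPE = {0x0F: "lui", 0x0D: "ori"}

def decode(word, pc):
    opcode = word >> 26            # top 6 bits select the instruction
    if opcode == 0x02:             # j: 26-bit word index in the current 256 MB region
        target = ((pc + 4) & 0xF0000000) | ((word & 0x03FFFFFF) << 2)
        return f"j 0x{target:08x}"
    rs, rt, imm = (word >> 21) & 0x1F, (word >> 16) & 0x1F, word & 0xFFFF
    name = I_TYPE.get(opcode)
    if name == "lui":
        return f"lui {REG.get(rt, rt)}, 0x{imm:04x}"
    if name == "ori":
        return f"ori {REG.get(rt, rt)}, {REG.get(rs, rs)}, 0x{imm:04x}"
    return f".word 0x{word:08x}"   # anything else is left undecoded

listing = [decode(word, BASE + 4 * i)
           for i, (word,) in enumerate(struct.iter_unpack("<I", head))]
for line in listing:
    print(line)
```

Five words in, the code has set up sp and gp and jumped past the zero padding that follows.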

Here’s what each format gives you:

Feature             ELF                                   Flat firmware
------------------  ------------------------------------  -------------------------------
Magic bytes         7F 45 4C 46                           None (first instruction)
Section headers     .text, .data, .bss, …                 No
Symbol table        Function names, variables             No
Relocation entries  Linker fixups                         No
Entry point         e_entry field                         Byte 0 (or known fixed address)
Load method         Kernel parses headers, maps segments  Bootloader copies to RAM, jumps

Remember readelf -S showing section headers? nm listing every function by name? objdump -tT dumping the symbol table? None of that works here. The flat binary has no metadata. The hex dump is the documentation.


Why Firmware Is Flat

This isn’t an accident or a limitation – it’s a deliberate engineering choice. Here’s why embedded firmware ships as raw binary blobs:

Bootloader simplicity. The first code that runs when a MIPS chip powers on is a tiny bootrom, often a few hundred bytes burned into mask ROM. It does not have an ELF parser. It loads a fixed payload from flash into RAM and jumps to a known entry address – in our sample, 0x80000000. Any complexity in the binary format becomes complexity in silicon, and silicon has to be right on the first wafer.

Speed. A router that takes 30 seconds to come up after a power cycle is a router nobody deploys. Flat binaries boot in one memcpy – no parsing, no relocations, no dynamic linking. Copy the bytes, jump to the entry point, you’re running.

Deterministic layout. The developer controls exactly where every byte lands in memory. The linker script says “put the exception vectors at 0x80000000, put .text right after, put .bss at 0x80040000,” and that is exactly what happens. No ASLR, no PIE, no loader surprises. (You start missing this guarantee the first time you no longer have it.)

The objcopy pipeline. Here’s how a flat binary is produced from source:

# Step 1: Compile to object files
mipsel-linux-gnu-gcc-14 -mips32r2 -EL -c -o startup.o startup.S
mipsel-linux-gnu-gcc-14 -mips32r2 -EL -O1 -ffreestanding -nostdlib -c -o main.o main.c

# Step 2: Link into ELF (intermediate)
mipsel-linux-gnu-ld -T linker.ld -o firmware.elf startup.o main.o

# Step 3: Strip to flat binary -- THIS is the key step
mipsel-linux-gnu-objcopy -O binary firmware.elf firmware.bin

Step 3 is where everything disappears. objcopy -O binary takes the ELF, keeps only the contents of the loadable sections (the raw bytes that need to be in memory at runtime), and writes them to a file in load-address order, starting at the lowest address and padding any gaps in between. Sections that occupy no bytes in the file, like .bss, contribute nothing. Section headers? Gone. Symbol table? Gone. String table? Gone. Every piece of metadata that made the ELF navigable goes in the bin.
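One way to picture what objcopy -O binary does is the sketch below. It is a model, not objcopy's implementation: flatten and the two fake sections are made up for illustration, and real objcopy reads addresses and contents from the ELF itself. The idea is simply "lowest load address becomes file offset 0, everything else lands relative to that, gaps are filled".

```python
def flatten(sections, gap_fill=0x00):
    """Model of a flat dump.

    sections: list of (load_address, data) pairs. Entries with empty data
    (think .bss) occupy no file bytes and are skipped.
    """
    loaded = sorted((addr, data) for addr, data in sections if data)
    base = loaded[0][0]
    end = max(addr + len(data) for addr, data in loaded)
    image = bytearray([gap_fill]) * (end - base)
    for addr, data in loaded:
        start = addr - base
        image[start:start + len(data)] = data
    return base, bytes(image)

# Hypothetical stand-ins: 8 bytes of "vectors", 4 bytes of ".text", empty ".bss".
base, image = flatten([
    (0x80000280, b"\xAA" * 4),
    (0x80000000, b"\x11" * 8),
    (0x80040000, b""),
])
print(hex(base), len(image))
```

Note what is missing from the return value: names, types, boundaries. The output is an address and a run of bytes, and only the bytes get written to disk.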

The developer still has the ELF – they need it to debug – but what ships to the customer is the flat binary. And when you are the one doing the reverse engineering, the flat binary is usually all you get.


Memory Layout: Where Everything Lives

Every flat firmware binary has a base address – the memory address where the bootloader copies it. For MIPS32 devices, this is typically in KSEG0 (0x80000000 - 0x9FFFFFFF), which is cached, unmapped kernel space. Our sample firmware uses 0x80000000:

Virtual Memory Map
==================

0x80000000 +---------------------------+
           | Exception vectors (640 B) |  Startup + exception handlers
0x80000280 +---------------------------+
           | .text (5,392 bytes)       |  All firmware functions
0x80001790 +---------------------------+
           | .rodata (2,396 bytes)     |  Strings, CLI tables, constants
0x800020EC +---------------------------+
           | (end of flash content)    |
           ...
0x80040000 +---------------------------+
           | .bss (1,024 bytes)        |  Globals (zeroed at boot)
0x80040400 +---------------------------+
           | (free RAM)                |
           ...
0x80050000 +---------------------------+
           | Stack top                 |  Stack grows downward
           +---------------------------+

MMIO Registers (not in binary):
  0xB8000000  UART (TX, RX, status, control)
  0xB8010000  GPIO (data, direction)
  0xB8020000  Timer (count, compare, control)
  0xB8030000  System (reset, clock, version)

For this sample, the address translation is trivial: once you’ve extracted the raw payload, any byte at file offset N lives at virtual address 0x80000000 + N. No page tables, no ASLR, no segment mapping – one addition. If you find an interesting string at file offset 0x18CC, you know it sits at 0x800018CC in the running firmware.

Compare with ELF, where the loader reads program headers to figure out which file offset maps to which virtual address, with each segment at its own alignment. Here, you can do the math in your head.
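For scripts, the same arithmetic fits in two helper functions. The names are ours; the constants are the sample's base address and file size. The bounds check matters more than it looks: addresses in .bss or MMIO space are perfectly valid at runtime but have no bytes in the file.

```python
BASE = 0x80000000   # load address of the sample firmware
SIZE = 8496         # length of firmware.bin in bytes

def offset_to_va(offset):
    """File offset -> virtual address in the running firmware."""
    if not 0 <= offset < SIZE:
        raise ValueError(f"offset 0x{offset:x} is outside the file")
    return BASE + offset

def va_to_offset(va):
    """Virtual address -> file offset, if the file actually backs it."""
    offset = va - BASE
    if not 0 <= offset < SIZE:
        raise ValueError(f"0x{va:08x} is not backed by bytes in the file")
    return offset

print(hex(offset_to_va(0x18CC)))      # 0x800018cc
print(hex(va_to_offset(0x800016B0)))  # 0x16b0
```

Asking for va_to_offset(0x80040000) raises, which is the right answer: .bss exists only after boot.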


Build Your Own: The Sample Firmware

Before reverse engineering bytes, it helps to watch where the bytes come from. The build is short, and once you’ve seen a flat binary fall out the end of objcopy, the rest of the series stops feeling like archaeology and starts feeling like accounting.

Our sample firmware simulates a small MIPS32 network gateway – think of an embedded router or bridge device managed over a UART serial console. This is a common pattern: millions of MIPS-based routers and switches run firmware much like this (Broadcom, MediaTek, and Qualcomm Atheros have all shipped router SoCs built around MIPS32 cores).

The firmware has:

  • UART serial console with a command-line interface
  • 8-port forwarding engine with route table management
  • Public commands: help, version, uptime, status, portdump, routedump, reset, loopback, memtest, log
  • Hidden commands: peek, poke, flashid, crashme, regdump, dbglvl (more on these later)
  • MMIO register access for UART, GPIO, Timer, and System registers
  • Assert/logging framework with source filename references

You’ll need the MIPS cross-compilation toolchain:

# Debian/Ubuntu
sudo apt install gcc-mipsel-linux-gnu binutils-mipsel-linux-gnu

Build and inspect:

$ make clean && make
rm -f *.o *.elf firmware.bin
mipsel-linux-gnu-gcc-14 -mips32r2 -EL -fno-pic -mno-abicalls -c -o startup.o startup.S
mipsel-linux-gnu-gcc-14 -mips32r2 -EL -O1 -std=gnu89 -fno-builtin -ffreestanding \
    -nostdlib -mno-abicalls -fno-pic -G0 -mno-gpopt -Wall -c -o main.o main.c
mipsel-linux-gnu-ld -T linker.ld --no-warn-rwx-segments -o firmware.elf startup.o main.o
mipsel-linux-gnu-objcopy -O binary firmware.elf firmware.bin

========================================
  firmware.bin: 8496 bytes
  Format: Raw MIPS32 LE flat binary
  Base:   0x80000000
========================================

8,496 bytes of raw MIPS32 instructions and data. No headers, no symbols, no sections. Just bytes.

Try running file on it:

$ file firmware.bin
firmware.bin: data

$ file firmware.elf
firmware.elf: ELF 32-bit LSB executable, MIPS, MIPS32 rel2 version 1 (SYSV),
              statically linked, not stripped

file does not even know what firmware.bin is. It just says “data.” That is what you are up against in firmware RE. Meanwhile, the intermediate ELF is recognized on sight – and that ELF is the one thing that never ships to customers.

The firmware has a UART interface that prints boot messages. With a serial cable, you would watch it initialize ports, set up routing tables, and present a gw> prompt. For this series, we don’t need the hardware – the binary is enough.

If you do want to see it run, there are two options. You can emulate it under QEMU with the Malta MIPS board (qemu-system-mipsel -M malta -kernel firmware.elf -nographic) – -nographic already routes the emulated serial port to your terminal, the build includes a QEMU-compatible UART variant, and using the ELF here is just a convenience for emulation and debugging. The shipped firmware artifact is still the flat firmware.bin payload inside the image. Or if you have real hardware, a Microchip PIC32MX470 Curiosity board (DM320103) has a MIPS32 M4K core with UART-over-USB and can run similar bare-metal firmware. Neither is required to follow along – everything in this series works purely through static analysis of the binary.

In Part 2, we’ll learn how to extract information from those bytes without ever running the firmware.


Firmware Is More Than One File

When you download a firmware update from a vendor, you don’t get a bare .bin. You get a firmware image – a container wrapping one or more payloads with headers, checksums, and a partition table. The flat binary is the payload we care about, but vendors rarely distribute it naked.

Our sample includes a packager that produces exactly this kind of image:

$ python3 fwpack.py firmware.bin firmware.img

Packed firmware image: firmware.img
  Total size:    9152 bytes
  Image CRC-32:  0x0FF01071
  Partitions:    3

  [bootloader]
    Offset:  0x0120
    Size:    32 bytes
    CRC-32:  0x4F7A6ACA
  [main_fw]
    Offset:  0x0160
    Size:    8496 bytes
    CRC-32:  0x9F25E0DC
  [config]
    Offset:  0x22B0
    Size:    256 bytes
    CRC-32:  0x2683AC5B

The image carries two magic strings (FWPK in the image header, FWND in the trailer), per-partition CRCs, and a global image CRC – those are the three things every custom container format reinvents in some form.

The structure looks like this:

Firmware Image Layout
=====================

0x0000 +---------------------------+
       | Image Header (256 bytes)  |  Magic "FWPK", version, CRC-32,
       |                           |  firmware name, build date, board ID,
       |                           |  partition count, entry point
0x0100 +---------------------------+
       | Partition 0: Bootloader   |  32B header + 32B payload
       |  (tiny MIPS stub)         |  Sets SP, jumps to main FW
0x0140 +---------------------------+
       | Partition 1: Main FW      |  32B header + 8,496B payload
       |  (our firmware.bin)       |  The actual firmware code
0x2290 +---------------------------+
       | Partition 2: Config       |  32B header + 256B payload
       |  (default settings)       |  Hostname, IP, routes, serial config
0x23B0 +---------------------------+
       | Trailer (16 bytes)        |  End magic "FWND", size, CRC-32
0x23C0 +---------------------------+
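The offsets in that diagram are plain running sums, which is worth checking once by hand or by script. The sketch below uses only the sizes stated above (256-byte image header, 32-byte partition headers, 16-byte trailer); the function name is ours, and this is not how fwpack.py is necessarily written.

```python
IMAGE_HEADER = 256   # bytes
PART_HEADER = 32     # bytes, one in front of every payload
TRAILER = 16         # bytes

payloads = [("bootloader", 32), ("main_fw", 8496), ("config", 256)]

def layout(payloads):
    """Return [(name, payload_offset, size)] and the total image size."""
    cursor = IMAGE_HEADER
    table = []
    for name, size in payloads:
        payload_offset = cursor + PART_HEADER   # payload follows its header
        table.append((name, payload_offset, size))
        cursor = payload_offset + size
    return table, cursor + TRAILER

table, total = layout(payloads)
for name, offset, size in table:
    print(f"{name:<11} payload at 0x{offset:04X}, {size} bytes")
print(f"total: {total} bytes (0x{total:04X})")
```

The payload offsets come out as 0x0120, 0x0160, and 0x22B0, and the total as 9152 bytes, matching what the packager printed.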

Each partition has its own CRC-32, and the image has a global CRC in the header. Two reasons this matters: the device’s bootloader verifies them before flashing (a corrupted update bricks the device), and if you ever binary-patch the firmware, you have to recompute every CRC the bootloader checks – or the device rejects your modified image and refuses to boot.
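Here is roughly what "recompute the CRC" means in practice. One loud assumption: this sketch uses zlib.crc32, the common IEEE 802.3 CRC-32. Whether a given container uses that exact variant (polynomial, initial value, final XOR, which bytes are covered) is something you have to confirm per format, usually by reproducing a known-good checksum first. The helper name is ours.

```python
import zlib

def patch_and_recrc(payload, offset, new_bytes):
    """Apply a same-length patch and return (patched_payload, new_crc32)."""
    if offset < 0 or offset + len(new_bytes) > len(payload):
        raise ValueError("patch does not fit inside the payload")
    patched = bytearray(payload)
    patched[offset:offset + len(new_bytes)] = new_bytes
    return bytes(patched), zlib.crc32(patched) & 0xFFFFFFFF

# Sanity check against the standard CRC-32 test vector.
original = b"123456789"
old_crc = zlib.crc32(original) & 0xFFFFFFFF      # 0xCBF43926 for this variant

patched, new_crc = patch_and_recrc(original, 0, b"X")
print(f"old 0x{old_crc:08X} -> new 0x{new_crc:08X}")
```

A one-byte patch changes the partition CRC, and because the partition lives inside the image, the global image CRC has to be redone as well.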

Every vendor invents their own container format. Some build on well-known structures (U-Boot’s uImage). Others ship custom headers with proprietary magic numbers, encryption layers, or RSA signature verification. Your first RE task is always defeating the container before you can read the code inside it.

We can verify our image and extract the partitions back out:

$ python3 fwpack.py --info firmware.img

Firmware Image: firmware.img
  Magic:       FWPK
  Version:     1.0.0.0
  Total size:  9152 bytes (0x23c0)
  Partitions:  3
  Entry point: 0x80000000
  Image CRC:   0x0FF01071
  FW name:     NetGW-MIPS32 v2.4.1-rc3
  Build date:  Apr 2026
  Board ID:    MIPS32-GW-DEV

  CRC verified OK

Partition Table:
  #  Type          Offset     Size       CRC-32      Load Addr
  --------------------------------------------------------------------
  0  bootloader    0x0120         32     0x4F7A6ACA  0x80000000  [OK]
  1  main_fw       0x0160       8496     0x9F25E0DC  0x80000000  [OK]
  2  config        0x22B0        256     0x2683AC5B  0x80000000  [OK]
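Once you trust the partition table, extraction is just slicing. The sketch below hard-codes the three rows printed above and carves them out of an image held in memory; the carve helper is ours, and the demo runs on a synthetic blob rather than reading firmware.img so it stays self-contained.

```python
# (name, payload offset, size) taken from the partition table above.
PARTITIONS = [
    ("bootloader", 0x0120, 32),
    ("main_fw",    0x0160, 8496),
    ("config",     0x22B0, 256),
]

def carve(image, partitions=PARTITIONS):
    """Slice each payload out of the image, refusing truncated input."""
    out = {}
    for name, offset, size in partitions:
        chunk = image[offset:offset + size]
        if len(chunk) != size:
            raise ValueError(f"{name}: image too short for {size} bytes at 0x{offset:X}")
        out[name] = chunk
    return out

# Synthetic 9152-byte stand-in with the first firmware word planted at 0x0160.
fake = bytearray(9152)
fake[0x0160:0x0164] = bytes.fromhex("05801d3c")
parts = carve(bytes(fake))
print({name: len(data) for name, data in parts.items()})
```

With a real image you would read the file in binary mode, pass the bytes to carve, and write parts["main_fw"] back out as the flat binary to analyze.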

Remember: ELF is a standard. Documented, parsed by every tool you own. Here? Custom format, custom magic, custom checksums. file won’t help. readelf won’t help. You and a hex editor, on a Friday night.


Debug Symbols: The Cheat Code

Before we dive deeper into the raw binary (that’s Part 2), one more comparison to make the stakes concrete. Same function – firmware_main, the entry point of our gateway – once without symbols and once with.

Without symbols (the flat binary, as shipped to customers):

$ mipsel-linux-gnu-objdump -D -b binary -m mips firmware.bin
...
    16b0:   e8ffbd27    addiu   sp,sp,-24
    16b4:   1400bfaf    sw      ra,20(sp)
    16b8:   1000b0af    sw      s0,16(sp)
    16bc:   0080053c    lui     a1,0x8000
    16c0:   e01ea524    addiu   a1,a1,7904
    16c4:   dd02000c    jal     0xb74
    16c8:   01000424    li      a0,1
    ...
    16f0:   f502000c    jal     0xbd4
    16f4:   00000000    nop
    16f8:   ad03000c    jal     0xeb4
    16fc:   00000000    nop
    1700:   03000624    li      a2,3
    1704:   01000524    li      a1,1
    1708:   fb03000c    jal     0xfec
    170c:   010a0424    li      a0,2561
    ...
    1728:   fb03000c    jal     0xfec
    172c:   a8c00434    li      a0,0xc0a8
    ...
    1784:   7205000c    jal     0x15c8

What can you read here? “Call 0xb74. Call 0xbd4. Call 0xeb4. Call 0xfec four times with different arguments. Call 0x15c8.” That is the entirety of the information. Just addresses. You have no idea what any of those functions do, or whether two of them do the same thing.
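Even that thin information can be harvested automatically. The sketch below decodes the jal words visible in the excerpt (so only two of the four calls to 0xfec show up) and counts the targets. On a real binary you would feed it the whole code region, and you should expect false positives wherever data happens to look like a jal.

```python
import struct
from collections import Counter

# The six jal words visible in the excerpt above, in file byte order.
raw = bytes.fromhex(
    "dd02000c" "f502000c" "ad03000c"
    "fb03000c" "fb03000c" "7205000c"
)

def jal_targets(code):
    """Yield the target of every little-endian word whose opcode is jal (3)."""
    for (word,) in struct.iter_unpack("<I", code):
        if word >> 26 == 0x03:
            yield (word & 0x03FFFFFF) << 2   # offset within the 256 MB region

calls = Counter(jal_targets(raw))
for target, count in sorted(calls.items()):
    print(f"0x{target:04x} called {count}x")
```

A target that is called from many places is usually a utility (logging, string output), which is a first hint about what it does even before it has a name.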

With symbols (the ELF, kept for debugging):

$ mipsel-linux-gnu-objdump -d firmware.elf
...
800016b0 <firmware_main>:
800016b0:   27bdffe8    addiu   sp,sp,-24
800016b4:   afbf0014    sw      ra,20(sp)
800016b8:   afb00010    sw      s0,16(sp)
800016bc:   3c058000    lui     a1,0x8000
800016c0:   24a51ee0    addiu   a1,a1,7904
800016c4:   0c0002dd    jal     80000b74 <log_msg>
800016c8:   24040001    li      a0,1
...
800016f0:   0c0002f5    jal     80000bd4 <port_init>
800016f4:   00000000    nop
800016f8:   0c0003ad    jal     80000eb4 <route_init>
800016fc:   00000000    nop
80001700:   24060003    li      a2,3
80001704:   24050001    li      a1,1
80001708:   0c0003fb    jal     80000fec <route_add>
8000170c:   24040a01    li      a0,2561
...
80001728:   0c0003fb    jal     80000fec <route_add>
8000172c:   3404c0a8    li      a0,0xc0a8
...
80001784:   0c000572    jal     800015c8 <cli_main_loop>

Now you can read the whole boot sequence: log “system starting”, initialize ports, initialize the route table, add four routes (10.1.x.x, 10.2.x.x, 192.168.x.x, 172.16.x.x), log “system ready”, print banner, enter the CLI main loop.
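The route prefixes come straight out of the li immediates. Assuming, as the listing suggests, that route_add takes the two leading octets packed into a 16-bit value in a0, the decoding is one shift and one mask (the helper name is ours):

```python
def upper_octets(imm16):
    """Split a 16-bit immediate into the two leading octets of an IPv4 prefix."""
    return imm16 >> 8, imm16 & 0xFF

for imm in (2561, 0xC0A8):          # the two a0 values visible in the excerpt
    a, b = upper_octets(imm)
    print(f"0x{imm:04x} -> {a}.{b}.x.x")
```

That prints 10.1.x.x and 192.168.x.x for the two visible calls.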

Same machine code, same addresses, same behavior. The only difference is that the ELF carries a symbol table mapping addresses to names:

$ mipsel-linux-gnu-nm -n firmware.elf | head -20
80000000 T _start
80000248 T _exception_handler
800002e4 T uart_puts
80000b74 T log_msg
80000bd4 T port_init
80000eb4 T route_init
80000efc T iterate_active_routes
80000fec T route_add
800013d0 T cli_process_command
800015c8 T cli_main_loop
800016b0 T firmware_main
...

93 symbols in our toy firmware. The vendor firmware we worked on had over 4,400 functions across 1.5 MB of code. Picture 4,400 nameless functions – just addresses and raw MIPS instructions, with no obvious starting point and no obvious end. That is why recovering function names and code structure takes weeks of manual work.

Symbols turn weeks into hours. Vendor firmware almost never ships with them. The rest of this series is about getting them back.
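What a symbol table buys you can be shown in a dozen lines: a sorted list of (address, name) pairs plus a binary search turns any raw address into name+offset. The table below contains only the entries from the excerpt above, so addresses that fall between listed symbols may be attributed to the wrong function; a full table would not have that problem.

```python
import bisect

# (address, name) pairs copied from the nm excerpt above; the real table is longer.
SYMBOLS = sorted([
    (0x80000000, "_start"),
    (0x80000248, "_exception_handler"),
    (0x800002E4, "uart_puts"),
    (0x80000B74, "log_msg"),
    (0x80000BD4, "port_init"),
    (0x80000EB4, "route_init"),
    (0x80000EFC, "iterate_active_routes"),
    (0x80000FEC, "route_add"),
    (0x800013D0, "cli_process_command"),
    (0x800015C8, "cli_main_loop"),
    (0x800016B0, "firmware_main"),
])
ADDRS = [addr for addr, _ in SYMBOLS]

def symbolize(va):
    """Map an address to the nearest symbol at or below it."""
    i = bisect.bisect_right(ADDRS, va) - 1
    if i < 0:
        return f"0x{va:08x}"           # below the first known symbol
    addr, name = SYMBOLS[i]
    return name if va == addr else f"{name}+0x{va - addr:x}"

print(symbolize(0x80000FEC))   # route_add
print(symbolize(0x800016C4))   # firmware_main+0x14
```

Recovering symbols for a stripped binary amounts to rebuilding that list by hand, one entry at a time.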


What We’re Up Against

Let me summarize where we stand. We have:

  • A flat binary – raw MIPS32 instructions with no headers, no sections, no symbols
  • A firmware image container – custom format with magic numbers, CRC checksums, and multiple partitions
  • A trivial address mapping – file offset + base address = virtual address
  • 109 embedded strings – the main human-readable foothold in this sample binary
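For reference, pulling that kind of foothold out of a blob takes very little code. The sketch below is a minimal stand-in for the strings utility: runs of printable ASCII of at least min_len bytes, reported with their file offsets. The exact count you get depends on the minimum length and on what you treat as printable, so we make no claim that it reproduces the 109 figure; the sample bytes in the demo are invented.

```python
import re

def ascii_strings(blob, min_len=4):
    """Return (offset, text) for each run of printable ASCII >= min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(blob)]

# Invented demo bytes: two real-looking strings and one run that is too short.
demo = b"\x00\x01gw> \x00ab\x00route table full\x00"
for offset, text in ascii_strings(demo):
    print(f"0x{offset:04x}  {text!r}")
```

Each offset converts to a virtual address by adding the base, and those addresses are what the code loads with lui/addiu pairs.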

And we know what we’re missing:

  • No function boundaries (where does one function end and another begin?)
  • No function names (what does the code at 0xbd4 do?)
  • No variable names (what’s stored at 0x80040340?)
  • No type information (is that 32-bit value an integer, a pointer, or a bitfield?)
  • No cross-references (who calls this function? Where is this string used?)

The tools from our ELF toolkit – readelf, nm, objdump -tT – are useless against a flat binary. We need a different approach.

Here’s what we’re going to do across the rest of the series:

Part 2: “Opcodes, Prologues, and other hidden patterns” – We’ll learn to read MIPS32 assembly just enough to find function boundaries, trace calls, and discover a hidden CLI with commands that don’t appear in any help output. Our entry point? Those 109 strings.

Part 3: “Decompilers, Annotations, and other ways to read the unreadable” – We’ll throw the binary at Ghidra to decompile all 8,496 bytes into readable C, then follow the execution path through the route engine to discover a priority inversion bug that silently drops traffic. Then we’ll binary-patch the fix – and recalculate the CRC.

Part 4: “Symbols, Scripts, and other linking nightmares” – We’ll try to go from decompiled C back to a working binary. We’ll discover why that’s far harder than it sounds, build custom linker scripts, and learn why firmware linking is fundamentally different from anything in userspace.

Part 5: “Versions, Callgraphs, and other ways to compare what changed” – We’ll compare two firmware versions side by side, decompile the same functions from each, and use diffs and callgraph patterns to see how a vendor fix evolves between releases.

Every technique here comes from a real project. Every pattern is something we hit in production firmware. The sample recreates those patterns so you can follow along hands-on, without anyone’s lawyer getting involved.

So far we’ve established what isn’t in the binary. In Part 2, we start with the only thing that is readable – the strings – and from there pull function boundaries, a call graph, and a hidden CLI out of raw bytes.


References