In Part 1, we built a flat MIPS32 firmware binary and stared at 8,496 bytes with no headers, no symbols, and no sections. file said “data.” readelf said nothing. We left off with a question: now what?

The temptation is to fire up a disassembler and start reading opcodes. Resist it. The first useful thing you can do with an unknown firmware binary isn’t disassembly – it’s reading the text that the developers left behind.

Code and binary: the firmware.bin analyzed below, the build sources, and every script in this post are bundled in sample_firmware.tar.gz. Grab it and follow along on your own copy:

curl -L -O https://res.cloudinary.com/gotocco/raw/upload/v1777182332/sample_firmware.tar.gz
tar xf sample_firmware.tar.gz
cd sample_firmware

Strings: The First Foothold

Every firmware binary contains embedded strings – log messages, error texts, command prompts, version information. The developers who wrote the firmware needed to print things to the serial console, log diagnostic messages, and display help text. All of that text lives in the .rodata section, which survives the objcopy to flat binary untouched.

$ strings firmware.bin

111 strings come pouring out. Let’s categorize what we find:

Boot sequence messages:

system starting
initializing ports
initializing route table
WARNING: no routes programmed for stack
bridge routes deferred to bridge handler
system ready

These tell us the firmware has a boot sequence: it starts up, initializes a port subsystem, sets up a routing table, and then something goes wrong with route programming before declaring “system ready.” That WARNING is interesting – we’ll come back to it.

CLI interface:

gw>
Available commands:
Unknown command. Type 'help' for list.

There’s a command-line interface with a gw> prompt. This is a gateway device.

Command names:

help                    version                 uptime
status                  portdump                routedump
reset                   loopback                memtest
log

And then, further down in the binary, a second cluster:

dbglvl                  peek                    poke
flashid                 crashme                 regdump

Re-running strings with -t x to see file offsets makes the layout visible:

$ strings -t x firmware.bin | grep -E "^\s+[0-9a-f]+ (help|version|uptime|status|portdump|routedump|reset|loopback|memtest|log|dbglvl|peek|poke|flashid|crashme|regdump)$"
   17bc memtest
   17e0 loopback
   1800 reset
   181c routedump
   1844 portdump
   1864 status
   1880 uptime
   18a0 version
   18c0 help
   1958 regdump
   197c crashme
   199c flashid
   19bc poke
   19d8 peek
   19fc dbglvl

The two clusters sit in different regions of .rodata: the familiar-looking names are packed closely between 0x17bc and 0x18c0, the rest start 0x98 bytes further on and run from 0x1958 to 0x19fc, and the two groups never interleave. That’s not how a compiler lays out a single array. (log is missing from this grep because strings defaults to a minimum length of 4 characters and log is three; pass -n 3 to see it – it sits at 0x17a8, just below memtest, so the first cluster really starts there.)

Wait. peek? poke? crashme? Those don’t sound like commands you’d put in a user manual.

We don’t yet know which of these are reachable from the user-facing CLI and which aren’t – the layout is suggestive, but suggestion isn’t proof. For now we have 16 command-shaped strings split across two clusters. Keep that in mind.
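If you'd rather do the offset bookkeeping in Python than in a grep pipeline, here is a minimal sketch. It roughly mimics strings -t x with a configurable minimum length, then groups offsets by the gap between neighbours. The function names and the 0x40 gap threshold are illustrative choices, not something taken from the bundle.

```python
import re

def extract_strings(data: bytes, min_len: int = 4):
    """Return (offset, text) for each run of printable ASCII, like `strings -t x`."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]

def cluster_offsets(offsets, max_gap=0x40):
    """Group sorted offsets into runs whose neighbours are at most max_gap apart."""
    clusters = []
    for off in sorted(offsets):
        if clusters and off - clusters[-1][-1] <= max_gap:
            clusters[-1].append(off)      # close enough: same cluster
        else:
            clusters.append([off])        # gap too large: start a new cluster
    return clusters

# Offsets of the 15 command names from the listing above.
NAME_OFFSETS = [0x17bc, 0x17e0, 0x1800, 0x181c, 0x1844, 0x1864, 0x1880, 0x18a0,
                0x18c0, 0x1958, 0x197c, 0x199c, 0x19bc, 0x19d8, 0x19fc]

for group in cluster_offsets(NAME_OFFSETS):
    print(f"{group[0]:#06x}..{group[-1]:#06x}: {len(group)} names")
```

On the listed offsets this separates the names into one group of nine and one group of six, because the largest gap inside either group is 0x28 and the gap between them is 0x98.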

Error messages and source filenames:

ERROR: invalid port number (0-7)
ERROR: routing table full
ASSERT FAILED
main.c    cli.c    port.c    route.c    diag.c    flash.c

The firmware was built from six source files. The assert macro includes the filename – a gift for reverse engineers, because now we know the firmware’s module structure without reading a single instruction.

Firmware identity:

NetGW-MIPS32
v2.4.1-rc3
Apr 2026
(c) Embedded Network Systems

In under a minute, using nothing but strings, we now know:

  • This is a network gateway device (NetGW-MIPS32, prompt gw>)
  • It manages ports and routes with a forwarding table
  • It has a CLI with at least 16 command-shaped strings, in two physically separated clusters (we don’t yet know which are user-visible)
  • It was built from 6 source modules (main, cli, port, route, diag, flash)
  • Something might be wrong with route programming during boot
  • The second cluster has names like peek, poke, crashme – shapes of developer commands, not user ones

All from one command. No disassembly required. In the real vendor firmware project this series is based on, strings was our very first tool – and it revealed over 4,400 embedded strings that mapped directly to every subsystem in a 1.5 MB binary.


MIPS32 Survival Guide

You don’t need to learn MIPS assembly to follow this series. You need exactly five patterns. If you can spot these five things in a disassembly listing, you can trace function calls, find function boundaries, and follow data references across the entire binary.

1. Function prologue – “I’m starting a function”

addiu   sp, sp, -24       # Allocate 24 bytes of stack space
sw      ra, 20(sp)        # Save return address
sw      s0, 16(sp)        # Save callee-saved register

Nearly every function that calls another function begins by growing the stack downward. The size tells you how much room it needs for local variables and saved registers. -24 is a small function. -184 is a big one with lots of locals.

2. Function epilogue – “I’m returning to my caller”

lw      ra, 20(sp)        # Restore return address
lw      s0, 16(sp)        # Restore saved register
jr      ra                # Jump to return address
addiu   sp, sp, 24        # Deallocate stack (delay slot!)

Note the addiu sp, sp, 24 after jr ra. On MIPS, the instruction after a jump always executes (the branch delay slot). The stack cleanup happens “while” the CPU is jumping back.

3. Function call – “Call another function”

jal     80000b74          # Jump And Link — call function at 0x80000b74
                          # Sets ra = address of instruction after delay slot

JAL is the MIPS “call” instruction. It saves the return address in register $ra and jumps to the target. Every jal in the binary is a direct function call; calls made through a function pointer use jalr instead, so they never show up as a jal.

4. Load a 32-bit address – “Point at something”

lui     a1, 0x8000        # Load Upper Immediate: a1 = 0x80000000
addiu   a1, a1, 0x20e0    # Add lower 16 bits:    a1 = 0x800020e0

MIPS instructions are 32 bits wide, so you can’t load a 32-bit address in one instruction. The compiler uses a LUI/ADDIU pair: LUI loads the upper 16 bits, ADDIU adds the lower 16. This two-instruction pattern is how code loads the address of a string, a global variable, or a table.

5. Register convention – “Who’s who”

Register   Name             Purpose
$a0-$a3    Arguments        First 4 function arguments
$v0-$v1    Values           Return values
$ra        Return Address   Where to return after jr ra
$sp        Stack Pointer    Current stack top
$s0-$s7    Saved            Callee-saved (preserved across calls)
$t0-$t9    Temporary        Caller-saved (may be clobbered by calls)

That’s it. With these five patterns you can read any MIPS disassembly listing well enough to trace program flow.
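To make the patterns concrete, here is a small sketch that labels a single 32-bit instruction word by which pattern it matches. The classify function and its label strings are made up for illustration; the opcode and register numbers are the standard MIPS32 encodings (ADDIU is opcode 0x09, LUI is 0x0F, JAL is 0x03, $sp is register 29, and jr ra encodes as 0x03E00008).

```python
def classify(word: int) -> str:
    """Label one 32-bit MIPS32 instruction word by the survival-guide pattern it matches."""
    opcode = word >> 26           # top six bits
    rs = (word >> 21) & 0x1F      # source register field
    rt = (word >> 16) & 0x1F      # target register field
    imm = word & 0xFFFF           # 16-bit immediate

    if word == 0x03E00008:        # jr $ra
        return "jr ra"
    if opcode == 0x03:            # jal target
        return "jal"
    if opcode == 0x0F:            # lui rt, imm
        return "lui"
    if opcode == 0x09 and rs == 29 and rt == 29:
        # addiu $sp, $sp, imm: a negative immediate allocates, a positive one releases
        return "prologue" if imm >= 0x8000 else "stack release"
    if opcode == 0x09:
        return "addiu"
    return "other"

print(classify(0x27BDFFE8))   # addiu sp, sp, -24
print(classify(0x0C0002DD))   # jal 0x80000b74
```

The register convention itself needs no decoding, but the rs and rt fields are where those register numbers live once you start reading raw words.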


From Strings to Code: Finding Where Strings Live

We found 111 strings. Now we need to connect them to the code that uses them. The question: which function prints “system starting”?

One convention before we go further. The flat binary starts at file offset 0 but the firmware was linked to run at 0x80000000 (we set that base in the Part 1 linker script – it’s the start of KSEG0, the unmapped, cached kernel segment that bare-metal MIPS32 code is commonly linked into). So runtime address = file offset + 0x80000000, and that’s the only translation we’ll need for the rest of the post. When we say “the string at 0x800020E0,” that’s the same byte as “file offset 0x20E0.”
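The translation is a single addition, but wrapping it with a bounds check catches a lot of typos when you are juggling offsets by hand. A minimal sketch, assuming the 8,496-byte image and the 0x80000000 base described above; the helper names are hypothetical.

```python
BASE = 0x80000000      # link address from the Part 1 linker script
IMAGE_SIZE = 8496      # size of firmware.bin in bytes (0x2130)

def to_runtime(offset: int) -> int:
    """File offset -> runtime address, rejecting offsets outside the image."""
    if not 0 <= offset < IMAGE_SIZE:
        raise ValueError(f"offset {offset:#x} is outside the image")
    return BASE + offset

def to_offset(addr: int) -> int:
    """Runtime address -> file offset, rejecting addresses outside the image."""
    offset = addr - BASE
    if not 0 <= offset < IMAGE_SIZE:
        raise ValueError(f"address {addr:#x} is outside the image")
    return offset

print(hex(to_runtime(0x20E0)))      # the "system starting" string
print(hex(to_offset(0x800020E0)))
```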

Here’s the key insight. When the compiler generates code that references a string, it emits a LUI/ADDIU pair to load the string’s address into a register. If we know the string’s address in the binary, we can search for the LUI/ADDIU pair that loads it.

Let’s trace it manually. The string “system starting” is at file offset 0x20E0 in our binary:

$ strings -t x firmware.bin | grep "system starting"
   20e0 system starting

Since our binary is loaded at 0x80000000, the runtime address of this string is 0x800020E0. To load this address, the compiler emits:

lui     a1, 0x8000        # a1 = 0x80000000  (upper 16 bits)
addiu   a1, a1, 0x20e0    # a1 = 0x800020E0  (add lower 16 bits)

We can automate this directly against the instruction encoding – LUI is opcode 0x0F (top six bits) and ADDIU is opcode 0x09, so each instruction is a 32-bit word with a known shape, and pattern-matching the bytes is enough to find every pair:

import struct

BASE = 0x80000000

def find_string_xrefs(data, string_addr):
    """Find all LUI/ADDIU pairs that load a given address."""
    hi16 = (string_addr >> 16) & 0xFFFF
    lo16 = string_addr & 0xFFFF

    # Sign extension: if lo16 >= 0x8000, LUI loads hi16+1
    if lo16 >= 0x8000:
        lui_imm = (hi16 + 1) & 0xFFFF
    else:
        lui_imm = hi16

    xrefs = []
    for reg in range(1, 32):  # search all registers
        lui_word = 0x3C000000 | (reg << 16) | lui_imm
        lui_bytes = struct.pack('<I', lui_word)
        # ADDIU reg, reg, lo16 (same source and destination register)
        addiu_word = 0x24000000 | (reg << 21) | (reg << 16) | lo16
        addiu_bytes = struct.pack('<I', addiu_word)

        pos = data.find(lui_bytes)
        while pos != -1:
            if pos % 4 == 0:  # instructions are word-aligned; skip stray byte matches
                # Search the 64 instructions after the LUI for the matching ADDIU
                end = pos + 4 + 256
                addiu_pos = data.find(addiu_bytes, pos + 4, end)
                while addiu_pos != -1 and addiu_pos % 4:
                    addiu_pos = data.find(addiu_bytes, addiu_pos + 1, end)
                if addiu_pos != -1:
                    xrefs.append((BASE + pos, BASE + addiu_pos))
            pos = data.find(lui_bytes, pos + 1)

    return xrefs

The sign-extension detail matters more than it looks: MIPS ADDIU sign-extends its 16-bit immediate. If the lower half of the address is >= 0x8000, the LUI has to load one more than the upper half to compensate. Get this wrong and you’ll miss half your cross-references. (In the real project, getting this right was the difference between finding 2,000 strings and finding all 4,471.)
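Here is the arithmetic on its own, as a sketch you can poke at in a REPL. The helper names are hypothetical, and 0x8000A000 is a made-up address chosen only because its lower half is >= 0x8000; it is not a location in our firmware.bin.

```python
def split_address(addr: int):
    """Return the (lui_imm, addiu_imm) pair a compiler would emit to load addr."""
    lo = addr & 0xFFFF
    hi = (addr >> 16) & 0xFFFF
    if lo >= 0x8000:              # ADDIU will subtract, so LUI must overshoot by one
        hi = (hi + 1) & 0xFFFF
    return hi, lo

def join_address(lui_imm: int, addiu_imm: int) -> int:
    """Recompute the address the CPU ends up with after the LUI/ADDIU pair."""
    signed_lo = addiu_imm - 0x10000 if addiu_imm >= 0x8000 else addiu_imm
    return ((lui_imm << 16) + signed_lo) & 0xFFFFFFFF

for addr in (0x800020E0, 0x8000A000):
    hi, lo = split_address(addr)
    print(f"{addr:#010x}: lui {hi:#06x} / addiu {lo:#06x} -> {join_address(hi, lo):#010x}")
```

For 0x800020E0 the LUI immediate is the plain upper half, 0x8000. For the made-up 0x8000A000 it becomes 0x8001, because the ADDIU immediate 0xA000 is treated as -0x6000 and 0x80010000 - 0x6000 lands back on the target.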

The runnable script in the bundle (tools/find_string_xrefs.py) takes the same approach but uses Capstone to decode instructions instead of matching byte patterns – mostly so it can track register state across the LUI/ADDIU window and reject pairs where an intervening write has invalidated the upper half. The companion scripts tools/find_word.py, tools/find_function_prologues.py, and tools/find_cli_tables.py cover the data-pointer scan and the prologue/table walks we’ll use in the next two sections. (apt install python3-capstone on Debian-family systems, or pip install capstone inside a venv elsewhere.)


Finding Function Boundaries

Now that we can locate where a string is referenced, we need to find which function contains that reference. The answer: scan backward for a function prologue.

def find_function_prologues(data):
    """Find all ADDIU SP, SP, -N instructions (function starts)."""
    prologues = []
    for off in range(0, len(data) - 3, 4):
        word = struct.unpack_from('<I', data, off)[0]
        if (word >> 16) == 0x27BD:           # ADDIU $sp, $sp, imm
            imm = word & 0xFFFF
            if imm >= 0x8000:                # negative = stack allocation
                frame_size = imm - 0x10000
                prologues.append((BASE + off, frame_size))
    return prologues

Running this on our firmware.bin:

Found 35 function prologues:
  0x80000248: addiu sp, sp, -32
  0x800002F0: addiu sp, sp, -32
  0x80000354: addiu sp, sp, -40
  ...
  0x80000EFC: addiu sp, sp, -48    ← iterate_active_routes?
  ...
  0x800015C8: addiu sp, sp, -184   ← big function (CLI main loop?)
  0x800016B0: addiu sp, sp, -24    ← last function (firmware_main?)

35 function prologues. The real count is higher – leaf functions that never call anything else don’t need to save $ra and often skip the standard prologue entirely, so they slip past this scan. But 35 is enough to map the major functions, and that’s what we need to follow the strings.
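Once you have a sorted list of prologue addresses, "scan backward" reduces to a binary search for the last prologue at or below the address you care about. A minimal sketch, using only the prologue addresses shown in the listing above (the full list has 35 entries); containing_function is a hypothetical helper name.

```python
import bisect

# Subset of the prologue addresses printed above, in ascending order.
PROLOGUES = [0x80000248, 0x800002F0, 0x80000354,
             0x80000EFC, 0x800015C8, 0x800016B0]

def containing_function(starts, addr):
    """Return the last prologue address <= addr, or None if addr precedes them all."""
    i = bisect.bisect_right(starts, addr) - 1
    return starts[i] if i >= 0 else None

# 0x800016bc is the address of the LUI/ADDIU pair that loads "system starting".
print(hex(containing_function(PROLOGUES, 0x800016BC)))
```

The same caveat applies here as to the scan itself: if a leaf function with no prologue sits between two detected starts, an address inside it gets attributed to the preceding function. Treat the answer as a strong hint, not a boundary.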

We can also count JAL targets – the raw material for a call graph:

JAL call frequency:
  0x800002E4: called 111 times   ← most-called function (uart_puts?)
  0x80000530: called  15 times   ← probably uart_puthex
  0x80000790: called  15 times   ← probably uart_putdec
  0x80000B74: called  12 times   ← log_msg (used in every subsystem)
  0x800002C4: called   6 times   ← uart_putchar
  0x80000AA8: called   4 times   ← fw_assert
  0x80000FEC: called   4 times   ← route_add (called 4 times in main!)

The function at 0x800002E4, called 111 times, is almost certainly the UART print function – every command handler, every log message, every error path calls it. That matches what we’d expect from looking at the source filenames: a CLI-heavy firmware spends most of its time printing strings.
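The counting itself is short. JAL is opcode 0x03, and its 26-bit field is a word index: shift it left by two and combine it with the top four bits of the delay slot's address to get the target. The sketch below assumes a little-endian image, like the other listings in this post; count_jal_targets and its text_end parameter are hypothetical names, and text_end is there because scanning string data as if it were code produces false hits.

```python
import struct
from collections import Counter

def count_jal_targets(data, base=0x80000000, text_end=None):
    """Count how often each address is the target of a JAL in data[:text_end]."""
    end = len(data) if text_end is None else text_end
    counts = Counter()
    for off in range(0, end - 3, 4):
        word = struct.unpack_from("<I", data, off)[0]
        if word >> 26 == 0x03:                        # JAL opcode
            delay_slot = base + off + 4               # region bits come from PC + 4
            target = (delay_slot & 0xF0000000) | ((word & 0x03FFFFFF) << 2)
            counts[target] += 1
    return counts

# Tiny synthetic image: jal 0x80000b74, nop, jal 0x80000b74
demo = struct.pack("<3I", 0x0C0002DD, 0x00000000, 0x0C0002DD)
for target, n in count_jal_targets(demo).most_common():
    print(f"{target:#010x}: called {n} times")
```

Calls made through a function pointer use jalr and are invisible to this count, which matters for firmware that dispatches commands through a table.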


The CLI Table: Following Pointer Chains

Now for the detective work. We know the firmware has CLI commands because strings found “help”, “version”, “routedump”, and friends. But strings are just bytes – they don’t dispatch themselves. When the user types help, some function has to compare the input against every known command name and call the matching handler.

Here’s where the xref scanner we built in “From Strings to Code” actually pays off. Pick a command-name string and ask: which LUI/ADDIU pair in the binary loads its address? For "system starting" (0x800020e0) the answer is one xref at 0x800016bc – some function loads it and prints it. But for "help" (0x800018c0)?

$ python3 tools/find_string_xrefs.py firmware.bin 0x800018c0
LUI/ADDIU pairs loading 0x800018c0: 0

Zero. Same for every other command name in both clusters. Yet the strings clearly exist – strings found them. So how is the firmware getting at them?

A second scanner answers the complementary question: scan the binary for any 32-bit word that equals the address we’re looking for, regardless of how it got there.

$ python3 tools/find_word.py firmware.bin 0x800018c0
0x800018c8: 32-bit word == 0x800018c0

The address of "help" shows up as a pointer, sitting in .rodata itself, eight bytes after the string. No code loads it; data points at it. Combine the two scanners and you get a sharp diagnostic for any string in the binary:

code xrefs   data hits   what the string is
> 0          0           passed to a function (e.g. "system starting")
0            >= 1        referenced through a table (e.g. "help" in a CLI table)
0            0           dead string, never used at runtime
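A sketch of the second scanner and of the table's decision logic, assuming a little-endian image and word-aligned pointers. The function names are illustrative and this is not the bundle's find_word.py. The table has no row for a string with both code xrefs and data hits, so the sketch reports that case separately instead of guessing.

```python
import struct

def find_word(data: bytes, value: int, base: int = 0x80000000):
    """Return runtime addresses of every word-aligned 32-bit word equal to value."""
    needle = struct.pack("<I", value)
    hits = []
    pos = data.find(needle)
    while pos != -1:
        if pos % 4 == 0:                  # pointers in tables are word-aligned
            hits.append(base + pos)
        pos = data.find(needle, pos + 1)
    return hits

def classify_string(code_xrefs: int, data_hits: int) -> str:
    """Apply the three rows of the table, plus the one case it leaves out."""
    if code_xrefs > 0 and data_hits == 0:
        return "passed to a function"
    if code_xrefs == 0 and data_hits > 0:
        return "referenced through a table"
    if code_xrefs == 0 and data_hits == 0:
        return "dead string"
    return "both code and data references (not covered by the table)"

# Synthetic 12-byte image based at 0x800018c0: "help", padding, then a pointer to it.
demo = b"help\x00\x00\x00\x00" + struct.pack("<I", 0x800018C0)
hits = find_word(demo, 0x800018C0, base=0x800018C0)
print([hex(h) for h in hits], classify_string(0, len(hits)))
```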

"help" lands in the second row. So do all 16 command names. That’s the signal: when a string has no code xref but its address appears as a 32-bit word inside .rodata, it’s referenced through a table, not printed inline.

The natural shape of such a table is an array of structs, one per command:

struct cli_entry {
    const char *name;       /* "help", "version", etc. */
    void (*handler)(int, const char **);   /* function pointer */
    const char *help;       /* "Show this help message" */
};

That’s 12 bytes per entry (three 32-bit pointers). We can scan .rodata for sequences of 12-byte entries where the first and third words are pointers to printable strings, and the second word is a pointer into the .text section.

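# Scan .rodata for CLI table pattern: [string_ptr, code_ptr, string_ptr] x N.
# is_string_pointer(p): p points at a NUL-terminated printable string.
# is_code_pointer(p):   p is word-aligned and falls inside .text.
def read_u32(off):
    return struct.unpack_from('<I', data, off)[0]

off = rodata_start
while off + 12 <= rodata_end:
    name_ptr  = read_u32(off)
    handler   = read_u32(off + 4)
    help_ptr  = read_u32(off + 8)

    if (is_string_pointer(name_ptr) and
        is_code_pointer(handler) and
        is_string_pointer(help_ptr)):
        # Found a table entry; the next one follows 12 bytes later
        print(f"entry @ {BASE + off:#x}: handler={handler:#x}")
        off += 12
    else:
        # No match: advance one word, since a table can start at any aligned offset
        off += 4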
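The loop above leaves its pointer tests undefined. A self-contained sketch of the same scan follows, with every helper spelled out. It assumes a little-endian image and that you already know where .text ends (the text_end parameter, a file offset). All names are illustrative, and the bundle's find_cli_tables.py may be organized differently.

```python
import struct

BASE = 0x80000000

def read_u32(data, off):
    return struct.unpack_from("<I", data, off)[0]

def is_string_pointer(data, ptr, min_len=2):
    """True if ptr lands inside the image on a NUL-terminated printable string."""
    off = ptr - BASE
    if not 0 <= off < len(data):
        return False
    end = data.find(b"\x00", off)
    if end == -1:
        return False
    text = data[off:end]
    return len(text) >= min_len and all(0x20 <= b <= 0x7E for b in text)

def is_code_pointer(ptr, text_end):
    """True if ptr is word-aligned and falls inside the code region."""
    return BASE <= ptr < BASE + text_end and ptr % 4 == 0

def find_cli_tables(data, text_end):
    """Return [(table_address, [(name_ptr, handler, help_ptr), ...]), ...]."""
    tables, off = [], text_end
    while off + 12 <= len(data):
        entries = []
        while off + 12 <= len(data):      # extend the run while entries keep matching
            name, handler, help_ = (read_u32(data, off + i) for i in (0, 4, 8))
            if not (is_string_pointer(data, name)
                    and is_code_pointer(handler, text_end)
                    and is_string_pointer(data, help_)):
                break
            entries.append((name, handler, help_))
            off += 12
        if entries:
            tables.append((BASE + off - 12 * len(entries), entries))
        else:
            off += 4                      # no entry here: slide forward one word
    return tables
```

In practice you would also discard runs of one or two entries, since a lone triple of plausible pointers can occur by chance.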

Running this on our firmware binary, we find two tables:

Table at 0x800018C8 (10 entries):
  "help"         handler=0x80000354  "Show this help message"
  "version"      handler=0x800003FC  "Show firmware version"
  "uptime"       handler=0x80000838  "Show system uptime"
  "status"       handler=0x80000874  "Show system status"
  "portdump"     handler=0x80000E18  "Dump port status [port_num]"
  "routedump"    handler=0x800008FC  "Dump routing table"
  "reset"        handler=0x8000047C  "Reset the system"
  "loopback"     handler=0x8000135C  "Run loopback test <port>"
  "memtest"      handler=0x800012A4  "Run memory test"
  "log"          handler=0x800004A4  "Show recent log entries"

Table at 0x80001A04 (6 entries):
  "dbglvl"       handler=0x800009FC  "Set debug verbosity level"
  "peek"         handler=0x800005E4  "Read memory address"
  "poke"         handler=0x800004F4  "Write memory address"
  "flashid"      handler=0x800010C4  "Read SPI flash JEDEC ID"
  "crashme"      handler=0x80000B40  "Trigger deliberate crash"
  "regdump"      handler=0x800006CC  "Dump hardware registers"

Two tables – one with 10 entries, one with 6. Now the obvious question: if all 16 commands are in the binary, why does help only show 10 of them?

Run the xref scanner against each table’s base address:

$ python3 tools/find_string_xrefs.py firmware.bin 0x800018c8   # public table
LUI/ADDIU pairs loading 0x800018c8: 3
LUI/ADDIU @ 0x80000388 (inside function 0x80000354)
LUI/ADDIU @ 0x80001468 (inside function 0x80001424)
LUI/ADDIU @ 0x800014f4 (inside function 0x80001424)

$ python3 tools/find_string_xrefs.py firmware.bin 0x80001a04   # hidden table
LUI/ADDIU pairs loading 0x80001a04: 1
LUI/ADDIU @ 0x80001498 (inside function 0x80001424)

Two functions reference the public table; only one of them also references the hidden table. That asymmetry is what the help command’s behavior rests on.

The function at 0x80000354 is the public-only one. From the table itself we know it’s cmd_help, the handler for the help command: a small function that walks one array and prints its entries. The function at 0x80001424 references both tables – that’s the dispatcher: read a line of input, walk the public table looking for a match, then walk the hidden table, and call the handler if either one hits. (The dispatcher loads the public table address twice – once when the search loop sets up its iterator, then again after the matched handler returns and the table base needs to be re-fetched to read the function pointer. The compiler didn’t bother keeping the address live across the call. Routine scheduling, not a second logical reference.)

Nothing was “removed.” The 6 commands in the second table aren’t dead code, aren’t disabled, aren’t gated behind a flag. They just don’t get listed by cmd_help, because cmd_help only iterates one of the two arrays the dispatcher accepts. Whoever wrote the firmware wanted peek, poke, crashme, and friends reachable from the prompt without advertising them in the help output – and they implemented that with the simplest possible mechanism: two arrays, one printed, both dispatched.
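The asymmetry falls out of two set operations once each xref is tagged with its containing function. A small sketch using the xref addresses reported above; the variable and function names are illustrative.

```python
# (xref address, containing function) pairs, copied from the scanner output above.
PUBLIC_XREFS = [(0x80000388, 0x80000354),
                (0x80001468, 0x80001424),
                (0x800014F4, 0x80001424)]
HIDDEN_XREFS = [(0x80001498, 0x80001424)]

def referencing_functions(xrefs):
    """Collapse a list of xrefs to the set of functions that contain them."""
    return {func for _, func in xrefs}

public_funcs = referencing_functions(PUBLIC_XREFS)
hidden_funcs = referencing_functions(HIDDEN_XREFS)

lists_only = public_funcs - hidden_funcs     # walks the public table only
dispatches = public_funcs & hidden_funcs     # walks both tables

print("public only:", [hex(f) for f in sorted(lists_only)])
print("both tables:", [hex(f) for f in sorted(dispatches)])
```

Any command table whose referencing functions are a strict subset of another table's is worth a second look, because something walks it that does not print it.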

In the real vendor firmware, the same xref-the-table-base trick discovered 148 CLI commands across multiple dispatch tables – including handlers that could read and write arbitrary hardware registers, dump internal routing tables, and trigger diagnostic modes not documented in any user manual. None of them were hidden. They were just iterated by a different function than the one printing the help text. The strings gave every single one away the moment we asked which functions referenced each table.


Putting It Together: The Subsystem Map

It’s worth naming what we’ve actually been doing for the last few sections. A “symbol,” in the linker’s sense, is a triple: a name, an address, and a kind (function, data, etc.). The flat binary on disk has none of them – they were dropped when objcopy -O binary flattened the ELF. What we’ve been doing is rebuilding that triple ourselves: prologue scanning gave us addresses, JAL-frequency and string xrefs let us guess the kind (UART printer, log function, command handler, leaf), and the strings themselves – "help", "system starting", the source filenames – gave us names. None of them are real symbols. All of them are good enough to navigate by. That’s the move that makes static RE work on stripped firmware: when the symbol table isn’t there, you reconstruct it.

Starting from nothing but a flat binary and the strings command, we’ve built a surprisingly complete picture of this firmware:

Firmware Subsystem Map (built from static analysis only)
========================================================

Identity:   NetGW-MIPS32 v2.4.1-rc3, network gateway device
Source:     6 modules (main.c, cli.c, port.c, route.c, diag.c, flash.c)
Interface:  UART serial console, prompt "gw>"
Functions:  35 prologue-detected (plus an unknown number of leaf functions
            that don't save $ra and slip past the standard prologue scan)
Strings:    111 embedded
Commands:   16 total — all 16 dispatched, only 10 listed by cmd_help

Module Map:
  MOD_CORE  (1) — System init, boot sequence
  MOD_CLI   (2) — Command dispatch, input processing
  MOD_PORT  (3) — 8-port forwarding engine
  MOD_ROUTE (4) — Route table management
  MOD_DMA   (5) — DMA engine (referenced but minimal)
  MOD_DIAG  (6) — Diagnostic tests (loopback, memtest)
  MOD_FLASH (7) — SPI flash operations

Suspicious:
  - "WARNING: no routes programmed for stack" during boot
  - "bridge flag set, skipping" — routes being skipped?
  - "bridge routes deferred to bridge handler" — are they ever handled?

All of this without running the firmware, without a disassembler GUI, and without symbols. Just strings, some Python scripting, and pattern recognition.

But those suspicious messages are nagging. The boot sequence says routes are being skipped because of a “bridge flag,” and then warns that no routes were programmed. That sounds like a bug – routes that should be programmed into the forwarding table are getting silently dropped.

In Part 3, we’ll throw this binary at Ghidra to decompile it into readable C, trace the execution path through the route engine function by function, and find exactly where – and why – those routes are disappearing. Then we’ll fix it. With a hex editor.


References

  • sample_firmware.tar.gz – firmware.bin, build sources (main.c, startup.S, linker.ld, Makefile), and the four Python scripts used in this post under tools/:
    • find_string_xrefs.py – find LUI/ADDIU pairs that load a given address (uses Capstone)
    • find_word.py – find a 32-bit word stored as data anywhere in the binary
    • find_function_prologues.py – find function starts via the addiu sp, sp, -N pattern (uses Capstone)
    • find_cli_tables.py – scan .rodata for {name_ptr, code_ptr, help_ptr} runs
  • Capstone Engine – multi-architecture disassembly framework. apt install python3-capstone (Debian/Ubuntu) or pip install capstone (in a venv elsewhere)
  • MIPS32 Architecture For Programmers Vol. II – Instruction set reference (LUI, ADDIU, JAL encoding)
  • MIPS Calling Convention – Register usage: $a0-$a3, $v0, $ra, $sp
  • strings(1) – GNU binutils string finder; use -t x for hex offsets
  • Part 1: Headers, Symbols, and other things you won’t find – Building the sample firmware