Firmware exploration: LLM as your annotator

Nov 30, 2025 · 5 min read

If you’ve ever opened a firmware image and stared at a hex dump thinking “nope,” you’re not alone.

Modern IoT and embedded devices ship with complex firmware: full Linux distributions, RTOS kernels, proprietary bootloaders, custom update mechanisms, and often… questionable security decisions. Large-scale studies have shown that firmware is a gold mine for vulnerabilities, from hard-coded credentials to unsafe update logic and exposed debug interfaces.

At the same time, governments and standards bodies are pushing manufacturers to treat firmware security seriously. NIST’s IoT Cybersecurity guidance and documents like SP 800-213/213A explicitly call out firmware, update mechanisms, and device integrity as critical capabilities for secure IoT products.

In this post, I’m not going to pretend that a Large Language Model (LLM) will magically reverse engineer your firmware for you. Instead, I’ll show how you can use an LLM as an annotator and sidekick while you do the real work:

  • Annotating strings extracted from an image
  • Explaining decompiled functions
  • Tagging configs and scripts
  • Triaging logs from rehosted firmware

All in a way that keeps you in control, and the AI in a supporting role.

⚠️ Everything here is about defensive security / research on devices you own or are authorized to test. Don’t use these techniques on systems you’re not allowed to touch.

1. The Traditional Firmware Exploration Workflow (Very Short Version)

A very typical firmware exploration flow looks like this:

  1. Obtain firmware
    From a vendor update file, web UI, or by dumping flash via JTAG / SPI.
  2. Triage and unpack
    Use tools like binwalk, dd, unsquashfs, or firmware-mod-kit to unpack file systems and images.
  3. Scan for strings and patterns
    Use strings, grep, or custom scripts to find credentials, URLs, debug commands, etc.
  4. Reverse engineer binaries
    Use tools like Ghidra, IDA, Radare2, or Binary Ninja to analyze executables and libraries.
  5. Dynamic analysis / rehosting
    Use QEMU or specialized frameworks for firmware rehosting to actually run firmware in a controlled environment and interact with it.

Each step is noisy and produces walls of text and code. That’s where an LLM can make things less painful.
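
To make the triage step a little more concrete, here is a minimal sketch of what signature scanning boils down to: searching the blob for a few well-known magic values. It is a toy stand-in for what a tool like binwalk does far more thoroughly, and the short signature table is my own selection for illustration.

```python
# A few well-known magic values; real tools know many more and also
# validate the headers that follow, so expect false positives here.
SIGNATURES = {
    b"hsqs": "SquashFS filesystem (little-endian)",
    b"sqsh": "SquashFS filesystem (big-endian)",
    b"\x1f\x8b\x08": "gzip stream",
    b"\x27\x05\x19\x56": "U-Boot legacy uImage header",
    b"\x7fELF": "ELF executable",
}

def scan_signatures(data: bytes):
    """Return sorted (offset, description) pairs for every signature hit."""
    hits = []
    for magic, description in SIGNATURES.items():
        start = 0
        while True:
            offset = data.find(magic, start)
            if offset == -1:
                break
            hits.append((offset, description))
            start = offset + 1
    return sorted(hits)

def format_hits(hits):
    """Render hits as one 'offset  description' line each."""
    return [f"0x{offset:08x}  {description}" for offset, description in hits]
```

Calling `scan_signatures(Path("firmware.bin").read_bytes())` (with `Path` imported from `pathlib`) gives you candidate offsets to carve out with dd or hand to unsquashfs. Short magic values match by accident, so treat every hit as a hint to check.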


2. Where LLMs Actually Help: “Annotator, Not Autopilot”

Recent work has looked at using LLMs for binary code understanding, including tasks such as naming functions and summarizing decompiled code.

Industry research has also explored LLMs as reverse engineering sidekicks that help malware analysts explain decompiled functions, outline control flows, or draft detection logic, without replacing the analysts.

You do the reversing.
The LLM helps label, summarize, cluster, and explain.

Think of it as a hyperactive junior sitting next to you, happy to generate function names, markdown notes, and hypotheses while you decide what’s real and what’s hallucination.

Let’s walk through some concrete examples.


3. Strings + LLM: Turning Noise Into Hints

A classic first step on a firmware image is just:

strings firmware.bin | less

But this dumps everything: menu texts, error messages, random config keys, leftover debug prints, etc. You can make this a lot more effective with a little Python and an LLM.

3.1 Step 1 – Extract and filter strings

Here’s a small Python script to extract printable strings from a firmware blob:

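import re
from pathlib import Path

def extract_strings(path, min_len=4):
    """Return runs of at least min_len printable ASCII characters."""
    data = Path(path).read_bytes()
    pattern = rb"[ -~]{%d,}" % min_len  # printable ASCII: space through tilde
    # Every match is pure ASCII by construction, so decoding cannot fail
    return [s.decode("ascii") for s in re.findall(pattern, data)]

# Naive filters: paths, URLs, remote-access and admin-related keywords
KEYWORDS = ("/", "http", "ssh", "admin")

def is_interesting(s):
    lowered = s.lower()  # match case-insensitively, e.g. "HTTP" or "Admin"
    return any(k in lowered for k in KEYWORDS)

if __name__ == "__main__":
    strings = extract_strings("firmware.bin")
    interesting = [s for s in strings if is_interesting(s)]

    # Print only the first 50 candidates to keep the output readable
    for s in interesting[:50]:
        print(s)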

You now have a list of “interesting” candidate strings: endpoints, file paths, error messages, maybe even hidden menu options.

3.2 Step 2 – Ask the LLM to annotate

Take a subset of these strings (don’t paste your entire firmware dump into a cloud LLM—be mindful of confidentiality) and send them to an LLM with a prompt like:

You are an embedded security analyst.
The following strings were extracted from a router firmware image.

1. /bin/diagnostic_cli
2. /usr/sbin/backup_cfg
3. POST /apply.cgi
4. admin:admin
5. Enable remote management

For each string:
- Guess what subsystem it might belong to (web UI, update system, debug, etc.).
- Mark whether it’s interesting for security review and why (1–2 sentences).

In code (pseudo-style, using a generic llm_chat() helper so you can plug in OpenAI / local LLM / etc.):

def llm_chat(prompt):
    # Placeholder: replace the body with a call to your LLM client
    # (cloud API or local model) that returns the reply as a string.
    raise NotImplementedError("plug in your LLM client here")

def annotate_strings_with_llm(strings_chunk):
    prompt = (
        "You are an embedded firmware security analyst.\n\n"
        "You are given a list of strings extracted from firmware.\n"
        "For each string, produce:\n"
        "- category: (web_ui | auth | config | debug | logging | update | other)\n"
        "- interesting: (yes/no)\n"
        "- reason: one short sentence.\n\n"
        "Strings:\n"
        + "\n".join(f"- {s}" for s in strings_chunk)
    )
    return llm_chat(prompt)

if __name__ == "__main__":
    # extract_strings() comes from the script in Step 1
    strings = extract_strings("firmware.bin")
    # Same naive filters as in Step 1
    interesting = [
        s for s in strings
        if ("/" in s or "http" in s or "ssh" in s or "admin" in s.lower())
    ]
    chunk = interesting[:40]  # keep each request small
    print(annotate_strings_with_llm(chunk))

Suddenly, instead of reading 300 anonymous strings, you get a structured, human-readable checklist: each string comes back with a category, an interesting yes/no flag, and a one-sentence reason.

You still have to verify everything, but now you have a prioritized map.
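
If you want to feed that checklist into your own tooling, it helps to ask for JSON and parse it defensively. The sketch below assumes you changed the prompt to request a JSON array of objects with the keys "string", "category", "interesting", and "reason". That output format is my assumption, not something the prompt above asks for. Models often wrap JSON in extra prose, so the parser cuts out the outermost array first.

```python
import json

ALLOWED_CATEGORIES = {"web_ui", "auth", "config", "debug", "logging", "update", "other"}

def parse_annotations(reply: str):
    """Extract and sanitize a JSON array of annotation objects from a reply."""
    start, end = reply.find("["), reply.rfind("]")
    if start == -1 or end <= start:
        raise ValueError("no JSON array found in reply")
    records = json.loads(reply[start:end + 1])

    cleaned = []
    for rec in records:
        if not isinstance(rec, dict) or "string" not in rec:
            continue  # skip malformed entries instead of trusting them
        category = str(rec.get("category", "other"))
        if category not in ALLOWED_CATEGORIES:
            category = "other"  # the model invented a label; normalize it
        cleaned.append({
            "string": str(rec["string"]),
            "category": category,
            "interesting": str(rec.get("interesting", "no")).lower() in ("yes", "true"),
            "reason": str(rec.get("reason", "")),
        })
    # Interesting entries first, so the checklist reads top-down by priority
    return sorted(cleaned, key=lambda r: not r["interesting"])
```

The parser never raises on an odd category or a missing reason. It only raises when no array is found or the array is not valid JSON, which is usually a sign that the chunk should be retried or reviewed by hand.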


4. Ghidra + LLM: Explaining Weird Functions

Once you move into static analysis, tools like Ghidra are the workhorse for exploring firmware binaries. You load binaries (typically ELF files built for ARM or MIPS), let Ghidra analyze them, and then decompile functions into a pseudo-C view.

The hard part is not “disassembling”—it’s understanding what a function actually does.

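Research and experiments with LLMs show they can help with tasks like suggesting descriptive function names, summarizing what decompiled code does, and pointing out security-relevant behavior.

That’s perfect for our “annotator” idea.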

4.1 Workflow

  1. In Ghidra, decompile a function you suspect is security-relevant:
    • Maybe it’s referenced from the /login CGI handler
    • Or from a firmware update routine
  2. Copy a sanitized snippet of the decompiled C-like code (omit hard-coded secrets / proprietary stuff if using a cloud LLM).
  3. Ask the LLM something like:
You are assisting with firmware reverse engineering.
Here is a decompiled function from an embedded Linux binary (MIPS):

int sub_40123C(char *user, char *pw) {
    FILE *f = fopen("/etc/passwd", "r");
    if (!f) return -1;
    ...
}

Tasks:
1. Give this function a descriptive name (C-style).
2. Summarize what it does in bullets.
3. Call out any security-relevant behavior (auth checks, file access, cryptography, etc.).

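You’ll often get surprisingly good results, though every answer still needs to be checked against the decompiled code.

You can even script this: export decompiled functions or their summaries and feed them to an LLM in batches, building a “map” of the binary where each function has a suggested name, a high-level summary, and a note on its security relevance.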

This mirrors workflows used in both academic research and experimental tools that use LLMs as reverse engineering assistants for malware and binaries.
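
A rough sketch of that batch idea, under a few assumptions: you have already exported the decompiled functions into a dict of address to text (how you get them out of your disassembler is up to you), and `llm_chat` is any callable that takes a prompt string and returns the reply as a string. The function names, the JSON keys, and the size limit are illustrative choices.

```python
import json

def build_function_prompt(address, code, arch="MIPS", device="router"):
    """Build one prompt per function; arch and device are caller-supplied context."""
    return (
        "You are assisting with firmware reverse engineering.\n"
        f"Target: {arch} binary from a {device}.\n"
        f"Decompiled function at {address}:\n\n{code}\n\n"
        'Reply with a JSON object with the keys "function_name", '
        '"high_level_summary", and "security_relevance".'
    )

def annotate_functions(functions, llm_chat, max_chars=4000):
    """Return a dict of address -> annotation for each exported function."""
    annotations = {}
    for address, code in functions.items():
        if len(code) > max_chars:
            # Oversized functions are flagged instead of being truncated
            annotations[address] = {"error": "too large, review manually"}
            continue
        reply = llm_chat(build_function_prompt(address, code))
        try:
            result = json.loads(reply)
            if not isinstance(result, dict):
                raise ValueError("reply is not a JSON object")
            annotations[address] = result
        except ValueError:
            # Keep the raw text so nothing is silently lost
            annotations[address] = {"error": "unparseable reply", "raw": reply}
    return annotations
```

Passing the LLM client in as a parameter keeps the loop easy to test with a fake function and makes it simple to swap a cloud model for a local one.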


5. Configs, Scripts, and “Weird Blobs”: Semantic Tagging

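Firmware images are full of non-binary artifacts: configuration files, init and shell scripts, web UI pages, and assorted blobs whose purpose isn’t obvious.

Instead of manually reading each file, you can: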

  1. Programmatically find candidates – files under /etc/, /usr/script/, /www/, etc.
  2. Summarize them with an LLM to label purpose and risk.

5.1 Example: scanning shell scripts

from pathlib import Path

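def list_shell_scripts(root):
    # Note: this only matches *.sh; init scripts without an extension
    # (common under /etc/init.d/) need a separate pass.
    root = Path(root)
    return sorted(root.rglob("*.sh"))  # sorted for a stable review order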

def summarize_script(path):
    content = Path(path).read_text(errors="ignore")[:4000]  # truncate just in case
    prompt = (
        "You are reviewing firmware init scripts.\n\n"
        f"File path: {path}\n"
        "Script:\n\n"
        "```sh\n"
        f"{content}\n"
        "```\n\n"
        "Tasks:\n"
        "1. Briefly summarize what this script does.\n"
        "2. Call out any security-relevant actions "
        "(starting services, changing permissions, touching auth/crypto, enabling remote access).\n"
        "3. Rate its review priority: high / medium / low.\n"
    )
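    # llm_chat() is the same generic helper used in section 3.2
    return llm_chat(prompt)

if __name__ == "__main__":
    # "squashfs-root" is the directory unsquashfs creates by default
    for p in list_shell_scripts("squashfs-root"):
        print(f"=== {p} ===")
        print(summarize_script(p))
        print()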

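This gives you a short summary, a list of security-relevant actions, and a review priority for each script.

You can do the same for other text artifacts, such as config files under /etc/ or web UI files under /www/.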
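
Before spending LLM calls on every file, a cheap keyword pass can decide which scripts to send first. This is a minimal sketch. The patterns and weights are hypothetical values I picked for illustration, so tune them for the device family you are looking at.

```python
import re

# Hypothetical keyword weights: higher means "look at this sooner"
RISK_PATTERNS = {
    r"\btelnetd\b": 3,
    r"\bdropbear\b|\bsshd\b": 2,
    r"chmod\s+(-R\s+)?0?777": 2,
    r"\bwget\b|\bcurl\b|\btftp\b": 2,
    r"\bpasswd\b|\bshadow\b": 2,
    r"\biptables\b": 1,
}

def risk_score(script_text: str) -> int:
    """Sum the weights of every pattern that appears at least once."""
    return sum(
        weight
        for pattern, weight in RISK_PATTERNS.items()
        if re.search(pattern, script_text)
    )

def rank_scripts(scripts: dict) -> list:
    """Return (path, score) pairs, highest score first; ties sorted by path."""
    scored = [(path, risk_score(text)) for path, text in scripts.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

Here `scripts` maps each file path to its text content. The score only sets the order in which files go to the LLM (or to you). A score of zero does not mean a script is safe.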


6. LLMs + Rehosting: Augmenting Dynamic Analysis

The more hardcore end of firmware analysis involves rehosting: running firmware in an emulated environment to see how it behaves at runtime. Researchers and practitioners use various frameworks to emulate peripheral devices and remove hardware dependencies.

LLMs can help here too, but again as annotators: for example, by summarizing and triaging the logs an emulated firmware produces.

6.1 Example prompt for log triage

You are a firmware analyst. The following log lines come from an emulated router firmware:

[HTTPD] POST /apply.cgi action=wan_settings
[HTTPD] user=admin from 192.168.0.10
[KERNEL] device eth0 entered promiscuous mode
[APP] enabling remote_management on port 8080
...

1. Summarize the key events.
2. Identify any security-relevant changes.
3. Suggest 2–3 follow-up checks I should perform against this firmware.

You get a quick human-level summary instead of reading 500 lines manually.
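
For longer logs, a small amount of preprocessing helps: count lines per subsystem tag to see where the noise is, then split the log into chunks and build one prompt per chunk. The sketch below assumes log lines start with a bracketed tag like the ones in the example above. The chunk size of 100 lines is an arbitrary choice.

```python
import re
from collections import Counter

TAG_RE = re.compile(r"^\[([A-Z0-9_]+)\]")

def tag_counts(lines):
    """Count log lines per leading [TAG]; untagged lines count as 'UNTAGGED'."""
    counts = Counter()
    for line in lines:
        match = TAG_RE.match(line)
        counts[match.group(1) if match else "UNTAGGED"] += 1
    return counts

def chunk_lines(lines, max_lines=100):
    """Split the log into consecutive chunks of at most max_lines lines."""
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]

def build_triage_prompt(chunk):
    """Wrap one chunk of log lines in the triage prompt shown above."""
    return (
        "You are a firmware analyst. The following log lines come from "
        "an emulated router firmware:\n\n"
        + "\n".join(chunk)
        + "\n\n1. Summarize the key events.\n"
        "2. Identify any security-relevant changes.\n"
        "3. Suggest 2-3 follow-up checks I should perform against this firmware.\n"
    )
```

Chunking by line count is the simplest option. If events span chunk boundaries, overlapping the chunks by a few lines is an easy refinement.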


7. Limitations and Risks: This Is Not Magic (and That’s Good)

Before we get too excited, reality check:

  1. Hallucinations are real
    LLMs can improve naming and summarization, but they still get things wrong and may invent functionality that isn’t present.
    • Never treat LLM output as ground truth.
    • Use it as a hint, then confirm by reading the actual code / disassembly.
  2. Confidentiality and IP
    If you’re analyzing proprietary firmware, uploading large chunks to a cloud LLM may be unacceptable (legally, ethically, or by contract).
    • Consider local open-source models for sensitive work.
    • Use strict data minimization: send only what you must (e.g., one function, one script).
  3. Ethics and legality
    Secure development and IoT guidance emphasize secure design, responsible vulnerability management, and lifecycle support.
    • Use these techniques to improve security, not undermine it.
    • Follow responsible disclosure practices if you discover real issues.
  4. Skill still required
    Write-ups on LLM-assisted reverse engineering consistently conclude that LLMs augment experts; they don’t turn beginners into instant firmware ninjas.
    • You still need to know how toolchains, OSes, networking, and cryptography work.
    • LLMs amplify good analysts; they don’t replace them.
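
One way to put the data-minimization advice into practice is to scrub obvious secrets from a snippet before it leaves your machine. The patterns below are illustrative only and will miss plenty, so treat this as a first filter, not a guarantee.

```python
import re

# Illustrative patterns only; extend them for the secrets you expect to see.
# Order matters: whole key blocks go first, then addresses, then key=value pairs.
REDACTIONS = [
    (re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.S),
     "<REDACTED_PRIVATE_KEY>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<REDACTED_IP>"),
    (re.compile(r"(?i)\b(password|passwd|secret|token|api_key)(\s*[=:]\s*)\S+"),
     r"\1\2<REDACTED>"),
]

def redact(text: str) -> str:
    """Apply every redaction pattern in order and return the scrubbed text."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text
```

Run `redact()` on each function, script, or log chunk right before building the prompt, and skim the result yourself when the material is sensitive.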

8. Practical Tips: Making LLMs a Useful Firmware Sidekick

If you want to actually integrate LLMs into your firmware workflow, here are some practical patterns:

  1. Use strong roles in prompts
    • “You are an embedded firmware security analyst.”
    • “You are assisting reverse engineering of a MIPS-based router binary.”
  2. Give context concisely
    • Mention architecture (ARM/MIPS/x86), OS (Linux/RTOS), and approximate purpose (router, camera, PLC).
    • This helps the model make better guesses about functions and config files.
  3. Chunk smartly
    • Don’t send entire file systems.
    • Work per-function, per-script, or per-log-chunk.
  4. Always ask for structure
    Ask for JSON-like or bullet-pointed output, for example:
    {
      "function_name": "...",
      "high_level_summary": "...",
      "security_relevance": "..."
    }

    This makes it easier to feed back into your own tooling.

  5. Build your own “knowledge notebook”
    • Store LLM explanations in markdown or a small database.
    • Link them back to offsets / function addresses / file paths so you can quickly revisit them later.
  6. Compare models
    • Some models are better at code; others at natural language.
    • Try both cloud and local options, especially if you need privacy.
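
For the “knowledge notebook” tip, a small SQLite file is often enough. The sketch below stores one note per artifact, location, and model, so explanations stay linked to file paths and function addresses and answers from different models can sit side by side. The table layout and column names are my own assumptions.

```python
import sqlite3

def open_notebook(path=":memory:"):
    """Open (or create) the notebook database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS notes (
               artifact TEXT NOT NULL,   -- file path inside the image
               location TEXT NOT NULL,   -- function address, offset, or ''
               model    TEXT NOT NULL,   -- which LLM produced the note
               note     TEXT NOT NULL,
               verified INTEGER NOT NULL DEFAULT 0,
               PRIMARY KEY (artifact, location, model)
           )"""
    )
    return conn

def save_note(conn, artifact, location, model, note):
    """Insert or overwrite a note; an overwritten note becomes unverified again."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO notes (artifact, location, model, note) "
            "VALUES (?, ?, ?, ?)",
            (artifact, location, model, note),
        )

def notes_for(conn, artifact):
    """Return (location, model, note, verified) rows for one artifact."""
    rows = conn.execute(
        "SELECT location, model, note, verified FROM notes "
        "WHERE artifact = ? ORDER BY location, model",
        (artifact,),
    )
    return rows.fetchall()
```

The `verified` column is there so you can mark which notes you have confirmed against the actual code. Everything else in the table is still a hypothesis.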
