| Status | Meaning |
|---|---|
| Implemented and tested | Code ships, exercised by the test suite, runs on warden demo. |
| Scaffolded | Module and interface exist, the seam is wired end-to-end, but the implementation is a stub or requires optional native tooling to activate. |

Phase 0: Spine

Goal. A queryable model of any Emscripten module: parse the binary, compute stable function identities, seed obvious symbols, and store everything in a versioned knowledge base.

Deliverables. warden init, warden ingest, warden funcs, warden show, warden coverage, and
warden set-name are all fully operational. The KB can round-trip a module and answer
“what do we know about func[N]?” without any external tooling.

Status: implemented and tested.

| Component | Source | What it does |
|---|---|---|
| WASM section parser | src/warden/ingest/ | Reads type, import, function, export, code, name, element/table sections in pure Python. |
| JS-glue parser | src/warden/ingest/ | Reads Emscripten’s export-index map, dynCall signatures, and PROXY_TO_PTHREAD shape from the .js glue file. |
| Identity fingerprinter | src/warden/identity/fingerprint.py | Computes exact body hash, structural skeleton, opcode-class histogram, call-target set, and type signature per function. The stable_id composite is what annotations are keyed to. |
| Knowledge base | src/warden/kb/ | SQLite schema with module_versions, functions, symbols, and diffs tables; provenance/confidence/locked columns; upsert_symbol economy (human > oracle > agent). |
| CLI | src/warden/cli.py | init, ingest, versions, coverage, funcs, show, set-name, verify, export, and demo. |
| warden demo | src/warden/samples.py | Runs the entire pipeline end-to-end on generated sample modules with no network or native toolchain. |

Covered by tests/test_ingest.py, tests/test_fingerprint.py, tests/test_kb.py,
tests/test_cli.py, and tests/test_pipeline.py. Fingerprint determinism is tested explicitly:
same bytes in, same stable_id out.
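
To make the difference between an exact body hash and a structural identity concrete, here is a minimal sketch. It is not the code in fingerprint.py: the opcode classes, the payload layout, and the 16-character truncation are all assumptions chosen for illustration.

```python
import hashlib
from collections import Counter

# Hypothetical opcode classes; the real grouping in fingerprint.py may differ.
OPCODE_CLASS = {
    "local.get": "local", "local.set": "local",
    "i32.const": "const", "i64.const": "const",
    "i32.add": "arith", "i32.mul": "arith",
    "call": "call", "if": "control", "br": "control", "return": "control",
}


def exact_hash(body: bytes) -> str:
    """Hash of the raw function body: changes if any byte changes."""
    return hashlib.sha256(body).hexdigest()


def opcode_histogram(opcodes: list[str]) -> tuple[tuple[str, int], ...]:
    """Count opcodes per class, sorted so the result has a fixed order."""
    counts = Counter(OPCODE_CLASS.get(op, "other") for op in opcodes)
    return tuple(sorted(counts.items()))


def stable_id(type_sig: str, opcodes: list[str]) -> str:
    """Composite identity built from structure only.

    Immediates such as call indices and constants are deliberately ignored,
    so a function keeps its identity when only those operands shift.
    """
    skeleton = ",".join(OPCODE_CLASS.get(op, "other") for op in opcodes)
    payload = f"{type_sig}|{skeleton}|{opcode_histogram(opcodes)}"
    return hashlib.sha256(payload.encode()).hexdigest()[:16]
```

In this sketch, two bodies that differ only in a call target index get different exact hashes but the same stable_id, which is the property the carry-over mechanism relies on.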

Phase 1: Decompile and export bridge

Goal. Push recovered names from the KB back into existing RE tools so analysts keep their current workflow. Pull decompiler output back in to enrich proposals.

Deliverables. warden export --format ghidra emits a runnable Python rename script;
warden export --format headers emits a C header; warden export --format pseudo emits readable
per-function listings; warden export --format kb-text emits a git-diffable plain-text snapshot.

Status: exporters implemented and tested (including a built-in pure-Python decompiler); live Ghidra round-trip scaffolded.

The four export formats (headers, pseudo, kb-text, ghidra) in src/warden/export/text.py
are implemented and covered by tests/test_cli.py. The ghidra format emits a valid Python
snippet that calls getFunctionByWasmIndex (from nneonneo/ghidra-wasm-plugin) and
fn.setName(name, SourceType.USER_DEFINED) for every named function in the KB.
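
A snapshot format only diffs cleanly in git if its output is deterministic. The sketch below shows one way to get that: one line per symbol, sorted by a stable key. The field names and the tab-separated layout are hypothetical and are not taken from src/warden/export/text.py.

```python
def render_kb_text(symbols: list[dict]) -> str:
    """Render one line per symbol, sorted by stable_id.

    Sorting makes the output independent of insertion order, so two
    snapshots of the same KB state are byte-identical.
    """
    lines = []
    for sym in sorted(symbols, key=lambda s: s["stable_id"]):
        lines.append(
            f'{sym["stable_id"]}\t{sym["name"]}\t'
            f'{sym["provenance"]}\t{sym["confidence"]:.2f}'
        )
    return "\n".join(lines) + "\n"
```

Fixing the confidence to two decimals is a design choice in the same spirit: it keeps floating-point noise out of the diff.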
Built-in lifter. warden.lift is a pure-Python stack-machine lifter that renders readable
pseudo-C without any native tooling. lift_function(module, func) returns a string;
lift_module(module) lifts every function. warden export --format pseudo now emits real pseudo-C
(not a mnemonic dump), and the dedicated CLI command warden lift <label> [--index N] [--out FILE]
exposes the lifter directly. For example, parse_token lifts to:
What is scaffolded: the script is generated correctly, but the live Ghidra round-trip (launching
Ghidra headlessly, loading the plugin, running the script, and reading decompiled p-code back
through pyghidra) is not automated. Activating it requires Ghidra and the plugin installed
locally.

Phase 2: The Emscripten Oracle

Goal. Auto-identify 40–80% of any Emscripten module as known musl / libc++ / dlmalloc / Emscripten-runtime code, instantly, with real upstream names, so agent and human effort concentrates on the application-specific remainder.

Deliverables. warden oracle build and warden oracle identify. A corpus of labeled .wasm
artifacts (emsdk × build-flag matrix) backed by the signature store; version inference from the
distribution of Oracle matches.

Status: Oracle engine and MinHash-LSH index implemented and tested; full emsdk corpus farm scaffolded.

| Component | Source | What it does |
|---|---|---|
| Signature extraction | src/warden/oracle/corpus.py | extract_signatures fingerprints every named defined function in a labeled module and classifies it by library (musl, libc++, dlmalloc, emscripten, wasi-libc, musl-pthread). |
| Signature store | src/warden/oracle/signatures.py | JSON-serialisable store; load / save / extend / libraries(). |
| Identification pass | src/warden/oracle/match.py | Fingerprints every defined function in the target, scores against each corpus signature using similarity(), and writes matches above threshold as oracle-provenance symbols into the KB. |
| MinHash-LSH index | src/warden/oracle/index.py | SignatureIndex.build(store, bands=8) builds a sublinear candidate index; index.candidates(fp) returns approximate neighbors; identify_indexed(kb, version_id, store, threshold=0.82, write=True) is a drop-in replacement for the linear identify() pass. CLI: warden oracle identify <label> --store s --indexed. |
| Version inference | src/warden/oracle/ | infer_version reads the distribution of emscripten_version fields across matches and returns the plurality winner with a calibrated confidence score. |
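
The sketch below illustrates the general MinHash-LSH banding technique that a sublinear candidate index relies on. It is a toy, not the SignatureIndex implementation: ToyIndex, the token sets, the 32-hash signature length, and the use of blake2b are assumptions; only the bands=8 value comes from the table above.

```python
import hashlib
from collections import defaultdict

NUM_HASHES = 32              # assumed signature length
BANDS = 8                    # same band count as the bands=8 default above
ROWS = NUM_HASHES // BANDS   # rows per band


def _h(seed: int, token: str) -> int:
    """Seeded 64-bit hash of a token."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(tokens: set[str]) -> list[int]:
    """One minimum per seeded hash; similar sets share many minima.

    tokens must be non-empty.
    """
    return [min(_h(seed, t) for t in tokens) for seed in range(NUM_HASHES)]


class ToyIndex:
    """Hypothetical stand-in for a banded LSH index."""

    def __init__(self) -> None:
        self.buckets: dict[tuple, set[str]] = defaultdict(set)

    def _band_keys(self, sig: list[int]):
        # Each band of ROWS consecutive minima becomes one bucket key.
        for b in range(BANDS):
            yield (b, tuple(sig[b * ROWS:(b + 1) * ROWS]))

    def add(self, name: str, tokens: set[str]) -> None:
        for key in self._band_keys(minhash(tokens)):
            self.buckets[key].add(name)

    def candidates(self, tokens: set[str]) -> set[str]:
        """Names sharing at least one full band with the query."""
        found: set[str] = set()
        for key in self._band_keys(minhash(tokens)):
            found |= self.buckets.get(key, set())
        return found
```

A query touches only eight buckets instead of every stored signature. Fewer rows per band make a band collision more likely, which raises recall at the cost of more false candidates to score exactly.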

What is scaffolded: scripts/corpus/ describes the containerised emsdk matrix build (each tagged
release × -O0…-Oz, -pthread on/off, LTO, exceptions mode). That farm has not been run.
The seed store shipped with the repo (src/warden/oracle/seed_signatures.json) is a small
hand-crafted fixture used by tests. Running the real farm requires emsdk and produces the
multi-thousand-signature corpus that gives Oracle the 40–80% identification rate.

Phase 3: Diff and carry-over

Goal. Turn reverse engineering from Sisyphean to incremental. When a new .wasm ships,
classify every function as unchanged / moved / modified / new / deleted, carry all annotations
forward automatically for unchanged and moved, apply a confidence penalty for fuzzy matches, and
emit a semantic changelog that separates genuine application deltas from runtime churn caused by
an Emscripten version bump.
Deliverable. warden diff <from> <to> carries annotations forward and prints a human-readable
changelog; the diff report is stored in the KB for time-travel queries.
Status: fully implemented and tested.
src/warden/diff/engine.py runs a three-pass matching algorithm:

1. Exact-body hash match. Functions with the same exact_hash are unchanged if the table index stayed, or moved if it shifted.
2. Stable-identity match. Functions with the same stable_id but a different body are treated as unchanged/moved too, because the stable identity intentionally tolerates relocations the exact hash would miss.

Carry-over copies oracle / agent / human symbols to the new stable_id with a
0.7 confidence multiplier for fuzzy matches; diff-carry provenance is recorded.

render_changelog separates runtime_churn from app_modified using the _RUNTIME_PREFIXES
table. Covered by tests/test_pipeline.py and tests/test_cli.py.
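
The first two matching passes and the fuzzy-match penalty can be sketched as follows. This is an illustration under assumptions, not the code in engine.py: the Func record, classify, and carried_confidence are hypothetical names, duplicate hashes are ignored, and the fuzzy pass itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Func:
    index: int        # position in the module's function index space
    exact_hash: str   # hash of the raw body
    stable_id: str    # structural identity


def classify(old: list[Func], new: list[Func]) -> dict[int, str]:
    """Map each new function index to unchanged / moved / new."""
    by_exact = {f.exact_hash: f for f in old}
    by_stable = {f.stable_id: f for f in old}
    result: dict[int, str] = {}
    for f in new:
        # Pass 1: exact body hash. Pass 2: stable identity.
        prev = by_exact.get(f.exact_hash) or by_stable.get(f.stable_id)
        if prev is None:
            # A fuzzy similarity pass would run here before giving up.
            result[f.index] = "new"
        else:
            result[f.index] = "unchanged" if prev.index == f.index else "moved"
    return result


def carried_confidence(confidence: float, fuzzy: bool, penalty: float = 0.7) -> float:
    """Apply the 0.7 multiplier only when the match was fuzzy."""
    return confidence * penalty if fuzzy else confidence
```

Functions from the old version that no new function matched would be reported as deleted; that bookkeeping is omitted to keep the sketch short.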
Phase 4: Agent crew over MCP
Goal. A propose → verify → write-back loop that populates the KB with human effort spent only on what the agents cannot resolve confidently.

Deliverables. warden agent <label> running a multi-backend crew; warden mcp serving the KB
as an MCP tool surface so any capable model can drive it from outside.

Status: agent loop, MCP server, and specialized concurrency + struct analyzers implemented and tested; full specialized crew wiring scaffolded.

Implemented:
- Offline heuristic backend (src/warden/agents/backends.py): deterministic, zero-dependency; uses string xrefs and call-neighborhood context to produce proposals. Works with no API key.
- OpenAI backend (src/warden/agents/backends.py): structured JSON output via the OpenAI Responses API, model gpt-5.3-codex by default. Auto-selected when OPENAI_API_KEY is set and openai is installed (pip install warden-re[agents]). codex and oai are provider aliases.
- Anthropic backend (src/warden/agents/backends.py): structured JSON output via the Anthropic Messages API, model claude-opus-4-8. Auto-selected when ANTHROPIC_API_KEY is set and anthropic is installed, if OpenAI is not available. make_backend auto-detects.
- Crew loop (src/warden/agents/crew.py): gather_facts seeds each call with hard evidence (type signature, call targets, string xrefs, opcodes) to constrain hallucination; verify_proposal is a cheap syntactic gate; run_agent_pass iterates bottom-up (fewest call targets first), skips already-confident symbols, and gates through the verifier and KB economy.
- MCP server (src/warden/mcp/server.py): FastMCP server exposing project reads, function facts, agent backend discovery, server-side agent runs, and economy-gated symbol proposals. Agent writes are economy-gated at the KB layer, so they cannot overwrite human or higher-confidence Oracle annotations. Activate with pip install warden-re[mcp] then warden mcp.
- Concurrency analyzer (src/warden/analysis/concurrency.py): analyze_concurrency(module, kb, version_id) returns a ConcurrencyReport with .shared_memory, .atomic_sites, .pthread_markers, and .facts. Populates the previously-empty thread_model KB table via kb.add_thread_fact. Deterministic; zero external dependencies.
- Struct analyzer (src/warden/analysis/structs.py): analyze_structs(...) returns a list of StructLayout values (each carrying .name, .fields as StructField(offset, size, type, name), and .source_function). Populates the structs KB table via kb.upsert_struct. CLI: warden analyze <label> runs both analyzers and persists all facts.

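The economy gate mentioned for agent and MCP writes (human > oracle > agent) can be pictured as a small decision function. The sketch below is a guess at the shape of such a rule, not the logic of upsert_symbol: the record fields, the strict ranking, and the same-provenance tie-break on confidence are all assumptions.

```python
from typing import Optional

# Provenance ranking from the economy described above: human > oracle > agent.
RANK = {"agent": 0, "oracle": 1, "human": 2}


def should_overwrite(existing: Optional[dict], proposed: dict) -> bool:
    """Decide whether a proposed symbol may replace the stored one."""
    if existing is None:
        return True                      # nothing stored yet
    if existing.get("locked"):
        return False                     # locked symbols never change
    old = RANK[existing["provenance"]]
    new = RANK[proposed["provenance"]]
    if new != old:
        return new > old                 # higher provenance always wins
    # Assumed tie-break: same provenance, keep the more confident value.
    return proposed["confidence"] > existing["confidence"]
```

Putting the check at the storage layer rather than in each caller means an external MCP client cannot bypass it.
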
What is scaffolded: the VISION describes six specialized agents (Oracle, Concurrency,
Type/Struct, Naming/Summarization, Diff, and Verifier). The current crew is a single general loop;
the concurrency and struct analyzers produce facts but are not yet wired as autonomous agents that
re-drive the naming loop. A true bottom-up call-graph ordering pass that re-decompiles each
function after its callees’ names are resolved is not yet implemented.
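
The gap between the current ordering heuristic and a true callee-first pass can be shown in a few lines. Both functions below are hypothetical sketches that take a plain mapping from function index to call-target set; neither is the code in crew.py.

```python
from graphlib import CycleError, TopologicalSorter


def heuristic_order(call_targets: dict[int, set[int]], confident: set[int]) -> list[int]:
    """Fewest call targets first, skipping already-confident functions."""
    pending = [idx for idx in call_targets if idx not in confident]
    return sorted(pending, key=lambda idx: (len(call_targets[idx]), idx))


def callee_first_order(call_targets: dict[int, set[int]]) -> list[int]:
    """True bottom-up order: every callee precedes its callers.

    TopologicalSorter treats each mapped set as the node's predecessors,
    so passing callees as predecessors yields callees first. Recursive
    call cycles have no such order; this sketch falls back to the
    heuristic in that case.
    """
    try:
        return list(TopologicalSorter(call_targets).static_order())
    except CycleError:
        return heuristic_order(call_targets, set())
```

The heuristic only approximates leaf-first order: a function with one call target can be visited before that target is named, which is the case a real call-graph pass would fix.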
Phase 5: Verifier
Goal. Make “understood” provable. Lift target functions via wasm2c/w2c2 to C, recompile the agent reconstruction the same way, differentially execute both over a fuzzer-generated corpus, and require I/O and memory match.

Deliverable. warden verify <wasm> reports determinism and differential-readiness. The
verifier gate in the agent loop activates the behavioral check when the required tooling is
present.

Status: determinism verification and mini-interpreter with differential execution implemented and tested; wasm2c differential harness scaffolded.

Implemented in src/warden/verify/harness.py:

- verify_determinism re-ingests the same bytes twice and confirms every function’s stable_id is bit-identical across runs. This is the foundational guarantee that the entire carry-over mechanism depends on.
- tooling_status probes PATH for wasm2c, w2c2, a C compiler, and wasm-validate; reports can_differential truthfully.
- differential_plan returns the concrete shell steps for the wasm2c lift → recompile → differential execution pipeline, and whether the environment can run them. warden verify <wasm> surfaces this output.

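A PATH probe of the kind tooling_status performs can be written with the standard library alone. The sketch below is an assumption-laden stand-in: probe_tooling, the list of compiler names, and the rule for can_differential (one lifter plus a C compiler) are illustrative and not taken from harness.py.

```python
import shutil

TOOLS = ("wasm2c", "w2c2", "wasm-validate")
C_COMPILERS = ("cc", "gcc", "clang")   # assumed candidate names


def probe_tooling() -> dict[str, bool]:
    """Report which external tools are reachable on PATH."""
    status = {tool: shutil.which(tool) is not None for tool in TOOLS}
    status["c_compiler"] = any(shutil.which(cc) is not None for cc in C_COMPILERS)
    # Assumption: a differential run needs one lifter and a C compiler.
    status["can_differential"] = (
        (status["wasm2c"] or status["w2c2"]) and status["c_compiler"]
    )
    return status
```

Because the probe only reads PATH, it is safe to call on every run and never claims a capability the environment lacks.
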
warden.interp is a zero-dependency interpreter for the integer subset of
WebAssembly that makes behavioral equivalence runnable without any native toolchain.

- execute_function(module, func, args, *, host=None, memory=None, fuel=100000) executes a single function and returns a list of integer results. Raises UnsupportedExecution for instructions outside the integer subset.
- differential_execute(mod_a, fn_a, mod_b, fn_b, inputs) runs both functions over a list of argument tuples and returns a per-input list of dicts reporting whether the outputs matched. For example, it proves parse_token v1 and v2 are behaviorally equivalent (v2’s bounds-check result is dropped from the return), while flagging that internal_crc differs.
- CLI: warden exec <label> <index> [args...] prints the result of executing a function by index directly from the KB.

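The idea behind differential execution fits in a toy stack machine. The sketch below is not warden.interp: run, differential, the (op, immediate) instruction encoding, and the handful of supported opcodes are hypothetical, and only straight-line integer code is handled.

```python
class UnsupportedExecution(Exception):
    """Raised for any opcode outside the tiny supported subset."""


MASK32 = 0xFFFFFFFF


def run(code, args, fuel=100000):
    """Execute (op, immediate) pairs on an i32 stack machine."""
    stack, local_vars = [], list(args)
    for op, imm in code:
        fuel -= 1
        if fuel < 0:
            raise RuntimeError("out of fuel")
        if op == "local.get":
            stack.append(local_vars[imm])
        elif op == "i32.const":
            stack.append(imm & MASK32)
        elif op in ("i32.add", "i32.mul", "i32.shl"):
            b, a = stack.pop(), stack.pop()
            if op == "i32.add":
                stack.append((a + b) & MASK32)
            elif op == "i32.mul":
                stack.append((a * b) & MASK32)
            else:
                stack.append((a << (b % 32)) & MASK32)
        else:
            raise UnsupportedExecution(op)
    return stack


def differential(code_a, code_b, inputs):
    """Run both bodies on each argument tuple and record agreement."""
    report = []
    for args in inputs:
        out_a, out_b = run(code_a, args), run(code_b, args)
        report.append({"args": args, "a": out_a, "b": out_b, "match": out_a == out_b})
    return report


# Two different encodings of "x * 2", and one body that computes "x * 3".
DOUBLE_ADD = [("local.get", 0), ("local.get", 0), ("i32.add", None)]
DOUBLE_SHL = [("local.get", 0), ("i32.const", 1), ("i32.shl", None)]
TRIPLE = [("local.get", 0), ("i32.const", 3), ("i32.mul", None)]
```

Agreement over a finite input list is evidence of equivalence, not a proof, which is why the input corpus matters as much as the interpreter.
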
What is scaffolded: the actual wasm2c differential harness (launching wasm2c, compiling the
lifted C, running the fuzzer corpus, comparing outputs) is not automated. The verifier gate in
crew.py:verify_proposal is currently the cheap syntactic check; the behavioral hook is the
documented plug-in point for when a C toolchain is detected. SeeWasm symbolic checks and
Wasabi/Frida dynamic tracing are not yet wired.

Phase 6: UX

Goal. A “RE-as-version-control” interface: diff view, confidence heatmap, time-travel query (“when did this function first appear?”), thread/memory map, one-click export to pseudocode or headers.

Deliverable. A usable interface for analysts who want WARDEN’s power without running CLI commands manually.

Status: static HTML report generator implemented and tested; full interactive UX scaffolded.

All the data a UX would consume is present in the KB today: versioned functions, per-function confidence and provenance, diff reports stored with kb.store_diff, and the kb-text export
format designed to diff cleanly in git. The warden demo output already produces a human-readable
coverage progression and changelog in the terminal.

Static HTML report. warden.report generates a self-contained HTML file (inline CSS, no
server required) that captures an analysis session as a shareable artifact.

- render_report(kb, version_id, module=None) returns the HTML as a string; write_report(kb, version_id, path, module=None) writes it to disk.
- The report includes a coverage summary, a confidence heatmap of functions colored by provenance and confidence score, a thread/memory model section drawn from the concurrency analyzer’s facts, and the diff changelog.
- CLI: warden report <label> [--out FILE].

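A confidence heatmap in a self-contained HTML file needs little more than inline styles. The sketch below shows one possible rendering; heat_color, render_rows, the hsl() color scale, and the row fields are hypothetical and do not describe warden.report.

```python
import html


def heat_color(confidence: float) -> str:
    """Map confidence in [0, 1] to a hue from red (0) to green (120)."""
    clamped = max(0.0, min(1.0, confidence))
    return f"hsl({round(120 * clamped)}, 70%, 80%)"


def render_rows(functions: list[dict]) -> str:
    """Render one table row per function with an inline background color."""
    rows = []
    for fn in functions:
        rows.append(
            f'<tr style="background:{heat_color(fn["confidence"])}">'
            f'<td>{html.escape(fn["name"])}</td>'
            f'<td>{html.escape(fn["provenance"])}</td>'
            f'<td>{fn["confidence"]:.2f}</td></tr>'
        )
    return "<table>\n" + "\n".join(rows) + "\n</table>"
```

Escaping names matters here because recovered symbol names, especially C++ ones, routinely contain angle brackets.
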
Status at a glance
| Phase | Name | Status |
|---|---|---|
| 0 | Spine | Implemented and tested |
| 1 | Decompile and export bridge | Exporters + built-in pure-Python lifter implemented; live Ghidra round-trip scaffolded |
| 2 | The Emscripten Oracle | Engine + MinHash-LSH index implemented; emsdk corpus farm scaffolded |
| 3 | Diff and carry-over | Implemented and tested |
| 4 | Agent crew over MCP | Loop, MCP server, concurrency + struct analyzers implemented; specialized agent wiring scaffolded |
| 5 | Verifier | Determinism + mini-interpreter + differential execution implemented; wasm2c harness scaffolded |
| 6 | UX | Static HTML report implemented; full interactive UX scaffolded |
How to contribute
The project is alpha. Every phase has a concrete gap where a focused contribution lands quickly.

Phase 0: Spine (best entry point)

Fix edge cases in the WASM section parser (src/warden/ingest/), add fingerprint properties
to src/warden/identity/fingerprint.py, or improve the JS-glue parser to handle additional
Emscripten output shapes. Every change can be validated against the existing test suite with
pytest.

Phase 1: Export bridge

Add a Ghidra headless launch wrapper that runs the generated rename script via
analyzeHeadless, or wire pyghidra to pull p-code back into the lifter output. The built-in
lifter in src/warden/lift/ is a good starting point for extending coverage to floating-point
and SIMD opcodes. Requires Ghidra and the nneonneo/ghidra-wasm-plugin installed locally.

Phase 2: Oracle corpus

Run the emsdk matrix build (scripts/corpus/) and contribute the resulting oracle.json as a
versioned artifact. Improve classify_library to cover additional Emscripten runtime prefixes.
Tune the MinHash-LSH band count in SignatureIndex.build for better precision/recall on
cross-opt-level matching.

Phase 3: Diff

Improve the fuzzy similarity score in src/warden/identity/fingerprint.py (call-graph
anchoring, dominator-tree comparison) to reduce false modified/new classifications on large
modules. Add time-travel query helpers to KnowledgeBase (when_first_seen, evolution_of).

Phase 4: Agent crew

Wire the concurrency and struct analyzers (src/warden/analysis/) as autonomous crew agents
that feed facts back into the naming loop. Add a true call-graph ordering pass that
re-decompiles each function after its callees are resolved. Add more MCP tools (get_diff,
search_symbols, export_kb_text).

Phase 5: Verifier

Automate the wasm2c → compile → fuzz → compare pipeline in harness.py. Wire
tooling_status().can_differential to actually gate the crew loop and upgrade
verify_proposal to invoke the behavioral check for proposals above a confidence threshold.
Extend the mini-interpreter in src/warden/interp/ to cover floating-point and memory opcodes.

Phase 6: UX

Build a terminal diff view (rich Table comparing two versions) or an HTTP API over the KB so
a browser frontend can consume it. The static HTML report (src/warden/report/) and the MCP
server (warden mcp) are already the right programmatic surfaces to build on.