.wasm. You re-do the work. That’s the real
pain. Not the first decompile, but the second through hundredth.
WARDEN treats RE as a living, versioned knowledge base keyed to stable function
identities rather than file offsets, so your names, types, and notes carry across binary
updates automatically.
The Emscripten Oracle
Emscripten, musl, dlmalloc, and libc++ are open source, so WARDEN compiles its own
ground truth and auto-identifies runtime functions in a target, attaching the real upstream
name. You stop reversing code that already has public source.
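A minimal sketch of the exact-match side of Oracle identification follows. It assumes every function, in both the self-compiled corpus and the target, has already been reduced to a content identity string. The names `Binding`, `build_ground_truth`, and `identify`, the corpus format, and the 1.0 confidence for an exact hit are all hypothetical illustrations, not WARDEN's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    name: str          # upstream symbol name, e.g. "memcpy"
    provenance: str    # who produced the binding; "oracle" here
    confidence: float  # 1.0 for an exact identity match (an illustrative choice)


def build_ground_truth(corpus):
    """Index a self-compiled corpus as content identity -> upstream name.

    `corpus` is an iterable of (identity, upstream_name) pairs. Hypothetically these
    come from fingerprinting functions compiled from Emscripten, musl, dlmalloc,
    and libc++ sources; the first name seen for an identity is kept.
    """
    index = {}
    for identity, name in corpus:
        index.setdefault(identity, name)
    return index


def identify(target, ground_truth):
    """Return function index -> Binding for each target function with a known identity.

    `target` maps a function's index in the module to its content identity.
    Functions with no exact match are left for fuzzy matching or the agents.
    """
    return {
        idx: Binding(ground_truth[ident], "oracle", 1.0)
        for idx, ident in target.items()
        if ident in ground_truth
    }


truth = build_ground_truth([("id-malloc", "malloc"), ("id-memcpy", "memcpy")])
print(identify({7: "id-memcpy", 12: "id-app-logic"}, truth))
```

Only function 7 receives a name. Function 12 is application code that no upstream source covers.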
A persistent symbol KB
Every name, type, struct, and comment lives in a database keyed to a content identity, with
provenance and a confidence score. It survives rebuilds instead of dying with the file.
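A minimal sketch of such a store follows, assuming SQLite through Python's standard `sqlite3` module. The table layout, column names, and the helpers `open_kb`, `record`, and `lookup` are hypothetical. The key point is that the primary key is the content identity rather than an offset, and every row carries provenance and confidence.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS symbols (
    identity   TEXT PRIMARY KEY,  -- content identity, not a file offset
    name       TEXT NOT NULL,
    type_sig   TEXT,
    comment    TEXT,
    provenance TEXT NOT NULL,     -- human, oracle, export, agent, or diff-carry
    confidence REAL NOT NULL CHECK (confidence >= 0.0 AND confidence <= 1.0)
)
"""


def open_kb(path=":memory:"):
    """Open (or create) a KB file; ':memory:' gives a throwaway database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record(conn, identity, name, provenance, confidence, type_sig=None, comment=None):
    """Write one annotation keyed by identity.

    INSERT OR REPLACE overwrites unconditionally; a real KB would first check
    the existing row's provenance and confidence before replacing it.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO symbols "
            "(identity, name, type_sig, comment, provenance, confidence) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (identity, name, type_sig, comment, provenance, confidence),
        )


def lookup(conn, identity):
    """Return (name, provenance, confidence) for an identity, or None."""
    return conn.execute(
        "SELECT name, provenance, confidence FROM symbols WHERE identity = ?",
        (identity,),
    ).fetchone()


kb = open_kb()
record(kb, "id-parse-header", "parse_header", "human", 1.0, comment="checks magic bytes")
print(lookup(kb, "id-parse-header"))  # ('parse_header', 'human', 1.0)
```

Because rows are keyed by identity, re-ingesting a rebuilt binary whose functions keep their identities finds the same rows again.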
Cross-version carry-over
When a new .wasm drops, WARDEN diffs it, ports annotations to unchanged and moved
functions, and surfaces only the genuine deltas. Reversing becomes incremental.
See the whole thing in one command
The core path is pure Python and the standard library: no Ghidra, no Emscripten, and no native toolchain are required to fork and run it.
warden demo generates sample modules and walks through the system end-to-end: ingest, Oracle
identification, the agent crew, shipping a new version, then diff and carry-over. You watch coverage climb
to 100% as a v1 → v2 semantic changelog is produced with zero manual work.
Start here: the 60-second quickstart
Install WARDEN and run the full pipeline on your own module.
How it works
Stable identity
Each function gets a content identity (structural skeleton + call-neighborhood + type
signature) that stays constant across rebuilds even when its table index shifts. Annotations
attach to that identity, not to an offset. Read the concepts →
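A minimal sketch of a content identity built from the three ingredients named above follows. Several choices here are assumptions rather than WARDEN's actual scheme: hashing a JSON encoding with SHA-256, truncating to 16 hex characters, stripping immediates from the skeleton, and sorting callees. The `content_identity` helper is hypothetical. The property it illustrates is that the function's table index is never an input, so shifting it cannot change the identity.

```python
import hashlib
import json


def content_identity(skeleton, callees, type_sig):
    """Hash a function's structural skeleton, call neighborhood, and type signature.

    skeleton: opcode mnemonics with immediates (indices, offsets, constants) stripped,
              so a shifted table index or relocated global does not change the hash.
    callees:  identities or names of the functions it calls, never raw indices;
              sorted, which deliberately ignores call order (a simplification).
    type_sig: the wasm function type as text, e.g. "(i32, i32) -> i32".
    """
    payload = json.dumps(
        {"skeleton": list(skeleton), "callees": sorted(callees), "type": type_sig},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


add = ["local.get", "local.get", "i32.add"]
v1 = content_identity(add, [], "(i32, i32) -> i32")  # table index 12 in v1
v2 = content_identity(add, [], "(i32, i32) -> i32")  # table index 15 in v2
assert v1 == v2  # the index is not an input, so the identity survives the shift
```

An exact hash like this breaks on any structural edit. Near-matches therefore need the similarity engine rather than equality.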
One engine, two jobs
The same fingerprint/similarity engine powers both the Oracle (match against compiled ground
truth) and the diff carry-over (match against the previous version).
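A minimal sketch of a shared similarity routine follows, assuming opcode trigrams as the fingerprint and Jaccard similarity as the score. This is a common technique swapped in for illustration; it is not necessarily WARDEN's. The helpers `features`, `jaccard`, and `best_match`, the 0.8 threshold, and the toy skeletons are all hypothetical. The skeletons are not real compiler output.

```python
def features(skeleton, n=3):
    """Opcode n-grams as a feature set (assumed fingerprint; n=3 is arbitrary)."""
    ops = list(skeleton)
    if len(ops) < n:
        return {tuple(ops)}
    return {tuple(ops[i:i + n]) for i in range(len(ops) - n + 1)}


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, with two empty sets treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_match(query, candidates, threshold=0.8):
    """Return (key, score) for the most similar candidate, or None if below threshold.

    The same routine serves both jobs: keys can be upstream names from the
    compiled ground truth (Oracle) or identities from the previous version (diff).
    """
    q = features(query)
    best = None
    for key, skeleton in candidates.items():
        score = jaccard(q, features(skeleton))
        if best is None or score > best[1]:
            best = (key, score)
    if best is None or best[1] < threshold:
        return None
    return best


oracle = {
    "memcpy": ["local.get", "local.get", "i32.load8_u", "i32.store8", "br_if"],
    "strlen": ["local.get", "i32.load8_u", "i32.eqz", "br_if", "i32.sub"],
}
print(best_match(oracle["memcpy"], oracle))  # ('memcpy', 1.0)
```

For diff carry-over, the same call would run against a dictionary keyed by the previous version's identities.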
A provenance economy
Every write records who made it (human, oracle, export, agent, or diff-carry)
along with a confidence score. Human edits are sovereign; agents only overwrite lower-confidence agent
output. That’s what makes it safe to re-run the whole crew on every update.
Agents do the labor
A propose, verify, write-back crew fills the KB without a proportional cost in human time. It runs with
zero dependencies via an offline heuristic, or can be upgraded to a real LLM crew.
Read about agents →
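The provenance rule and the propose, verify, write-back loop described above can be sketched together, assuming an in-memory dictionary as the KB. `Entry`, `agent_may_write`, `run_crew`, and the `propose`/`verify` hooks are hypothetical. The sketch also leaves oracle, export, and diff-carry entries untouched by agents, which goes beyond the stated rule and is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    name: str
    provenance: str   # "human", "oracle", "export", "agent", or "diff-carry"
    confidence: float


def agent_may_write(existing: Optional[Entry], confidence: float) -> bool:
    """An agent may fill an empty slot or replace lower-confidence agent output only.

    Human entries are never touched; this sketch also leaves oracle, export, and
    diff-carry entries alone, which is an assumption beyond the stated rule.
    """
    if existing is None:
        return True
    return existing.provenance == "agent" and existing.confidence < confidence


def run_crew(kb, functions, propose, verify):
    """One propose -> verify -> write-back pass; returns how many entries were written.

    propose(fn) -> (name, confidence) and verify(fn, name) -> bool are hypothetical
    hooks that an offline heuristic or an LLM could implement.
    """
    written = 0
    for fn in functions:
        name, confidence = propose(fn)
        if not verify(fn, name):
            continue  # rejected proposals never reach the KB
        if agent_may_write(kb.get(fn), confidence):
            kb[fn] = Entry(name, "agent", confidence)
            written += 1
    return written


def propose(fn):
    return f"guess_{fn}", 0.7  # stand-in heuristic with a fixed confidence


def verify(fn, name):
    return name.isidentifier()  # stand-in check: the name must be a valid identifier


kb = {"f1": Entry("parse_header", "human", 1.0), "f2": Entry("helper", "agent", 0.4)}
print(run_crew(kb, ["f1", "f2", "f3"], propose, verify))  # 2: f2 upgraded, f3 filled
print(run_crew(kb, ["f1", "f2", "f3"], propose, verify))  # 0: re-running is a no-op
```

The second pass writes nothing and the human entry survives both passes. That behavior is what makes re-running the crew on every update safe.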
What “100% reverse engineered” means here
Perfect source recovery is impossible: the compiler destroyed that information. WARDEN instead targets a rigorous, achievable 100% along three axes.
100% symbol coverage
Every function has some binding: an Oracle-confirmed real name, a recovered name, or an
agent proposal with a confidence score. No anonymous func_412 ever remains.
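A minimal sketch of how this coverage axis could be measured follows, assuming bindings are stored per function ID. The helper `symbol_coverage` and the binding tuples are hypothetical. Any binding counts, whatever its provenance, which matches the definition above.

```python
def symbol_coverage(function_ids, bindings):
    """Return (coverage fraction, sorted list of still-anonymous functions).

    Any binding counts: an Oracle-confirmed name, a recovered name, or an agent
    proposal with a confidence score. The goal is an empty anonymous list.
    """
    ids = set(function_ids)
    anonymous = sorted(ids.difference(bindings))  # iterating a dict yields its keys
    coverage = 1.0 if not ids else (len(ids) - len(anonymous)) / len(ids)
    return coverage, anonymous


bindings = {
    "f1": ("malloc", "oracle", 1.0),
    "f2": ("parse_header", "human", 1.0),
    "f3": ("maybe_crc32", "agent", 0.6),
}
print(symbol_coverage(["f1", "f2", "f3", "f4"], bindings))  # (0.75, ['f4'])
```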
100% behavioral equivalence (verifiable)
Reconstructions are differentially executed against the original until outputs match.
Determinism verification runs today; the wasm2c differential harness activates when a C
toolchain is present. Read about verification →
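A minimal sketch of differential execution follows, with plain Python callables standing in for the compiled export and the reconstruction. A wasm2c-based harness would wrap the real ones, and that wrapping is not shown. `differential_check`, the seeded random input generator, and the 1000-trial default are hypothetical choices. Passing the same callable twice turns the check into a simple determinism test.

```python
import random


def differential_check(original, reconstruction, gen_input, trials=1000, seed=0):
    """Compare two implementations on identical generated inputs.

    Returns None if every trial agrees, else the first (args, expected, actual).
    Passing the same callable twice turns this into a simple determinism check.
    """
    rng = random.Random(seed)  # seeded so any failing input is reproducible
    for _ in range(trials):
        args = gen_input(rng)
        expected = original(*args)
        actual = reconstruction(*args)
        if expected != actual:
            return args, expected, actual
    return None


def wasm_i32_add(a, b):
    return (a + b) & 0xFFFFFFFF  # stand-in for the original export: wraps at 32 bits


def recovered_add(a, b):
    return a + b  # a flawed reconstruction that forgets i32 wraparound


def u32_pair(rng):
    return rng.getrandbits(32), rng.getrandbits(32)


print(differential_check(wasm_i32_add, wasm_i32_add, u32_pair))  # None
print(differential_check(wasm_i32_add, recovered_add, u32_pair) is not None)  # True
```

Random inputs can only show a mismatch, never prove equivalence. A clean run therefore means "no counterexample found in N trials".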
100% change accountability
Across versions, every byte-level delta is mapped to a function and a semantic explanation.
Nothing changes silently. Read about diffing →
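Cross-version carry-over and change accounting can be sketched as a diff over content identities. This sketch assumes each version is summarized as identity → function index and the KB as identity → annotation. `diff_versions` is hypothetical. It omits two things: pairing a removed function with a similar added one as "modified" via the similarity engine, and generating the semantic explanations.

```python
def diff_versions(old, new, kb):
    """Diff two versions by content identity and carry annotations forward.

    old, new: content identity -> function index in that version.
    kb:       content identity -> annotation from the previous version.
    Returns (changelog, carried): changelog lists only added/removed functions;
    carried maps each v2 function index to the annotation it inherits.
    """
    changelog, carried = [], {}
    for ident, new_idx in sorted(new.items(), key=lambda kv: kv[1]):
        if ident not in old:
            changelog.append(("added", ident))
        elif ident in kb:
            carried[new_idx] = kb[ident]  # unchanged or moved: the name follows
    for ident in sorted(old, key=old.get):
        if ident not in new:
            changelog.append(("removed", ident))
    return changelog, carried


old = {"id-a": 1, "id-b": 2, "id-c": 3}
new = {"id-a": 1, "id-c": 4, "id-d": 5}  # id-c moved, id-b removed, id-d added
changelog, carried = diff_versions(old, new, {"id-a": "init_heap", "id-c": "parse_header"})
print(changelog)  # [('added', 'id-d'), ('removed', 'id-b')]
print(carried)    # {1: 'init_heap', 4: 'parse_header'}
```

The move of id-c from index 3 to 4 carries its name silently and stays out of the changelog. Only genuine additions and removals are surfaced.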
WARDEN is alpha. The spine (ingest, KB, identity, diff, Oracle matching, exporters, the
agent loop, and determinism verification) runs today. The deeper integrations (Ghidra round-trip,
the full emsdk corpus farm, the wasm2c verifier, and a UX) are scaffolded with clear interfaces.
See the roadmap and honest limits.