Skip to main content
Readable but wrong is the cardinal sin of reverse engineering. WARDEN treats verification not as a post-hoc sanity check but as a first-class pipeline phase. This page explains what “verified” means along three distinct axes, what runs unconditionally today, what activates when native tooling is present, and what is scaffolded for the future.

Three honest axes of “100%”

“100% reverse engineered” is a precise, three-part claim. Each axis addresses a different failure mode.
AxisClaimFailure mode it guards
Symbol coverageEvery defined function has a binding: a name, a provenance, and a confidence score. No func_412 stays anonymous.Dark spots that conceal unknown behavior.
Behavioral equivalenceReconstructions are differentially executed against the original until their outputs and side effects match on a corpus of inputs.Plausible-sounding names that describe the wrong behavior.
Change accountabilityAcross any two versions, every delta is mapped to a function and a semantic explanation. Nothing changes silently.Missed security-relevant changes in an update.
Coverage and accountability run fully today. Behavioral equivalence is corpus-bounded evidence, not a formal proof, and the automated harness activates only when the right native toolchain is present.

Running the check

warden verify app_v1.wasm
The verify command reads the file, runs the determinism check, and prints a colored summary. It then calls differential_plan and reports whether the optional behavioral harness is ready:
42/42 functions fingerprint deterministically
differential equivalence ready: False  tooling={'wasm2c': False, 'w2c2': False, 'cc': True, 'wabt_validate': False, 'can_differential': False}
If any function’s stable_id is unstable, the exit line prints in red and the offending function indices are listed. verify takes no options beyond the .wasm path. The determinism check always runs; differential readiness is always reported.

What runs today: verify_determinism

verify_determinism(wasm_bytes: bytes) -> DeterminismReport is the zero-dependency check that runs on every call to warden verify. It requires only the Python standard library.

How it works

The function parses the same raw bytes twice via parse_module, calls fingerprint_function on each pair of defined functions, and compares the resulting stable_id strings byte-exactly. The returned DeterminismReport carries:
  • total: number of defined (non-import) functions examined.
  • stable: count whose stable_id matched across both parse runs.
  • unstable: list of function indices where the id differed.
  • ok: True when unstable is empty.
  • summary: a one-liner such as "42/42 functions fingerprint deterministically".

Why it matters for carry-over

The stable_id is WARDEN’s stable function identity: a composite of the structural fingerprint, call-neighborhood, and type signature that stays constant across rebuilds even when table indices shift. The entire annotation carry-over mechanism (the feature that makes RE incremental rather than Sisyphean) depends on every stable_id being reproducible from the binary bytes alone.
If a stable_id were non-deterministic, annotations written against one parse would silently fail to port to the next version. An unstable result from verify_determinism indicates a bug in the fingerprinting engine that must be fixed before any carry-over can be trusted.
Because the check operates on raw bytes with no external tooling, it runs identically in CI, in a sandboxed environment, and on a laptop with nothing native installed.

Differential execution (runs today)

The warden.interp mini-interpreter is the middle tier: it makes behavioral equivalence runnable right now, with zero native toolchain required, for the integer subset that covers the vast majority of Emscripten arithmetic and glue code.

What the interpreter covers

execute_function is a pure-Python stack-machine evaluator for the i32 integer subset. It models:
  • All i32 arithmetic and bitwise opcodes (add, sub, mul, and, or, xor, shl).
  • All i32 comparisons, signed and unsigned (eq, ne, lt_s, gt_s, le_s, ge_s, eqz).
  • Structured control: block, loop, if/else/end, br, br_if, return.
  • Local variable operations: local.get, local.set, local.tee.
  • Direct function calls (with recursive fuel tracking).
  • Linear memory loads and stores (i32.load, i32.store).
Anything outside that scope raises UnsupportedExecution. The harness catches this and records the pair as “undecided” rather than crashing, so a partially-modeled module still yields useful results for the functions that are covered. Execution is deterministic by construction: the result depends only on the module bytes, the function index, and the argument vector. No wall-clock, no RNG, no external state.

Running one function

warden exec <label> <index> [args...]
The exec command looks up the named project, parses its .wasm, locates function <index>, and executes it on the provided arguments.
warden exec app_v1 42 10 3
# → [91]
Programmatically:
from warden.ingest import parse_module
from warden.interp import execute_function

module = parse_module(wasm_bytes)
func = module.functions[42]
results = execute_function(module, func, [10, 3])
# → [91]
execute_function accepts optional keyword arguments:
ArgumentDefaultPurpose
hostNoneCallback for imported function calls; absent → imports are no-ops
memoryZeroed 64 KiBPre-seeded bytearray; observe stores after the call
fuel100000Instruction budget; exceeded → UnsupportedExecution

Differential execution

differential_execute runs two functions from two modules over the same input corpus and reports per-input agreement:
from warden.interp import differential_execute

rows = differential_execute(mod_v1, fn_v1, mod_v2, fn_v2, inputs=[
    [0, 0], [1, 2], [255, 1], [0x7FFFFFFF, 1],
])
# Each row: {"args": [...], "a": [...], "b": [...], "match": True|False}
The match field is True when both sides return identical result stacks. If either side raises UnsupportedExecution, that side records None and match is False (undecided, not wrong). Concrete example. parse_token v1 and v2 differ structurally (v2 adds a bounds check), but the bounds-check result is dropped before the return. differential_execute proves they are behaviorally equivalent across the full input corpus: every row shows match: True. internal_crc, by contrast, shows match: False on most inputs, which is a genuine behavioral change.
# parse_token v1 == v2: bounds check is dropped, behavior unchanged
{"args": [10, 3],          "a": [91],  "b": [91],  "match": True}
{"args": [0x7FFFFFFF, 1],  "a": [...], "b": [...],  "match": True}

# internal_crc v1 != v2: genuine behavioral divergence
{"args": [10, 3],  "a": [0x1234], "b": [0xABCD], "match": False}
differential_execute never mutates the modules or functions it receives, and each input gets a fresh zeroed memory. You can run it in a loop across a large corpus without side effects.

Three-tier verification model

With the interpreter in place, WARDEN now has three stacked tiers for behavioral equivalence:
TierDependencyCoverage
warden.interp (mini-interpreter)Pure Python, stdlib onlyi32 integer subset; runs today on any machine
wasm2c / w2c2 + cc (native harness)WABT + C compilerFull numeric and memory behavior; corpus-bounded
SeeWasm (symbolic cross-check)SeeWasm (future)Path-condition soundness for security-critical branches
The interpreter fills the gap between the zero-cost determinism check above and the native harness below. For the integer-heavy glue code Emscripten emits, it is often sufficient on its own.

The optional differential harness

The behavioral harness activates only when the right native toolchain is present. Nothing silently pretends a check ran. WARDEN reports readiness honestly.

Tooling detection: tooling_status()

tooling_status() uses shutil.which to probe PATH for four tools:
FieldBinary probedRole
wasm2cwasm2cWABT’s WebAssembly-to-C transpiler
w2c2w2c2Alternative wasm-to-C transpiler (faster compile)
cccc, clang, or gcc (first found)C compiler to build the lifted output
wabt_validatewasm-validateStructural validation of the .wasm before lifting
The can_differential property is True when (wasm2c or w2c2) and cc. That is the minimum required to execute the harness. wabt_validate is independent, useful for confirming the input is well-formed before spending time lifting it.

The plan: differential_plan(wasm_path)

differential_plan(wasm_path: str) -> dict calls tooling_status() and returns a dict describing what would run and whether it can:
{
    "tooling": { ... },        # ToolingStatus.as_dict()
    "ready": True | False,     # can_differential
    "steps": [ ... ],          # concrete shell commands
    "note": "...",             # human-readable install hint when not ready
}
When ready is False, the note field tells you exactly what to install. The concrete steps it describes, in order:
1

Transpile the original to C

wasm2c app_v1.wasm -o target.c
wasm2c (from WABT) produces a self-contained C file that is functionally equivalent to the original by construction.
2

Compile the lifted C to a native executable

cc -O2 target.c wasm-rt-impl.c -o target_ref
The WABT runtime shim wasm-rt-impl.c is included with WABT.
3

Compile the agent/LLM reconstruction the same way

The reconstruction is compiled identically to produce a second native executable.
4

Differential execution over a fuzzer corpus

Both executables are run over a fuzzer-generated input corpus. Outputs and observable memory side effects must agree across every input.

Activating the harness

brew install wabt
After installation, warden verify app_v1.wasm will report differential equivalence ready: True.

What behavioral equivalence actually claims

When the differential harness reports a function as verified, the precise claim is: corpus-bounded behavioral equivalence: the reconstruction agrees with the original on every input tried, to the depth the fuzzer reached. This is strong, automatable evidence. It is not a formal proof. The corpus is finite, and an adversarial input outside it could in principle expose a divergence. Formal equivalence checking via SMT-based symbolic execution over all inputs is a future direction for cryptographically critical functions, but it is not what the current harness provides.
For most reverse-engineering purposes (understanding behavior, tracking changes, writing accurate documentation), corpus-bounded evidence is sufficient and is considerably stronger than “the name sounds right.”

Future dynamic ground-truth hooks

Three additional sources of behavioral evidence are identified in the architecture vision and scaffolded as future work.
For specific claimed path conditions (“this branch fires when magic == 0xCAFE”), SeeWasm can confirm the condition symbolically rather than relying on the model’s assertion. This would plug in as an additional gate in the verifier after the differential harness, providing path-condition soundness for security-critical branches.
Wasabi is a WebAssembly instrumentation framework; Frida and Chrome DevTools expose runtime traces. Live execution traces confirm which element-table targets actually fire, which atomic operations guard which memory regions, and what values flow through indirect calls. This is precisely the information hardest to recover statically. These hooks would validate the thread model and dynCall attribution produced by the concurrency agent.
Any function matched by the Emscripten Oracle to a known upstream source (musl, dlmalloc, the Emscripten runtime) is verified by definition: its behavior is already documented in the public source. The reconstruction can be checked directly against that real source code. This is available today for Oracle-matched functions, at no additional toolchain cost.

Verification in the confidence economy

Every symbol written to the KB carries a provenance and a confidence score. The verifier controls whether an agent proposal is promoted:
  • Oracle matches (provenance="oracle") are inherently verified against compiled ground truth. They receive the highest confidence.
  • Agent proposals (provenance="agent") are gated by the verifier before write-back. Until behavioral equivalence is confirmed, they carry a confidence below 1.0 and are marked unverified.
  • Human names (provenance="human", locked=True) are sovereign. The verifier does not touch them and they cannot be overwritten by agent passes.
No unverified annotation is ever silently promoted to ground truth from which subsequent carry-overs or agent passes propagate. Verification failure is always visible in the KB.

Summary of checks

CheckDependencyRuns todayGuarantee
verify_determinismPure Python, stdlib onlyYesstable_id byte-stability, which guards all carry-over
warden.interp differential executionPure Python, stdlib onlyYesCorpus-bounded behavioral equivalence for the i32 subset
Structural validationwasm-validate (WABT)When installedInput is a well-formed WebAssembly module
Differential harnesswasm2c or w2c2 + ccWhen installedCorpus-bounded behavioral equivalence (full instruction set)
Oracle source verificationEmscripten Oracle corpusToday, Oracle-matched functions onlyExact behavioral match to upstream source
Symbolic cross-checkSeeWasmFuturePath-condition soundness for critical branches
Dynamic trace validationWasabi / Frida / DevToolsFutureRuntime dynCall and atomic ground truth

Phase 4: agents

Where the proposals the verifier gates actually come from.

Phase 3: diff

How verified annotations carry forward when a new version ships.
Last modified on June 7, 2026