Cross-version diff & carry-over

When the vendor ships a new .wasm, most of your reverse-engineering work is still valid: the same functions exist and do the same things, just at shifted function indices. The diff engine recovers that work automatically. It classifies every function across both binaries, ports annotations forward, and hands you a focused changelog of what actually changed in the application. This is Phase 3 of the WARDEN pipeline: diff/engine.py, driven by warden diff.

The diff engine reuses the same fingerprint and similarity engine as the Oracle: the same hash compositions and the same similarity() function, applied to a different corpus. Any improvement to the fingerprinting algorithm benefits both. See core concepts for the full fingerprint breakdown.

How it works

diff_versions() loads all defined functions for both versions from the KB, reconstructs their fingerprints from the stored rows, and runs three passes in sequence. Each pass consumes functions from a shared “unmatched” set so a function is never counted twice.

Pass 1: exact-body match

Functions with an identical exact_hash (SHA-256 of the raw body bytes) are matched first. This is O(n) via a dictionary lookup. No similarity math is needed.

Same index in both versions → classified unchanged.
Different index → classified moved.

Score is 1.0 for both. These functions already share a stable_id row in the symbols table, so their annotations are already there. There is nothing to port.
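A minimal sketch of Pass 1, assuming each function is reduced to a hypothetical `Func` record holding its index, stable_id, and raw body bytes (the real engine rebuilds fingerprints from KB rows instead). Bucketing the older version by exact_hash means each newer function needs only one dictionary lookup.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Func:
    # Hypothetical record; real KB rows carry many more fields.
    index: int
    stable_id: str
    body: bytes

    @property
    def exact_hash(self) -> str:
        # SHA-256 of the raw body bytes.
        return hashlib.sha256(self.body).hexdigest()


def pass1_exact(old: list[Func], new: list[Func]):
    """Match identical bodies; return (changes, old_unmatched, new_unmatched)."""
    by_hash: dict[str, list[Func]] = {}
    for f in old:
        by_hash.setdefault(f.exact_hash, []).append(f)

    changes, new_left = [], []
    for f in new:
        bucket = by_hash.get(f.exact_hash)
        if bucket:
            # Consume the old function so it can never be matched twice
            # (first come, first served if several bodies are identical).
            o = bucket.pop(0)
            kind = "unchanged" if o.index == f.index else "moved"
            changes.append((kind, o.index, f.index, 1.0))
        else:
            new_left.append(f)
    old_left = [f for bucket in by_hash.values() for f in bucket]
    return changes, old_left, new_left
```

Whatever is left in `old_left` and `new_left` is the shared "unmatched" set handed to Pass 2.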

Pass 2: stable identity match

Functions that share a stable_id but whose raw body differs slightly (for example, a literal constant was patched in a way that structural_hash absorbs) are matched by identity key lookup. Because the KB’s symbols table is keyed on stable_id (not on a version or function index), these functions also already share an annotation row.

Classified unchanged or moved by the same index-comparison rule.
Score is 0.99 to distinguish it from a literal exact-body match.

Both Pass 1 and Pass 2 carry annotations verbatim and for free: the functions literally point at the same symbol row.
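Pass 2 can be sketched the same way, assuming stable_id is unique within a version and using a hypothetical two-field `Func` record. Popping from the lookup dictionary is what removes a matched function from the unmatched set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Func:
    # Hypothetical record: only what Pass 2 needs.
    index: int
    stable_id: str


def pass2_identity(old_left: list[Func], new_left: list[Func]):
    """Pair Pass 1 leftovers that share a stable_id (score 0.99)."""
    by_id = {f.stable_id: f for f in old_left}  # assumes unique stable_id per version
    changes, still_new = [], []
    for f in new_left:
        o = by_id.pop(f.stable_id, None)  # pop = consume from the unmatched set
        if o is None:
            still_new.append(f)
            continue
        kind = "unchanged" if o.index == f.index else "moved"
        changes.append((kind, o.index, f.index, 0.99))
    return changes, list(by_id.values()), still_new
```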

Pass 3: greedy fuzzy match

Remaining functions (those that changed enough to get a new stable_id) are paired greedily: each is matched with the remaining older-version candidate that yields the highest similarity().overall score.

overall = 0.45 × fuzzy_jaccard          # MinHash over 4-gram opcode tokens
        + 0.25 × histogram_cosine        # opcode-class distribution
        + 0.20 × call_neighborhood_jaccard  # shared import call targets
        + 0.10 × (1 if structural_hash matches else 0)
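The weighted score can be sketched in a few lines. Two assumptions to flag: the fingerprint is a hypothetical dict (`ops`, `classes`, `calls`, `structural_hash`), and the sketch computes the exact Jaccard of the 4-gram sets rather than the MinHash estimate the engine uses.

```python
import math
from collections import Counter


def ngrams(ops: list[str], n: int = 4) -> set[tuple[str, ...]]:
    # 4-gram opcode tokens (the engine estimates their Jaccard via MinHash).
    return {tuple(ops[i:i + n]) for i in range(len(ops) - n + 1)}


def jaccard(a: set, b: set) -> float:
    if not a and not b:
        return 1.0  # two empty sets (e.g. very short bodies) count as identical
    return len(a & b) / len(a | b)


def cosine(h1: Counter, h2: Counter) -> float:
    dot = sum(h1[k] * h2[k] for k in h1)
    n1 = math.sqrt(sum(v * v for v in h1.values()))
    n2 = math.sqrt(sum(v * v for v in h2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def overall(a: dict, b: dict) -> float:
    """Weighted similarity following the formula above (weights sum to 1.0)."""
    return (0.45 * jaccard(ngrams(a["ops"]), ngrams(b["ops"]))
            + 0.25 * cosine(Counter(a["classes"]), Counter(b["classes"]))
            + 0.20 * jaccard(set(a["calls"]), set(b["calls"]))
            + 0.10 * (1 if a["structural_hash"] == b["structural_hash"] else 0))
```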

A pair is accepted if the best score is at or above MODIFIED_THRESHOLD = 0.6. Accepted pairs are classified modified. Functions left unmatched after all three passes become new (in the newer version only) or deleted (in the older version only).

The 0.6 threshold is deliberately lenient. A modified function that retains its general call pattern and opcode character but gained a bounds check or an extra branch will typically score 0.65–0.85. Setting the threshold higher risks losing carry-over for legitimately modified functions; setting it lower creates false pairings. The value is defined as MODIFIED_THRESHOLD in diff/engine.py and can be overridden if you are working with a heavily optimized corpus.
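One plausible reading of "greedy" is sketched below: score every remaining pair, accept pairs from the highest score down while both sides are still free, and stop at the threshold. This is an assumption about ordering; the engine may instead walk newer functions one at a time. The `score` callback stands in for `similarity().overall`.

```python
from typing import Any, Callable, Sequence

MODIFIED_THRESHOLD = 0.6


def greedy_fuzzy(old: Sequence[Any], new: Sequence[Any],
                 score: Callable[[Any, Any], float],
                 threshold: float = MODIFIED_THRESHOLD):
    """Greedy best-first pairing; unpaired leftovers become deleted/new."""
    candidates = sorted(((score(o, n), i, j)
                         for i, o in enumerate(old)
                         for j, n in enumerate(new)), reverse=True)
    used_old, used_new, changes = set(), set(), []
    for s, i, j in candidates:
        if s < threshold:
            break  # sorted descending: nothing later can qualify
        if i in used_old or j in used_new:
            continue  # one side already paired with a better match
        used_old.add(i)
        used_new.add(j)
        changes.append(("modified", old[i], new[j], s))
    changes += [("deleted", o, None, 0.0) for i, o in enumerate(old) if i not in used_old]
    changes += [("new", None, n, 0.0) for j, n in enumerate(new) if j not in used_new]
    return changes
```

Scoring all pairs is O(n·m), which is acceptable here because Passes 1 and 2 have usually removed most functions before this pass runs.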

Classification summary

Class	Meaning	Index change	Annotation ported?
`unchanged`	Identical body (`exact_hash` match) or same identity (`stable_id` match), same index	No	Already shared
`moved`	Identical body or same identity, different index	Yes	Already shared
`modified`	Fuzzy match at or above threshold; body changed meaningfully	Maybe	Copied with penalty
`new`	No match found in the older version	N/A	None; queued for analysis
`deleted`	No match found in the newer version	N/A	Archived

Annotation carry-over: identical vs. fuzzy

Unchanged and moved: zero work

Functions classified unchanged or moved share the same stable_id between both versions. Because the symbols table is keyed to stable_id, not to a version row or function index, these functions already point at the same symbol. There is nothing to copy. The name, type signature, summary, provenance, and confidence from your v1 work are immediately visible in v2 with zero intervention.
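A toy illustration of why there is nothing to copy, using hypothetical in-memory dicts in place of the KB tables: because the annotation is keyed on stable_id alone, both versions resolve to the very same row. The stable_id and the name `decodeHeader` are made up for the example.

```python
# Hypothetical stand-ins for two KB tables.
# symbols is keyed on stable_id alone: no version label, no function index.
symbols = {"sid-0042": {"name": "decodeHeader", "provenance": "human", "confidence": 0.9}}

# functions maps (version label, function index) -> stable_id.
functions = {
    ("v1", 17): "sid-0042",
    ("v2", 21): "sid-0042",  # same stable_id at a new index: classified "moved"
}


def symbol_for(version: str, index: int):
    """Resolve a function's annotation through its stable_id."""
    return symbols.get(functions[(version, index)])
```

Editing the v1 annotation is therefore immediately visible through v2 as well, since both lookups land on one record.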

Modified: copied with a confidence penalty

When a fuzzy match is accepted, _carry_symbol() runs:

1. Look up the older function’s symbol by its stable_id.
2. Check whether the newer function’s stable_id already has a symbol. If it does, leave it alone: a pre-existing annotation (for example, a name seeded at ingest from an export) takes precedence over a carried one.
3. Write a new symbol for the newer stable_id with:
   - the same name, type signature, and summary as the source;
   - provenance = "diff-carry" (rank 40 in the provenance economy, below oracle (90) but above agent (30));
   - confidence = old.confidence × CARRY_PENALTY, where CARRY_PENALTY = 0.7.
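The three steps can be sketched against a plain dict standing in for the symbols table. The `Symbol` fields and the function name `carry_symbol` are assumptions for illustration; only the penalty, the provenance label, and the evidence format come from this page.

```python
from dataclasses import dataclass, field, replace

CARRY_PENALTY = 0.7


@dataclass
class Symbol:
    # Hypothetical shape of a symbols-table row.
    name: str
    type_sig: str
    summary: str
    provenance: str
    confidence: float
    evidence: list = field(default_factory=list)


def carry_symbol(symbols: dict, old_sid: str, new_sid: str, score: float) -> bool:
    """Sketch of _carry_symbol(); returns True if a symbol was written."""
    old = symbols.get(old_sid)          # step 1: source annotation
    if old is None:
        return False                    # nothing to carry
    if new_sid in symbols:              # step 2: never clobber an existing symbol
        return False
    symbols[new_sid] = replace(         # step 3: same name/signature/summary
        old,
        provenance="diff-carry",
        confidence=old.confidence * CARRY_PENALTY,
        evidence=[{"kind": "carry-over", "detail": f"from {old_sid} score={score:.2f}"}],
    )
    return True
```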

A function named parseToken with confidence 0.92 after the Oracle and human review will arrive in v2 as parseToken with confidence 0.644 and provenance diff-carry. The penalty signals “probably still right, worth a second look.” Agents will not overwrite this (their rank is lower); Oracle re-identification can upgrade it if the function still hits a corpus signature. The evidence field records the carry trail:

{"kind": "carry-over", "detail": "from a3f1b2c4d5e6 score=0.78"}

diff_versions() carries only when carry=True (the default). Pass --no-carry to warden diff to produce a classification report without touching the symbols table. This is useful for a dry-run assessment of what changed.

The semantic changelog

After classification, render_changelog() produces a human-readable report that does something ordinary binary diff tools cannot: it separates application-code changes from runtime/toolchain churn, listing the former for review and only counting the latter. A function is tagged as runtime churn if its name (from either the current or the previous version) starts with one of a list of known prefixes:

emscripten_  __em_  wasi_  dlmalloc  memcpy  memset  malloc  free
__cxa_  pthread_  stackSave  stackRestore  __wasm_call_ctors  ...

The changelog separates the two buckets so a 300-function change caused by an Emscripten version bump does not bury the 6 genuine application changes you actually need to review.
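The prefix test is simple enough to sketch directly. The tuple below reproduces only the prefixes shown above; the engine's list is longer (it ends in `...` here), and the function name `is_runtime` is hypothetical.

```python
from typing import Optional

# Partial list: only the prefixes printed on this page.
RUNTIME_PREFIXES = (
    "emscripten_", "__em_", "wasi_", "dlmalloc", "memcpy", "memset", "malloc",
    "free", "__cxa_", "pthread_", "stackSave", "stackRestore", "__wasm_call_ctors",
)


def is_runtime(name_now: Optional[str], name_before: Optional[str] = None) -> bool:
    """Runtime churn if either version's name starts with a known prefix."""
    return any(n is not None and n.startswith(RUNTIME_PREFIXES)
               for n in (name_now, name_before))
```

`str.startswith` accepts a tuple, so one call checks every prefix. Plain prefix matching is coarse: an application function that happens to be named, say, `freeList` would also be tagged as runtime.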

Sample changelog

# WARDEN changelog: v1 -> v2

- unchanged: 241
- moved:      18
- modified:   47  (6 app, 41 runtime/toolchain churn)
- new:         3
- deleted:     1
- annotations carried forward: 5

## Needs review (genuine app deltas)

  [MODIFIED] parseToken (score 0.78)
  [MODIFIED] verify_license (score 0.81)
  [MODIFIED] crypto_init (score 0.71)
  [MODIFIED] handle_request (score 0.67)
  [MODIFIED] dispatch_message (score 0.74)
  [MODIFIED] build_response (score 0.69)
  [NEW] verifyLicense_v2
  [NEW] audit_log_append
  [NEW] rate_limit_check

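The “41 runtime/toolchain churn” figure counts modified functions whose names matched one of the runtime prefixes, the typical fallout of a toolchain upgrade such as Emscripten 3.1.55→3.1.61. They are still matched and classified like any other function; the changelog simply keeps them out of the review list, because this kind of churn rarely needs human attention.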
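A rough sketch of how the summary and review list could be assembled from change records shaped like the `DiffReport` JSON. It approximates the sample's layout; it is not the actual render_changelog() implementation, and treating every non-runtime new function as reviewable is an assumption.

```python
from collections import Counter

ORDER = ("unchanged", "moved", "modified", "new", "deleted")


def render_changelog(frm: str, to: str, changes: list) -> str:
    """Approximate the sample layout from Change-like dicts."""
    counts = Counter(c["classification"] for c in changes)
    # "Genuine app deltas": anything not tagged as runtime churn.
    app = [c for c in changes if not c.get("runtime")]
    app_mod = [c for c in app if c["classification"] == "modified"]

    lines = [f"# WARDEN changelog: {frm} -> {to}", ""]
    for kind in ORDER:
        line = f"- {kind + ':':<10} {counts[kind]:>3}"
        if kind == "modified":
            churn = counts[kind] - len(app_mod)
            line += f"  ({len(app_mod)} app, {churn} runtime/toolchain churn)"
        lines.append(line)
    carried = sum(1 for c in changes if c.get("carried_name"))
    lines.append(f"- annotations carried forward: {carried}")

    lines += ["", "## Needs review (genuine app deltas)", ""]
    lines += [f"  [MODIFIED] {c['name']} (score {c['score']:.2f})" for c in app_mod]
    lines += [f"  [NEW] {c['name']}" for c in app if c["classification"] == "new"]
    return "\n".join(lines)
```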

The `warden diff` command

warden diff <from-label> <to-label> [--no-carry] [--db <path>]

Run this after ingesting both versions. The result is stored in the diffs table as a JSON DiffReport and printed as the semantic changelog.

# Ingest the new version (existing version already in the KB)
warden ingest app_v2.wasm --label v2

# Diff, carry annotations forward, and print the changelog
warden diff v1 v2

# Classification report only, no annotation writes
warden diff v1 v2 --no-carry

After warden diff completes, you can inspect carry-over results directly:

# See which v2 functions have diff-carry provenance
warden funcs v2

# Inspect a specific function
warden show v2 <index>

Check coverage immediately after diffing. For a typical update in which only a few functions changed, v2 coverage should land close to wherever v1 left off, with no manual work.

warden coverage v2

Full pipeline: v2 in practice

Ingest the new version

warden ingest app_v2.wasm --glue app_v2.js --label v2

Seeds any names the new binary exposes via exports, imports, or its name section.

Diff and carry

warden diff v1 v2

Runs all three passes, writes carried annotations, prints the semantic changelog.

Review only the app deltas

The changelog’s “Needs review” section lists the functions that actually changed in application code. Inspect each:

warden show v2 <index>

If the carried name is still correct, lock it:

warden set-name v2 <index> <name>

Re-run the Oracle and agents on new functions

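Functions classified new have no annotation yet; that includes heavily modified functions that scored below MODIFIED_THRESHOLD and were therefore never paired. The Oracle may identify runtime additions from a toolchain bump; agents cover the rest.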

warden oracle identify v2 --store oracle.json
warden agent v2

Export a deliverable

warden export v2 --format pseudo
warden export v2 --format ghidra --out v2_rename.py

What the `DiffReport` contains

The full report is stored in the diffs table as JSON (the result of DiffReport.as_dict()) and contains every Change record:

{
  "from": "v1",
  "to": "v2",
  "summary": {
    "unchanged": 241,
    "moved": 18,
    "modified": 47,
    "new": 3,
    "deleted": 1,
    "app_modified": 6,
    "runtime_churn": 41,
    "carried_symbols": 5
  },
  "changes": [
    {
      "classification": "modified",
      "from_index": 112,
      "to_index": 114,
      "name": "parseToken",
      "stable_from": "a3f1b2c4d5e6...",
      "stable_to": "f9e8d7c6b5a4...",
      "score": 0.78,
      "review": true,
      "runtime": false,
      "carried_name": "parseToken"
    }
  ]
}

review: true marks non-runtime modified functions; these are exactly the [MODIFIED] entries under “Needs review” in the changelog (that section also lists new functions). carried_name is non-null when _carry_symbol() wrote a symbol for this pairing.
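The JSON shape above maps naturally onto two dataclasses. This is a sketch: the field names come from the sample, but the class layout, the derived `review` property, and how the summary is computed are assumptions. `from` is a Python keyword, so the sketch stores it as `frm` and renames it on serialization.

```python
from collections import Counter
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Change:
    # Field names mirror the sample JSON; the class itself is hypothetical.
    classification: str
    from_index: Optional[int]
    to_index: Optional[int]
    name: Optional[str]
    stable_from: Optional[str]
    stable_to: Optional[str]
    score: float
    runtime: bool = False
    carried_name: Optional[str] = None

    @property
    def review(self) -> bool:
        # review marks non-runtime modified functions.
        return self.classification == "modified" and not self.runtime


@dataclass
class DiffReport:
    frm: str
    to: str
    changes: list = field(default_factory=list)

    def as_dict(self) -> dict:
        counts = Counter(c.classification for c in self.changes)
        summary = {k: counts[k] for k in ("unchanged", "moved", "modified", "new", "deleted")}
        summary["app_modified"] = sum(c.review for c in self.changes)
        summary["runtime_churn"] = sum(c.classification == "modified" and c.runtime
                                       for c in self.changes)
        summary["carried_symbols"] = sum(c.carried_name is not None for c in self.changes)
        return {
            "from": self.frm,
            "to": self.to,
            "summary": summary,
            "changes": [{**asdict(c), "review": c.review} for c in self.changes],
        }
```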

Provenance economy position of diff-carry

diff-carry sits at rank 40 in the provenance hierarchy: below oracle (90), export (60), and import (55), but above agent (30). In practice this means:

An Oracle re-identification pass on v2 will upgrade a diff-carry annotation if the function still hits a corpus signature at score ≥ 0.82.
An agent pass will not overwrite a diff-carry annotation, regardless of claimed confidence.
A warden set-name call (provenance human) always wins.

This ordering ensures carried knowledge is never silently destroyed by a lower-authority source, while higher-authority passes can still refine or correct it.
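The rules above can be expressed as a small overwrite check. The numeric ranks are taken from this page; modeling human as outranking everything, and resolving ties in favor of the existing annotation, are assumptions made for the sketch. Note that _carry_symbol() is stricter still: it skips any stable_id that already has a symbol, whatever its rank.

```python
# Ranks from this page; "human" has no published number, so it is modeled
# here as outranking everything (set-name "always wins").
PROVENANCE_RANK = {
    "human": float("inf"),
    "oracle": 90,
    "export": 60,
    "import": 55,
    "diff-carry": 40,
    "agent": 30,
}
ORACLE_MATCH_MIN = 0.82  # Oracle upgrades only on a corpus hit at this score


def may_overwrite(existing: str, incoming: str, score: float = 0.0) -> bool:
    """Hypothetical check: may an incoming write replace the existing annotation?"""
    if incoming == "human":
        return True
    if incoming == "oracle" and score < ORACLE_MATCH_MIN:
        return False
    # Strictly higher rank required; ties keep the existing annotation.
    return PROVENANCE_RANK[incoming] > PROVENANCE_RANK[existing]
```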

Core concepts

Stable identity, the shared fingerprint engine, and the provenance economy: the three ideas behind how carry-over works.

CLI reference

Full flag documentation for warden diff and every other command.
