Python library reference

The CLI is a thin wrapper over a pure-Python library. Every operation available from the command line is also available as a function or method you can call directly. The core library depends only on the standard library: its runtime requirements are sqlite3, json, hashlib, and struct. The [mcp], [openai], [anthropic], and [agents] extras add optional layers on top.

WARDEN is alpha. The public API described here is stable enough to build on, but a few higher-level helpers (thread-model analysis, struct inference) are scaffolded and may change shape before 1.0.

Package layout

warden/
  project.py          ingest_into_kb, load_module, sha256_file, IngestResult
  kb/
    database.py       KnowledgeBase  (the SQLite spine)
    models.py         Symbol, ModuleVersion, CoverageStats
  ingest/
    __init__.py       parse_file, parse_module, parse_glue_file, Module, Function, GlueInfo …
  identity/
    __init__.py       fingerprint_function, fingerprint_from_record, similarity, minhash_jaccard …
  diff/
    __init__.py       diff_versions, render_changelog, DiffReport, Change
  lift/
    __init__.py       lift_function, lift_module
  interp/
    __init__.py       execute_function, differential_execute, UnsupportedExecution
  analysis/
    concurrency.py    analyze_concurrency, ConcurrencyReport, AtomicSite
    structs.py        analyze_structs, StructLayout, StructField
    callgraph.py      build_call_graph, layered_schedule, strongly_connected_components, CallGraph
  report/
    __init__.py       render_report, write_report
  oracle/
    index.py          SignatureIndex, identify_indexed

Opening a knowledge base

KnowledgeBase is a context manager. It opens (or creates) a project database, applies the schema if it does not exist, and commits on __exit__.

from warden.kb import KnowledgeBase

# Open an existing project, or create a new one.
with KnowledgeBase("warden.db") as kb:
    versions = kb.versions()
    for v in versions:
        print(v.id, v.label, v.num_functions)

You can also manage the lifecycle manually:

kb = KnowledgeBase("warden.db")
try:
    # ... your work ...
    kb.commit()
finally:
    kb.close()

KnowledgeBase.__init__ accepts a str or pathlib.Path. The database file is created if it does not exist; PRAGMA foreign_keys = ON is always set.
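
The lifecycle described above (open or create the file, enable foreign keys, apply the schema if missing, commit on exit) can be illustrated with plain sqlite3. This is a minimal sketch, not WARDEN's implementation: the TinyStore class and its meta table are hypothetical, and rolling back when the block raises is an assumption this page does not confirm.

```python
import sqlite3
from pathlib import Path


class TinyStore:
    """Hypothetical stand-in for a KnowledgeBase-style wrapper (not WARDEN code)."""

    def __init__(self, path: "str | Path") -> None:
        # sqlite3.connect creates the database file if it does not exist.
        self.conn = sqlite3.connect(str(path))
        # SQLite leaves foreign-key enforcement off by default; enable it per connection.
        self.conn.execute("PRAGMA foreign_keys = ON")
        # Apply the schema only if it is missing.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)"
        )

    def __enter__(self) -> "TinyStore":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Assumption: commit on a clean exit, roll back if the block raised.
        if exc_type is None:
            self.conn.commit()
        else:
            self.conn.rollback()
        self.conn.close()


def demo_foreign_keys_enabled() -> bool:
    """Open an in-memory store, write one row, and report the pragma state."""
    with TinyStore(":memory:") as store:
        store.conn.execute("INSERT INTO meta VALUES ('schema_version', '1')")
        return store.conn.execute("PRAGMA foreign_keys").fetchone()[0] == 1
```

The pragma has to be set on every new connection because SQLite does not store it in the database file.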

Ingesting a module version

ingest_into_kb is the workhorse. It parses the .wasm, fingerprints every function, records a per-version appearance log, and seeds the durable symbol layer from free facts (name section, exports, imports), giving fresh modules partial coverage before the Oracle or agents run.

from pathlib import Path
from warden.kb import KnowledgeBase
from warden.project import ingest_into_kb

with KnowledgeBase("warden.db") as kb:
    result = ingest_into_kb(
        kb,
        wasm_path=Path("app_v1.wasm"),
        label="v1",
        glue_path=Path("app_v1.js"),   # optional; recovers Emscripten version & dynCall sigs
        notes="production build, -O2",
    )

print(result.version_id)        # int row id for this version
print(result.num_functions)     # total functions (imported + defined)
print(result.num_defined)       # defined (non-import) functions
print(result.seeded_symbols)    # how many names were seeded from free facts
print(result.emscripten_version)  # e.g. "3.1.55", or None if no glue
print(result.shared_memory)     # bool: whether the module uses shared memory

ingest_into_kb signature:

def ingest_into_kb(
    kb: KnowledgeBase,
    wasm_path: str | Path,
    *,
    label: str,
    glue_path: str | Path | None = None,
    notes: str | None = None,
) -> IngestResult: ...

The returned IngestResult is a plain dataclass. You can safely pass it to dataclasses.asdict() or log it.

If you only need the Module object (for ad-hoc analysis, without touching a database), use load_module from warden.project, or parse_file from warden.ingest directly.

from warden.project import load_module
from warden.ingest import parse_file

module = load_module("app_v1.wasm")   # convenience re-export
module = parse_file("app_v1.wasm")    # same thing
print(module.num_imported_funcs)
print(len(module.defined_functions))

Querying the knowledge base

Module versions

from warden.kb import KnowledgeBase

with KnowledgeBase("warden.db") as kb:
    # All ingested versions, ordered by id.
    versions = kb.versions()

    # By label.
    v1 = kb.get_version("v1")          # ModuleVersion | None
    latest = kb.latest_version()       # ModuleVersion | None

    # Paths recorded at ingest time.
    wasm_path, glue_path = kb.version_paths(v1.id)

ModuleVersion fields: id, label, wasm_sha256, emscripten_version, num_functions, num_imported, shared_memory, ingested_at, notes.

Functions

with KnowledgeBase("warden.db") as kb:
    v1 = kb.get_version("v1")

    # All function rows for a version (returns list[dict]).
    funcs = kb.functions_for_version(v1.id, include_imports=True)

    # A single function by version + func_index.
    row = kb.get_function(v1.id, func_index=42)
    if row:
        print(row["stable_id"])
        print(row["type_signature"])
        print(row["instruction_count"])
        print(row["is_import"])

Each function dict contains: id, version_id, func_index, stable_id, exact_hash, structural_hash, minhash (list of ints), histogram (dict), call_targets (list of strings), local_calls, type_signature, instruction_count, body_size, is_import, raw_name.

Symbols

from warden.kb import KnowledgeBase
from warden.kb.models import Symbol

with KnowledgeBase("warden.db") as kb:
    # Look up by stable_id.
    sym = kb.get_symbol("a3f1b2...", kind="function")   # Symbol | None
    if sym:
        print(sym.name, sym.provenance, sym.confidence)
        print(sym.locked)      # bool: human-locked symbols are immutable to automation

    # Bulk lookup for every function in a version.
    v1 = kb.get_version("v1")
    stable_ids = [f["stable_id"] for f in kb.functions_for_version(v1.id)]
    symbols = kb.symbols_for_stable_ids(stable_ids)   # dict[stable_id, Symbol]

    # Coverage breakdown for a version.
    stats = kb.coverage(v1.id)
    print(f"{stats.named}/{stats.defined} defined functions named ({stats.coverage_pct}%)")
    print(f"  oracle={stats.oracle_named}  human={stats.human_named}  agent={stats.agent_named}")

Writing symbols

Always write through upsert_symbol. This is what enforces the provenance/confidence economy described in core concepts. Direct INSERT into the symbols table bypasses the economy and risks clobbering verified work.

from warden.kb.models import Symbol

sym = Symbol(
    stable_id="a3f1b2c4d5e6f7a8b9c0d1e2f3a4b5c6",
    name="process_audio_frame",
    type_signature="(i32, i32, f32) -> i32",
    summary="Processes one audio frame; returns bytes consumed.",
    provenance="human",
    confidence=1.0,
    evidence=[{"kind": "manual", "detail": "confirmed via string xref + call graph"}],
)

with KnowledgeBase("warden.db") as kb:
    written, reason = kb.upsert_symbol(sym, actor="human")
    print(written, reason)   # True, "human override"  (or False if already locked higher)

    # Lock a symbol so no automated source can overwrite it.
    kb.lock_symbol("a3f1b2c4d5e6f7a8b9c0d1e2f3a4b5c6", kind="function")

upsert_symbol returns (bool, str). The first value is True if the write was accepted, or False if rejected. A rejection is not an error; it means a higher-authority annotation already exists. Symbol fields:

Field	Type	Default	Notes
`stable_id`	`str`	(required)	Content identity from `fingerprint_function`.
`kind`	`str`	`"function"`	Currently always `"function"`.
`name`	`str \| None`	`None`	Human-readable function name.
`type_signature`	`str \| None`	`None`	Wasm type, e.g. `"(i32, i32) -> i32"`.
`summary`	`str \| None`	`None`	One-line description.
`provenance`	`str`	`"agent"`	See provenance rank table in concepts.
`confidence`	`float`	`0.0`	0.0–1.0.
`evidence`	`list[dict]`	`[]`	Structured evidence items.
`locked`	`bool`	`False`	If `True`, only a human write can change it.
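
To make the accept/reject behaviour concrete, here is a small sketch of how a write rule of this kind could be expressed. It is illustrative only: the rank ordering (agent < oracle < human), the reason strings, and the Note and try_write names are all assumptions made for this example; the authoritative rank table is the one in the concepts page, and the real rule lives in upsert_symbol.

```python
from dataclasses import dataclass

# Assumed ranking, for illustration only; see the concepts page for the real table.
ASSUMED_RANK = {"agent": 1, "oracle": 2, "human": 3}


@dataclass
class Note:
    """Hypothetical, trimmed-down annotation record."""
    name: str
    provenance: str
    confidence: float
    locked: bool = False


def try_write(existing: "Note | None", incoming: Note) -> "tuple[bool, str]":
    """Return (accepted, reason) for replacing `existing` with `incoming`."""
    if existing is None:
        return True, "created"
    # A locked annotation can only be changed by a human write.
    if existing.locked and incoming.provenance != "human":
        return False, "locked"
    # Compare by provenance rank first, then by confidence.
    old = (ASSUMED_RANK[existing.provenance], existing.confidence)
    new = (ASSUMED_RANK[incoming.provenance], incoming.confidence)
    if new >= old:
        return True, "updated"
    return False, "lower authority"
```

As in the text above, a False result is an ordinary outcome and not an exception, so callers can log the reason and move on.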

Diffing two versions

diff_versions matches functions between two ingested versions, classifies each as unchanged, moved, modified, new, or deleted, and (by default) carries annotations forward via fuzzy match with a confidence penalty.

from warden.kb import KnowledgeBase
from warden.diff import diff_versions, render_changelog

with KnowledgeBase("warden.db") as kb:
    v1 = kb.get_version("v1")
    v2 = kb.get_version("v2")

    report = diff_versions(
        kb,
        from_version_id=v1.id,
        to_version_id=v2.id,
        carry=True,          # port annotations forward (default); False = classify only
    )

    # Human-readable changelog.
    print(render_changelog(report))

    # Counts.
    s = report.summary()
    print(s["unchanged"], s["moved"], s["modified"], s["new"], s["deleted"])
    print("app deltas:", s["app_modified"], "  toolchain churn:", s["runtime_churn"])
    print("annotations carried:", s["carried_symbols"])

    # Individual change records.
    for change in report.changes:
        if change.review:   # review=True → genuine app delta, not runtime churn
            print(change.classification, change.name, f"score={change.score:.2f}")

diff_versions also calls kb.store_diff so the result is cached and retrievable with kb.get_diff(from_version_id, to_version_id). DiffReport fields: from_label, to_label, changes: list[Change], carried_symbols: int. Change fields:

Field	Type	Notes
`classification`	`str`	`unchanged` / `moved` / `modified` / `new` / `deleted`
`from_index`	`int \| None`	Function index in the from-version
`to_index`	`int \| None`	Function index in the to-version
`name`	`str \| None`	Best available name
`stable_from`	`str \| None`	Stable identity in the from-version
`stable_to`	`str \| None`	Stable identity in the to-version
`score`	`float`	Similarity score (0.0–1.0)
`review`	`bool`	`True` = genuine app delta; `False` = toolchain churn or unchanged
`runtime`	`bool`	Heuristically identified as runtime/libc
`carried_name`	`str \| None`	Name ported forward, if any

render_changelog returns a plain markdown string. Pipe it to a file or display it in a TUI.

Fingerprinting functions

fingerprint_function produces the four fingerprints plus the stable identity for a single function. You usually let ingest_into_kb handle this, but you can also call it directly for ad-hoc analysis or to build custom corpora.

from warden.ingest import parse_file
from warden.identity import fingerprint_function, similarity, minhash_jaccard

module = parse_file("app_v1.wasm")

# Fingerprint a single function.
func = module.functions[42]          # Function object (includes imports)
fp = fingerprint_function(module, func)

print(fp.stable_id)           # 32-character hex; the content identity
print(fp.exact_hash)          # sha256 of the raw body
print(fp.structural_hash)     # blake2b of the control-flow/call skeleton
print(fp.type_signature)      # e.g. "(i32, i32) -> i32"
print(fp.instruction_count)
print(fp.call_targets)        # tuple of "module.field" import calls
print(fp.local_calls)         # count of calls to locally-defined functions
print(fp.minhash)             # tuple of 32 ints (MinHash signature)
print(fp.opcode_histogram)    # dict[str, int]: counts per opcode class

Comparing two functions

from warden.identity import fingerprint_function, similarity
from warden.ingest import parse_file

module_a = parse_file("app_v1.wasm")
module_b = parse_file("app_v2.wasm")

fp_a = fingerprint_function(module_a, module_a.functions[42])
fp_b = fingerprint_function(module_b, module_b.functions[45])

score = similarity(fp_a, fp_b)

print(score.overall)           # 0.0–1.0 composite score
print(score.exact)             # bool: byte-identical bodies
print(score.structural)        # bool: same control-flow skeleton
print(score.fuzzy)             # MinHash Jaccard estimate
print(score.histogram)         # opcode-histogram cosine similarity
print(score.call_overlap)      # import call-neighborhood Jaccard
print(score.classification)    # "identical" / "near-identical" / "strong" / "weak" / "none"
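
Two of the components above have standard definitions that are easy to show in isolation: cosine similarity over opcode histograms and Jaccard overlap of call-target sets. The sketch below is not WARDEN's code; the function names and the conventions for empty inputs are assumptions, and it does not attempt to reproduce how the composite score weights its parts.

```python
import math


def histogram_cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse count vectors keyed by opcode class."""
    dot = sum(count * b.get(key, 0) for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for an empty histogram
    return dot / (norm_a * norm_b)


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(union)
```

For example, two functions that call {"env.malloc", "env.free"} and {"env.malloc"} have a call-target Jaccard of 0.5.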

Low-level: MinHash Jaccard only

If you already have two MinHash signatures (e.g. reconstructed from KB rows via fingerprint_from_record), you can call minhash_jaccard directly:

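from warden.identity import fingerprint_from_record, minhash_jaccard
from warden.kb import KnowledgeBase

with KnowledgeBase("warden.db") as kb:
    v1 = kb.get_version("v1")
    v2 = kb.get_version("v2")

    # Stored function rows (dicts), looked up by version id and func_index.
    row_a = kb.get_function(v1.id, 42)
    row_b = kb.get_function(v2.id, 45)

fp_a = fingerprint_from_record(row_a)
fp_b = fingerprint_from_record(row_b)

jaccard = minhash_jaccard(fp_a.minhash, fp_b.minhash)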

fingerprint_from_record reconstructs a full FunctionFingerprint from a stored KB function dict. This lets you run the similarity engine against already-ingested functions without re-parsing the original .wasm.
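
For readers unfamiliar with MinHash, the following sketch shows the general technique behind a 32-value signature and its Jaccard estimate. It is a generic illustration under stated assumptions, not WARDEN's implementation: the choice of blake2b, the seeding scheme, and the token sets (imagine opcode n-grams) are all hypothetical.

```python
import hashlib

NUM_HASHES = 32  # same length as the 32-int signature mentioned above


def _hash(token: str, seed: int) -> int:
    """One member of a seeded hash family (seeding scheme is an assumption)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens: set) -> tuple:
    """Keep the minimum hash value per seed over all tokens."""
    if not tokens:
        raise ValueError("need at least one token")
    return tuple(min(_hash(t, seed) for t in tokens) for seed in range(NUM_HASHES))


def estimate_jaccard(sig_a: tuple, sig_b: tuple) -> float:
    """Fraction of signature positions that agree; estimates set Jaccard."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)
```

With only 32 positions the estimate moves in steps of 1/32, which is coarse but cheap to store and compare.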

Parsing a module directly

For scripting that only needs the parsed model (not the KB), import from warden.ingest:

from warden.ingest import parse_file, parse_glue_file, WasmParseError

try:
    module = parse_file("app_v1.wasm")
except WasmParseError as e:
    # Stop here: the code below needs a parsed module.
    raise SystemExit(f"parse failed: {e}")

# All functions (imported + defined), in index order.
for func in module.functions:
    print(func.index, func.is_import, module.function_name(func))

# Defined functions only.
for func in module.defined_functions:
    sig = module.func_type(func)
    if sig:
        print(func.index, sig.signature())

# Metadata.
print(module.num_imported_funcs)
print(module.shared_memory())     # bool

# Optional JS glue.
glue = parse_glue_file("app_v1.js")
print(glue.emscripten_version)
print(glue.notes)                 # list of diagnostic strings

End-to-end example

A complete two-version ingest, diff, and symbol write in one script:

from pathlib import Path
from warden.kb import KnowledgeBase
from warden.kb.models import Symbol
from warden.project import ingest_into_kb
from warden.diff import diff_versions, render_changelog

DB = Path("my_project.warden.db")

with KnowledgeBase(DB) as kb:
    # Ingest version 1.
    r1 = ingest_into_kb(kb, "app_v1.wasm", label="v1", glue_path="app_v1.js")
    print(f"v1: {r1.num_functions} funcs, {r1.seeded_symbols} names seeded")

    # Ingest version 2.
    r2 = ingest_into_kb(kb, "app_v2.wasm", label="v2")
    print(f"v2: {r2.num_functions} funcs")

    # Annotate one function you identified manually.
    stable_id = kb.get_function(r1.version_id, 42)["stable_id"]
    sym = Symbol(
        stable_id=stable_id,
        name="encode_frame",
        type_signature="(i32, i32, i32) -> i32",
        summary="Encodes one video frame; returns bytes written.",
        provenance="human",
        confidence=1.0,
    )
    written, reason = kb.upsert_symbol(sym, actor="human")
    print(f"symbol write: {written}, {reason}")

    # Diff v1 -> v2; carries the annotation forward automatically.
    report = diff_versions(kb, r1.version_id, r2.version_id)
    print(render_changelog(report))

    # Show only the genuine app deltas.
    for change in report.changes:
        if change.review:
            print(change.classification, change.name or f"func[{change.to_index}]")

Audit log

Every symbol write, accepted or rejected, is recorded. To read the last n entries, pass limit=n:

with KnowledgeBase("warden.db") as kb:
    for entry in kb.audit_log(limit=20):
        print(entry["action"], entry["actor"], entry["stable_id"], entry["detail"])

Actions are created, updated, or rejected.

Key constants

Name	Location	Value	Meaning
`MODIFIED_THRESHOLD`	`warden.diff`	`0.6`	Minimum similarity score for a fuzzy “modified” match.
`CARRY_PENALTY`	`warden.diff`	`0.7`	Confidence multiplier applied to carried annotations.
`SCHEMA_VERSION`	`warden.kb.database`	`"1"`	Current DB schema version; stored in the `meta` table.
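
The two diff constants are simple to reason about with a little arithmetic. The sketch below copies their values from the table and shows the effect on a carried annotation; the helper names are hypothetical, and treating the threshold as inclusive is an assumption based on the word "minimum".

```python
MODIFIED_THRESHOLD = 0.6  # value from the table above
CARRY_PENALTY = 0.7       # value from the table above


def carried_confidence(original: float) -> float:
    """Confidence of an annotation after it is carried to a new version."""
    return original * CARRY_PENALTY


def is_fuzzy_modified(score: float) -> bool:
    """Whether a similarity score is high enough for a fuzzy "modified" match."""
    # Assumption: the threshold is inclusive.
    return score >= MODIFIED_THRESHOLD
```

A human annotation at confidence 1.0 would therefore arrive in the next version at 0.7, and at about 0.49 if it were carried a second time.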

Built-in lifter (`warden.lift`)

lift_function and lift_module turn a parsed Module into readable pseudo-C with no external toolchain. The lifter is a symbolic stack evaluator: it walks func.instructions, folds the stack-machine operations back into infix expressions, and renders a deterministic text output that can be diffed across binary versions. Unmodeled opcodes degrade to /* mnemonic */ comments instead of raising, so every function always lifts to something.

from warden.ingest import parse_file
from warden.lift import lift_function, lift_module

module = parse_file("app_v1.wasm")

# Lift a single defined function to pseudo-C.
func = module.defined_functions[0]
print(lift_function(module, func))
# e.g.  i32 parse_token(i32 p0, i32 p1) {
#           return ((p0 + p1) * 7);
#       }

# Lift every defined function, concatenated (imports are skipped).
pseudocode = lift_module(module)

Signatures:

def lift_function(module: Module, func: Function) -> str: ...
def lift_module(module: Module) -> str: ...

lift_function is also the backend for warden export --format pseudo, so pseudocode exports emit real pseudo-C rather than a mnemonic dump.
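
The idea of folding stack-machine operations back into infix expressions can be shown in a few lines. This is a toy sketch, not the WARDEN lifter: the instruction encoding (tuples of mnemonic plus operands), the p0/p1 parameter naming, and the fold_to_infix name are assumptions, and only three arithmetic opcodes are modeled.

```python
BINARY_OPS = {"i32.add": "+", "i32.sub": "-", "i32.mul": "*"}


def fold_to_infix(instructions: list) -> list:
    """Fold a flat stack-machine sequence into pseudo-C statements."""
    stack = []  # symbolic expressions, as strings
    lines = []  # emitted statements
    for op, *operands in instructions:
        if op == "local.get":
            stack.append(f"p{operands[0]}")
        elif op == "i32.const":
            stack.append(str(operands[0]))
        elif op in BINARY_OPS and len(stack) >= 2:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(f"({lhs} {BINARY_OPS[op]} {rhs})")
        elif op == "return" and stack:
            lines.append(f"return {stack.pop()};")
        else:
            # Unmodeled opcode: degrade to a comment instead of raising.
            lines.append(f"/* {op} */")
    return lines


body = [
    ("local.get", 0),
    ("local.get", 1),
    ("i32.add",),
    ("i32.const", 7),
    ("i32.mul",),
    ("return",),
]
print(fold_to_infix(body))  # ['return ((p0 + p1) * 7);']
```

Because the output depends only on the instruction sequence, two runs over the same body always produce the same text, which is what makes the result diffable.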

Mini interpreter (`warden.interp`)

execute_function runs a function body on concrete integer inputs using a pure-Python, zero-dependency interpreter for the i32 subset. differential_execute runs two functions over the same inputs and reports per-input agreement, making behavioral equivalence concrete and runnable.

from warden.ingest import parse_file
from warden.interp import execute_function, differential_execute, UnsupportedExecution

module = parse_file("app_v1.wasm")
func = module.defined_functions[0]

# Execute a single function.
try:
    result = execute_function(module, func, [10, 20])
    print(result)   # list of i32 return values, e.g. [210]
except UnsupportedExecution as e:
    print(f"cannot execute: {e}")

# Differential execution across two versions.
mod_v1 = parse_file("app_v1.wasm")
mod_v2 = parse_file("app_v2.wasm")
fn_v1 = mod_v1.defined_functions[0]
fn_v2 = mod_v2.defined_functions[0]

rows = differential_execute(mod_v1, fn_v1, mod_v2, fn_v2, inputs=[[0, 1], [5, 3], [100, 7]])
for row in rows:
    print(row["args"], row["a"], row["b"], row["match"])
# {"args": [5, 3], "a": [56], "b": [56], "match": True}

Signatures:

def execute_function(
    module: Module,
    func: Function,
    args: list[int],
    *,
    host: Callable[[str, list[int]], list[int]] | None = None,
    memory: bytearray | None = None,
    fuel: int = 100000,
) -> list[int]: ...

def differential_execute(
    mod_a: Module,
    fn_a: Function,
    mod_b: Module,
    fn_b: Function,
    inputs: list[list[int]],
) -> list[dict]: ...

Each differential_execute row contains {"args", "a", "b", "match"}, where a and b are the result stacks (or None if that side raised UnsupportedExecution) and match is whether both sides produced identical results. UnsupportedExecution is raised (or recorded as None) whenever an opcode or construct falls outside the modeled integer subset. Callers should treat it as “cannot decide” rather than an error.

The interpreter models the i32 integer instruction set, structured control flow (block/loop/if), memory loads and stores, and direct calls. Floating-point, SIMD, multi-value blocks, and indirect calls raise UnsupportedExecution.
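
The differential idea can be tried out on a toy stack machine. The sketch below is not the WARDEN interpreter: it handles only straight-line programs with four opcodes, the Unsupported exception and the tuple instruction encoding are hypothetical, and two behaviours are assumptions (running out of fuel counts as undecidable, and a row with an undecided side reports match as False).

```python
MASK32 = 0xFFFFFFFF  # i32 results wrap modulo 2**32


class Unsupported(Exception):
    """Hypothetical stand-in for an 'outside the modeled subset' signal."""


def run(program, args, fuel=100000):
    """Evaluate a straight-line i32 stack program and return the final stack."""
    stack = []
    for op, *operands in program:
        if fuel <= 0:
            raise Unsupported("out of fuel")  # assumption: treated as undecidable
        fuel -= 1
        if op == "local.get":
            stack.append(args[operands[0]] & MASK32)
        elif op == "i32.const":
            stack.append(operands[0] & MASK32)
        elif op == "i32.add":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append((lhs + rhs) & MASK32)
        elif op == "i32.mul":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append((lhs * rhs) & MASK32)
        else:
            raise Unsupported(op)
    return stack


def differential(prog_a, prog_b, inputs):
    """Run both programs on each input and record per-input agreement."""
    rows = []
    for args in inputs:
        results = []
        for prog in (prog_a, prog_b):
            try:
                results.append(run(prog, args))
            except Unsupported:
                results.append(None)  # "cannot decide" on this side
        a, b = results
        rows.append({"args": args, "a": a, "b": b, "match": a is not None and a == b})
    return rows


# (p0 + p1) * 7 versus p0 * 7 + p1 * 7: different code, same behaviour.
prog_a = [("local.get", 0), ("local.get", 1), ("i32.add",), ("i32.const", 7), ("i32.mul",)]
prog_b = [("local.get", 0), ("i32.const", 7), ("i32.mul",),
          ("local.get", 1), ("i32.const", 7), ("i32.mul",), ("i32.add",)]
rows = differential(prog_a, prog_b, [[0, 1], [5, 3], [100, 7]])
```

Agreement on sampled inputs is evidence of equivalence, not proof; a mismatch on any input, on the other hand, is conclusive.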

Specialized analyzers (`warden.analysis`)

Concurrency (`warden.analysis.concurrency`)

analyze_concurrency recovers the thread model in one deterministic pass: it detects shared memory, locates every atomic-class instruction, and collects pthread-ish import/export names. When a KnowledgeBase and version_id are supplied, each atomic site is persisted as a kind='atomic' thread fact.

from warden.ingest import parse_file
from warden.kb import KnowledgeBase
from warden.analysis.concurrency import analyze_concurrency

module = parse_file("app_v1.wasm")

# Without persisting to the KB.
report = analyze_concurrency(module)
print(report.shared_memory)      # bool
print(report.is_threaded)        # bool: shared memory OR atomics OR pthread markers
print(report.atomic_sites)       # list[AtomicSite]
print(report.pthread_markers)    # list[str]: sorted, deduplicated

# Persisting atomic facts to the KB.
with KnowledgeBase("warden.db") as kb:
    version = kb.get_version("v1")
    report = analyze_concurrency(module, kb=kb, version_id=version.id)
    print(f"{len(report.atomic_sites)} atomic sites persisted")

Signature:

def analyze_concurrency(
    module: Module,
    kb: KnowledgeBase | None = None,
    version_id: int | None = None,
) -> ConcurrencyReport: ...

ConcurrencyReport fields: shared_memory (bool), atomic_sites (list[AtomicSite]), pthread_markers (list[str]), facts (list[dict]), is_threaded (property). AtomicSite fields: func_index, func_name, mnemonic, offset, instr_offset, site (property).

Struct layout (`warden.analysis.structs`)

analyze_structs recovers candidate struct layouts from memory-access patterns. For each defined function it groups fixed-displacement i32 loads and stores by base pointer local; each base with at least one access yields a StructLayout. When a KnowledgeBase and version_id are supplied, every layout is persisted via kb.upsert_struct at provenance="agent", confidence=0.5.

from warden.ingest import parse_file
from warden.kb import KnowledgeBase
from warden.analysis.structs import analyze_structs

module = parse_file("app_v1.wasm")

# Without persisting to the KB.
layouts = analyze_structs(module)
for layout in layouts:
    print(layout.name, layout.source_function)
    for field in layout.fields:
        print(f"  +{field.offset:#x}  {field.type}  {field.name}")

# Persisting recovered shapes to the KB.
with KnowledgeBase("warden.db") as kb:
    version = kb.get_version("v1")
    layouts = analyze_structs(module, kb=kb, version_id=version.id)
    print(f"{len(layouts)} struct candidates persisted")

Signature:

def analyze_structs(
    module: Module,
    kb: KnowledgeBase | None = None,
    version_id: int | None = None,
) -> list[StructLayout]: ...

StructLayout fields: name (str), fields (list[StructField]), source_function (str | None). StructField fields: offset (int), size (int), type (str), name (str).
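
The grouping step described above (collect fixed-displacement accesses per base-pointer local, then emit one candidate layout per base) can be sketched as follows. This is an illustration only: the input format of (base_local, offset, size) tuples, the struct_localN and field_N naming, and the group_accesses helper are assumptions, not the analyzer's actual data model.

```python
from collections import defaultdict


def group_accesses(accesses: list) -> dict:
    """Group (base_local, offset, size) accesses into per-base field lists."""
    by_base = defaultdict(set)
    for base_local, offset, size in accesses:
        # A set removes repeated accesses to the same field.
        by_base[base_local].add((offset, size))
    layouts = {}
    for base_local in sorted(by_base):
        layouts[f"struct_local{base_local}"] = [
            {"offset": off, "size": size, "type": "i32", "name": f"field_{off:x}"}
            for off, size in sorted(by_base[base_local])
        ]
    return layouts


# Two loads and a store through local 0, one load through local 2.
observed = [(0, 8, 4), (0, 0, 4), (0, 8, 4), (2, 4, 4)]
layouts = group_accesses(observed)
```

Sorting bases and offsets keeps the output stable between runs, which matters if candidates are later compared across versions.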

Run both analyzers together with warden analyze <label> from the CLI, which calls them in sequence and persists facts to the KB automatically.

Call graph (`warden.analysis.callgraph`)

build_call_graph extracts the intra-module call graph from a parsed module. Direct calls (call / return_call) are exact. Indirect calls (call_indirect / return_call_indirect and dynCall wrappers) are over-approximated to every table target whose type matches the call-site type index, since the static instruction only carries a type, not a concrete target. The result is a conservative skeleton that is always safe to schedule from.

layered_schedule condenses strongly-connected components (mutual recursion) via strongly_connected_components, then assigns a bottom-up depth to each component: layer 0 contains leaves (functions that call no other defined function), and each later layer’s functions have all their defined callees in earlier layers. Members of the same SCC share one layer. Functions within a layer are independent.

strongly_connected_components runs iterative Tarjan SCC on any node/successor pair you supply. It returns components in reverse-topological order (callees before callers), which is exactly the order the bottom-up schedule needs.

from warden.ingest import parse_file
from warden.analysis.callgraph import build_call_graph, layered_schedule

module = parse_file("app_v1.wasm")

# Build the call graph.
cg = build_call_graph(module)
print(cg.edges)           # dict[int, set[int]]: caller index -> defined callee indices
print(cg.imports_called)  # dict[int, set[int]]: caller index -> imported function indices
print(cg.indirect_callers) # set[int]: callers that use call_indirect / dynCall
print(cg.table_targets)   # set[int]: defined functions reachable through the table

# Direct callees of function 42.
print(cg.callees(42))     # set[int]

# Layered bottom-up schedule (pass graph= to avoid rebuilding).
layers = layered_schedule(module, graph=cg)
for depth, layer in enumerate(layers):
    print(f"layer {depth}: {layer}")
# layer 0: [5, 12, 31]   <- leaves
# layer 1: [7, 20]
# layer 2: [3]           <- top-level callers

Signatures:

def build_call_graph(module: Module) -> CallGraph: ...

@dataclass
class CallGraph:
    edges: dict[int, set[int]]       # caller index -> defined callee indices
    imports_called: dict[int, set[int]]  # caller index -> imported function indices
    indirect_callers: set[int]       # callers that use call_indirect / dynCall
    table_targets: set[int]          # defined functions reachable through the table

    def callees(self, index: int) -> set[int]: ...

def layered_schedule(
    module: Module,
    graph: CallGraph | None = None,
) -> list[list[int]]: ...

def strongly_connected_components(
    nodes: list[int],
    succ,          # callable: node -> iterable[node]
) -> list[list[int]]: ...

layered_schedule accepts an already-built CallGraph via graph=. When omitted it calls build_call_graph internally. The returned list is sorted at every level for determinism.
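
For readers who want to see the scheduling idea without the library, here is a compact, standalone sketch of iterative Tarjan SCC followed by bottom-up layering. It works on a plain dict of edges and is not WARDEN's code; the function names sccs and layers are hypothetical, and treating self-recursive leaves as layer 0 is a choice made for this example.

```python
def sccs(nodes, succ):
    """Iterative Tarjan; components come out callees-first."""
    index, low, on_stack, stack, out = {}, {}, set(), [], []
    counter = 0
    for root in nodes:
        if root in index:
            continue
        index[root] = low[root] = counter
        counter += 1
        stack.append(root)
        on_stack.add(root)
        work = [(root, iter(succ(root)))]
        while work:
            node, it = work[-1]
            advanced = False
            for nxt in it:
                if nxt not in index:
                    # Descend into an unvisited successor.
                    index[nxt] = low[nxt] = counter
                    counter += 1
                    stack.append(nxt)
                    on_stack.add(nxt)
                    work.append((nxt, iter(succ(nxt))))
                    advanced = True
                    break
                if nxt in on_stack:
                    low[node] = min(low[node], index[nxt])
            if advanced:
                continue
            work.pop()
            if work:
                parent = work[-1][0]
                low[parent] = min(low[parent], low[node])
            if low[node] == index[node]:
                # node is the root of a component: pop its members.
                comp = []
                while True:
                    member = stack.pop()
                    on_stack.discard(member)
                    comp.append(member)
                    if member == node:
                        break
                out.append(sorted(comp))
    return out


def layers(edges):
    """Bottom-up layers: layer 0 holds components with no outside callees."""
    comps = sccs(sorted(edges), lambda n: sorted(edges.get(n, ())))
    comp_of = {n: i for i, comp in enumerate(comps) for n in comp}
    depth = {}
    for i, comp in enumerate(comps):  # callees-first, so callee depths exist
        callee_comps = {comp_of[c] for n in comp for c in edges.get(n, ())} - {i}
        depth[i] = 1 + max((depth[j] for j in callee_comps), default=-1)
    by_depth = {}
    for i, comp in enumerate(comps):
        by_depth.setdefault(depth[i], []).extend(comp)
    return [sorted(by_depth[d]) for d in sorted(by_depth)]


edges = {3: {7, 20}, 7: {5, 12}, 20: {31}, 5: set(), 12: set(), 31: set()}
print(layers(edges))  # [[5, 12, 31], [7, 20], [3]]
```

An explicit work stack avoids Python's recursion limit, which deep call chains in real modules could otherwise hit.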

Static HTML report generator (`warden.report`)

render_report produces a single, self-contained HTML file (inline CSS, no external assets) from a KB version. The report contains a coverage summary, a confidence heatmap of functions colored by provenance and confidence, a thread/memory model section, and the diff changelog from the nearest prior version. The output is deterministic: the same KB state always produces byte-identical HTML, so reports diff cleanly alongside the binary artifacts they document.

from pathlib import Path
from warden.kb import KnowledgeBase
from warden.report import render_report, write_report

with KnowledgeBase("warden.db") as kb:
    version = kb.get_version("v2")

    # Get the HTML as a string.
    html = render_report(kb, version.id)

    # Or write directly to a file.
    write_report(kb, version.id, Path("report_v2.html"))

Signatures:

def render_report(
    kb: KnowledgeBase,
    version_id: int,
    module: Module | None = None,
) -> str: ...

def write_report(
    kb: KnowledgeBase,
    version_id: int,
    path: str | Path,
    module: Module | None = None,
) -> None: ...

The module parameter is optional. When omitted, the report is driven entirely by the KB and works even without the original .wasm on hand. write_report writes UTF-8.

Oracle LSH index (`warden.oracle.index`)

SignatureIndex adds a banded-MinHash + structural-hash index over a SignatureStore for sublinear candidate lookup. identify_indexed is a drop-in replacement for identify that queries the index rather than scoring every signature. The results are identical to a full scan because any function with no index candidates falls back to scoring everything.

from warden.identity import fingerprint_function
from warden.ingest import parse_file
from warden.kb import KnowledgeBase
from warden.oracle import load_seed_store, SignatureIndex, identify_indexed

store = load_seed_store()

# Build the index once per store.
index = SignatureIndex.build(store, bands=8)

# Query candidates for a fingerprint (for custom matching pipelines).
module = parse_file("app_v1.wasm")
func = module.defined_functions[0]
fp = fingerprint_function(module, func)
candidates = index.candidates(fp)   # list[Signature]: the shortlist to score

# Or run the full indexed identify pass against the KB.
with KnowledgeBase("warden.db") as kb:
    version = kb.get_version("v1")
    matches = identify_indexed(
        kb,
        version.id,
        store,
        threshold=0.82,   # default
        write=True,       # persist oracle symbols to KB (default)
    )
    print(f"{len(matches)} oracle matches")

SignatureIndex.build signature:

@classmethod
def build(cls, store: SignatureStore, *, bands: int = 8) -> SignatureIndex: ...

identify_indexed signature:

def identify_indexed(
    kb: KnowledgeBase,
    version_id: int,
    store: SignatureStore,
    *,
    threshold: float = 0.82,
    write: bool = True,
) -> list[OracleMatch]: ...

The --indexed flag in warden oracle identify <label> --store <path> --indexed calls this function. Pass write=False to run a dry-run without touching the KB.
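
Banded MinHash lookup is a standard locality-sensitive hashing technique, and a small sketch may help explain why it is sublinear. The class below is a generic illustration, not SignatureIndex itself: the BandedIndex name, the use of string labels as entries, and the bucket layout are assumptions, and the structural-hash part of the real index is left out.

```python
from collections import defaultdict


class BandedIndex:
    """Bucket signatures by band so similar ones collide in at least one bucket."""

    def __init__(self, bands: int = 8) -> None:
        self.bands = bands
        self.buckets = defaultdict(set)

    def _keys(self, signature: tuple):
        """Yield one (band number, band contents) key per band."""
        if len(signature) % self.bands:
            raise ValueError("signature length must be divisible by bands")
        rows = len(signature) // self.bands
        for band in range(self.bands):
            yield band, signature[band * rows:(band + 1) * rows]

    def add(self, label: str, signature: tuple) -> None:
        for key in self._keys(signature):
            self.buckets[key].add(label)

    def candidates(self, signature: tuple) -> set:
        """Union of every bucket the query signature falls into."""
        found = set()
        for key in self._keys(signature):
            found |= self.buckets.get(key, set())
        return found


index = BandedIndex(bands=8)
index.add("memcpy", tuple(range(32)))
index.add("strlen", tuple(range(100, 132)))

# Same as "memcpy" except for the last band of four values.
query = tuple(range(28)) + (900, 901, 902, 903)
print(index.candidates(query))  # {'memcpy'}
```

A query that shares no band with any entry returns an empty set, which is the situation where a caller would fall back to scoring every signature.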

Agent crew (`warden.agents.crew`)

run_agent_pass drives one propose-verify-write-back sweep over all functions in a version. Two of its keyword arguments, strategy and concurrency, control scheduling and parallelism.

from warden.kb import KnowledgeBase
from warden.project import load_module
from warden.agents.crew import run_agent_pass

module = load_module("app_v1.wasm")

with KnowledgeBase("warden.db") as kb:
    version = kb.get_version("v1")

    # Call-graph strategy (default): bottom-up, concurrent within each layer.
    result = run_agent_pass(kb, module, version.id, strategy="call-graph", concurrency=8)

    # Flat strategy: original single-pass, leaves-first ordering.
    result = run_agent_pass(kb, module, version.id, strategy="flat")

    print(result.backend)
    print(result.proposed, result.written, result.rejected_by_economy, result.skipped_existing)

Signature:

def run_agent_pass(
    kb: KnowledgeBase,
    module: Module,
    version_id: int,
    *,
    backend: AgentBackend | None = None,
    prefer: str | None = None,
    only_unconfident: bool = True,
    strategy: str = "call-graph",
    concurrency: int = 8,
) -> AgentRunResult: ...

strategy accepts "call-graph" (also accepted as "callgraph") or "flat". The call-graph strategy works in five steps:

1. Build the intra-module call graph with build_call_graph. Direct calls are exact. Indirect calls are over-approximated to table targets of the matching type.
2. Condense strongly-connected components (mutual recursion) and produce bottom-up layers with layered_schedule. Every function in a layer has all of its defined callees in earlier layers.
3. Run the concurrency and struct analyzers and route their findings into per-function notes (notes field on FunctionFacts). Atomic sites become synchronization hints. Struct accesses become field-layout hints.
4. Process layers bottom-up. Each function’s FunctionFacts is enriched with callee_names (the recovered names of its defined callees) before the backend proposes a name. Naming a caller with its callees’ meanings already established is the main quality gain over flat ordering.
5. Functions in the same layer are independent. They are proposed concurrently in-process using asyncio. Blocking LLM backends run in worker threads, capped by concurrency. Writes still go through the provenance/confidence economy, so concurrent branches that share a callee cannot overwrite each other.

The flat strategy preserves the original single-pass ordering and runs serially. It is still available for backends where concurrency is undesirable. The CLI passes --strategy directly: warden agent <label> --strategy call-graph.
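
The layer-by-layer concurrency pattern of the call-graph strategy can be sketched with asyncio alone. This is an illustration under stated assumptions, not the crew implementation: blocking_propose stands in for a blocking LLM backend, the seen dict exists only to show what each call received, and asyncio.to_thread requires Python 3.9 or later.

```python
import asyncio
import time

seen = {}  # func index -> callee names it was given (for demonstration only)


def blocking_propose(func_index: int, callee_names: list) -> str:
    """Hypothetical blocking backend call; a real one would query a model."""
    time.sleep(0.01)
    seen[func_index] = callee_names
    return f"func_{func_index}"


async def run_layers(layers: list, edges: dict, concurrency: int = 8) -> dict:
    """Name functions layer by layer, running each layer concurrently."""
    names = {}
    gate = asyncio.Semaphore(concurrency)  # caps the number of worker threads in use

    async def propose(index: int) -> tuple:
        # Callee names are known because earlier layers have already finished.
        callee_names = [names[c] for c in sorted(edges.get(index, ())) if c in names]
        async with gate:
            name = await asyncio.to_thread(blocking_propose, index, callee_names)
        return index, name

    for layer in layers:  # bottom-up: leaves first
        results = await asyncio.gather(*(propose(i) for i in layer))
        names.update(results)  # publish names only after the whole layer is done
    return names


edges = {3: {7, 20}, 7: {5, 12}, 20: {31}, 5: set(), 12: set(), 31: set()}
names = asyncio.run(run_layers([[5, 12, 31], [7, 20], [3]], edges, concurrency=2))
```

Waiting for a full layer before publishing its names is what guarantees that a caller never sees a half-named set of callees.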

`FunctionFacts`

FunctionFacts is the dataclass handed to every backend. The call-graph strategy populates two fields that are empty under flat:

Field	Type	Notes
`callee_names`	`list[str]`	Recovered names of defined callees, in callee-index order. Populated bottom-up after each callee is named.
`notes`	`list[str]`	Hints from the specialized analyzers: atomic-site descriptions and struct-field summaries.
