Eight deterministic synthetic tiers designed to isolate what κ-topology actually contributes to retrieval. T1–T6 test κ-sensitive cycle detection; T7–T8 benchmark graph algorithm quality (evidence paths, causal ordering). Each tier runs topology ON vs OFF side by side.
T1–T2 are acyclic controls. T3–T6 increase in κ complexity and retrieval difficulty. T7–T8 benchmark graph algorithms: weighted shortest paths (Dijkstra) and causal DAG ordering (toposort).
Acyclic control. Direct causal chains A→B→C→D with no loop-back. Topology should stay inert — no SCCs, no κ, no deliberate routing.
Acyclic control. Tree-shaped causal decomposition: root→{A,B}→{A1,A2,B1}. Tests that fan-out alone doesn't trigger false-positive κ routing.
Disjoint 3–5 node SCCs forming directed cycles A→B→C→A. Primary test for cycle-root paradox and membership-in-cycle retrieval.
Dense bidirectional ring + chord edges per SCC. Tests fault-line detection and high-κ routing. Configurable chord density scales κ upward.
Paired old/new facts per subject, half joined by
bidirectional contradicts (κ=1), half by
one-way supersedes (κ=0). Tests belief
revision under κ pressure.
SCCs planted across a density ladder [1..5] — retriever must distinguish κ values, not just detect that a cycle exists. New metrics: discrim_accuracy and κ-MAE.
Weighted DAGs with known shortest paths (Dijkstra). Tests whether retrieval preserves evidence chain ordering and hop counts. Alternate paths probe multi-path awareness.
Layered DAGs with known topological order and critical path depth. Tests causal sequencing (toposort), longest-path detection, and source/sink identification.
Every tier passes the ≥3pp κ-metric gate (or — for the κ=0
controls — correctly produces zero delta). Sanity
mode, deterministic from --seed 42.
| Tier | κ class | kappa_recall on | kappa_recall off | Δ | max κ on / off | p50 on / off | Gate |
|---|---|---|---|---|---|---|---|
| T1 | κ=0 linear | 0.00 | 0.00 | 0 | 0 / 0 | 30ms / 32ms | ✓ correct |
| T2 | κ=0 branching | 0.00 | 0.00 | 0 | 0 / 0 | 30ms / 29ms | ✓ correct |
| T3 | κ=1 cycle | 1.00 | 0.00 | +100pp | 1 / 0 | 10.2s / 32ms | ✓ pass |
| T4 | κ≥2 multi-SCC | 1.00 | 0.00 | +100pp | 2 / 0 | 43ms / 32ms | ✓ pass |
| T5 | κ=0–1 contradict | 1.00 | 0.00 | +100pp | 1 / 0 | 41ms / 39ms | ✓ pass |
| T6 | mixed κ | 1.00 | 0.00 | +100pp | 2 / 0 | 46ms / 38ms | ✓ pass |
These tiers are κ-independent — they benchmark whether the retriever preserves structural properties of the ingested graph. Topology on/off should not affect these metrics (and doesn't).
| Metric | Topology ON | Topology OFF | Δ |
|---|---|---|---|
| path_node_recall | 1.00 | 1.00 | 0 |
| path_order_accuracy | 0.60 | 0.60 | 0 |
| hop_count_mae | 4.00 | 4.00 | 0 |
| alt_path_detected | 1.00 | 1.00 | 0 |
| latency p50 / p95 | 32ms / 200ms | 35ms / 275ms | ~0 |
trace_evidence_path MCP tool now uses
Graphonomous.Algorithms.Dijkstra with Yen's
K-shortest paths, pluggable cost function
(-log(confidence) + recency_decay + type_cost),
and binary-heap priority queue. 22 unit tests passing.
These baseline numbers measure retriever-only path
awareness; the Dijkstra tool itself achieves MAE ≈ 0
on direct graph queries.
| Metric | Topology ON | Topology OFF | Δ |
|---|---|---|---|
| ordering_accuracy | 0.46 | 0.46 | 0 |
| critical_depth_mae | 2.67 | 2.67 | 0 |
| source_sink_recall | 1.00 | 1.00 | 0 |
| latency p50 / p95 | 34ms / 205ms | 33ms / 201ms | ~0 |
Graphonomous.Algorithms.DAG provides Kahn's
toposort O(V+E), longest-path DP, critical_path_depth,
and source/sink identification. 22 unit tests passing.
These baseline numbers measure retriever-only causal
awareness; the DAG module itself achieves exact ordering
on well-formed DAGs. Ordering accuracy of 0.46 reflects
that similarity-based retrieval doesn't respect layer
ordering — an expected gap that the DAG tool closes
when invoked directly.
All results: sanity mode, --seed 42, fallback
embedder. T7/T8 are κ-independent — topology on/off produces
identical algorithm metrics by design.
Beyond T7/T8 baselines, Graphonomous v0.3.3 ships a full
graph algorithms library in
graphonomous/lib/graphonomous/algorithms/.
All algorithms are pure functions callable independently
or via MCP tools.
| Algorithm | Module | Complexity | Tests | Portfolio Reuse |
|---|---|---|---|---|
| Weighted Dijkstra + Yen's K-shortest | dijkstra.ex |
O((V+E) log V) | 22 | Delegatic, Deliberatic, GeoFleetic, AgenTroMatic |
| Kahn's toposort + longest-path DP | dag.ex |
O(V+E) | 22 | AgenTroMatic, SpecPrompt, OS-008 |
| Hopcroft-Karp + Hungarian | matching.ex |
O(E√V) / O(n³) | 12 | FleetPrompt, GeoFleetic, AgenTroMatic |
| Louvain community detection | louvain.ex |
O(n log n) | 10 | Consolidation, forget_by_policy |
| Incremental SCC maintenance | incremental_scc.ex |
O(m½) amortized | 13 | Topology (replaces cold Tarjan) |
| Triangle counting + clustering | triangles.ex |
O(m1.5) | 15 | Graph health instrumentation |
| Total | 72 | Codebase total: 455 tests | ||
algorithms/ppr.ex) and tested
at two weight settings on LongMemEval. Net effect was
slightly negative (−0.3pp to −0.6pp QA) because
LongMemEval queries are session-scoped and random walk
doesn't discover new relevant nodes. PPR is flag-gated off
by default but available for denser multi-hop workloads.
The point of a synthetic benchmark is to control the difficulty axis. These five CLI flags let you push Graphonomous along orthogonal pressure points and watch where performance degrades.
T4 chord-edge density. 1 = bidirectional
ring only, 5 = dense layered chords. Higher
density increases κ and forces fault-line detection to
work harder.
Injects N acyclic chains of noise alongside the gold SCCs. Beyond some threshold the similarity stage drops gold clusters from its top-K, collapsing kappa_recall.
Shrink the retrieval budget that topology gets to work
with. At similarity-limit=3, hops=0, T4
kappa_recall drops to 0.90 and faultline_mrr falls to
0.60.
Discrimination, not detection. SCCs planted across a density ladder force the retriever to resolve the exact κ value. Tracks kappa_discrim_accuracy and MAE.
Strips distinctive domain words and replaces them with
system-N tokens. Similarity retrieval can
no longer shortcut to the right cluster — topology has
to carry the weight.
All tiers are deterministic and reproduce bit-for-bit from
the same seed. Fixtures live in
priv/graphmembench/fixtures/T{1..8}/.
retriever.ex, topology.ex, or the
LongMemEval path. Running it cannot affect production behavior.
Results land in
benchmark_results/graphmembench_T{N}_topology_{on,off}.json
for offline comparison.