orionfold-cortex
A local memory layer that gates its own recall
Positioning
A raw pgvector table drifts silently and carries no trust card on its chunks — you cannot tell a Spark-measured fact from an external claim, and a re-index can quietly drop recall with no alarm. Orionfold Cortex wraps the index with stamped provenance, a coverage report, and a recall@k promotion gate, all dispatched and watched through the Arena control plane. The machine manages its own memory.
- Re-index a multi-source corpus (article · lineage · eval · scout · deep_research) with provenance stamped per chunk, dispatched from the cockpit
- Score chunk-recall@k + slug-recall@k against an in-repo gold set, gated like-for-like against the prior index so a rebuild can't silently regress recall
- Query the Second Brain with a provenance/trust-tier filter — cited hits a hosted RAG can't honestly attribute
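The recall scoring and the like-for-like gate described above can be sketched in a few lines. This is a minimal illustration, not the fieldkit implementation: `GoldRow`, `recall_at_k`, `promotion_gate`, and the `tolerance` parameter are hypothetical names, and the sketch assumes each gold row names one expected chunk and one expected article slug.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldRow:
    """One gold-set question with the chunk and slug expected to answer it (hypothetical schema)."""
    question: str
    gold_chunk_id: str
    gold_slug: str


def recall_at_k(rows, retrieved, k=5):
    """Return (chunk_recall, slug_recall) over the gold rows.

    `retrieved` maps question -> ranked list of (chunk_id, slug) hits.
    A row counts as a hit if its gold id appears anywhere in the top k.
    """
    if not rows:
        raise ValueError("gold set is empty")
    chunk_hits = 0
    slug_hits = 0
    for row in rows:
        top = retrieved.get(row.question, [])[:k]
        chunk_hits += any(cid == row.gold_chunk_id for cid, _ in top)
        slug_hits += any(slug == row.gold_slug for _, slug in top)
    n = len(rows)
    return chunk_hits / n, slug_hits / n


def promotion_gate(prior, candidate, tolerance=0.0):
    """Allow promotion only if no metric falls more than `tolerance` below the prior index."""
    return all(c >= p - tolerance for p, c in zip(prior, candidate))


# Tiny made-up example: two questions, one exact-chunk hit, two slug hits.
rows = [GoldRow("q1", "c1", "a"), GoldRow("q2", "c9", "b")]
retrieved = {"q1": [("c1", "a")], "q2": [("c7", "b")]}
scores = recall_at_k(rows, retrieved)
```

Slug recall is the looser of the two metrics: retrieving any chunk from the right article counts, so it can only be equal to or higher than chunk recall on the same run. Comparing the candidate index against the prior index on the same gold set and the same `k` is what makes the gate like-for-like.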
Audience. DGX Spark operators running a private, local-first RAG recall layer they drive, not a SaaS.
Measured
qa-eval.jsonl · 44 held-out Q · chunk-recall@5 / slug-recall@5 (cosine-only, GB10)
- chunk-recall@5: 0.4091
- slug-recall@5: 0.7273
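With only 44 gold rows, each reported recall corresponds to a small integer hit count. The check below is a worked arithmetic sketch, assuming every question is scored as a binary hit or miss; `implied_hits` is a hypothetical helper, and the hit counts are inferred from the published ratios rather than stated in the source.

```python
GOLD_ROWS = 44  # gold-set size stated for qa-eval.jsonl


def implied_hits(recall, n=GOLD_ROWS):
    """Return the integer hit count whose ratio rounds to `recall`, or None if none fits."""
    hits = round(recall * n)
    return hits if round(hits / n, 4) == recall else None


chunk_hits = implied_hits(0.4091)  # 18 of 44
slug_hits = implied_hits(0.7273)   # 32 of 44
step = round(1 / GOLD_ROWS, 4)     # recall change from a single row flipping
```

Under that assumption the figures are consistent with 18 exact-chunk hits and 32 right-article hits out of 44, and one question flipping moves either metric by about 0.0227.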
Lane variants
The harness profile records every serving lane that was driven through the same Hermes agent on the same DGX Spark. Throughput is recorded as measured tokens-per-second on the box.
Recommended lane. cosine-only · top_k=5 · GB10 measured baseline — the lane the harness profile points at for an always-on Spark agent.
How to load
License: free · apache-2.0.
This is a local-first recall harness: it ships as code inside fieldkit[arena] and builds its index on-box. There is no downloadable Hub bundle; the corpus and queries never leave the machine.
pip install "fieldkit[arena]"
fieldkit arena up  # cockpit + recall layer on 127.0.0.1:7866
Known drift
Every measurement has a measurement window. These are the bounds the harness profile is honest about.
- Reranker absent on GB10: the cosine-only score over top-5 retrieval is the floor, not the reranked ceiling; 1 reranker lane is unsupported on GB10 (NGC 410-gone, no -dgx-spark profile), so rerank=True hard-raises rather than mislabel a score (R22).
- Generator-side metrics not in this lane: 3 of 3 generator-side scores (faithfulness / correctness / refusal-rate) are left null because they need the generator NIM; this is the retrieval-only recall measurement.
- Source-class population: the multi-source provenance schema is live across 5 classes (article · lineage · eval · scout · deep_research) but only the article class is populated today, with 313/313 chunks across 49 published articles; the other 4 ingest paths are wired but unpopulated.
- Gold-set size: recall is measured over 44 qa-eval rows, not a large-N guarantee.
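Two of the bounds above are behaviors rather than numbers: a rerank request fails loudly, and unmeasured generator metrics stay null. The sketch below shows one way such a report builder could look. It is an assumption-laden illustration, not fieldkit code: `build_report`, `RerankerUnavailableError`, the `reranker_available` flag, and the dictionary keys are all hypothetical.

```python
class RerankerUnavailableError(RuntimeError):
    """Raised when a reranked score is requested but no reranker lane exists (hypothetical)."""


def build_report(chunk_recall, slug_recall, *, rerank=False, reranker_available=False):
    """Build a retrieval-only report; refuse to label a score as reranked without a reranker."""
    if rerank and not reranker_available:
        # Hard-raise instead of silently returning a cosine-only score under a rerank label.
        raise RerankerUnavailableError("rerank=True requested but no reranker lane is available")
    return {
        "lane": "cosine+rerank" if rerank else "cosine-only",
        "chunk_recall@5": chunk_recall,
        "slug_recall@5": slug_recall,
        # Generator-side metrics are not measured in this lane, so they stay null.
        "faithfulness": None,
        "correctness": None,
        "refusal_rate": None,
    }


report = build_report(0.4091, 0.7273)

try:
    build_report(0.4091, 0.7273, rerank=True)
    rerank_raised = False
except RerankerUnavailableError:
    rerank_raised = True
```

Leaving the generator fields as `None` rather than `0.0` keeps "not measured" distinguishable from "measured and scored zero" for anything that reads the report downstream.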
Companion field note
The harness profile pairs with the field note products/orionfold-cortex — read the article for the lane-bakeoff narrative, the unified-memory math, and the tool-call reliability gate that decided the recommended lane.