orionfold-cortex — Harnesses

Positioning

A raw pgvector table drifts silently and carries no trust card on its chunks — you cannot tell a Spark-measured fact from an external claim, and a re-index can quietly drop recall with no alarm. Orionfold Cortex wraps the index with stamped provenance, a coverage report, and a recall@k promotion gate, all dispatched and watched through the Arena control plane. The machine manages its own memory.

Re-index a multi-source corpus (article · lineage · eval · scout · deep_research) with provenance stamped per chunk, dispatched from the cockpit
Score chunk-recall@k + slug-recall@k against an in-repo gold set, gated like-for-like against the prior index so a rebuild can't silently regress recall
Query the Second Brain with a provenance/trust-tier filter — cited hits a hosted RAG can't honestly attribute
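
As a rough illustration of the provenance/trust-tier filter described above, here is a minimal Python sketch. It assumes each retrieved hit carries a source class and a trust tier. The Chunk fields, the tier labels ("measured", "external") and the filter_hits helper are hypothetical names chosen for this example, not the fieldkit API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

# The five source classes named in the profile.
SOURCE_CLASSES = ("article", "lineage", "eval", "scout", "deep_research")


@dataclass(frozen=True)
class Chunk:
    """Hypothetical shape of a provenance-stamped chunk."""
    chunk_id: str
    slug: str
    source_class: str  # expected to be one of SOURCE_CLASSES
    trust_tier: str    # assumed labels, e.g. "measured" or "external"
    text: str


def filter_hits(
    hits: Iterable[Chunk],
    allowed_classes: Optional[Set[str]] = None,
    allowed_tiers: Optional[Set[str]] = None,
) -> List[Chunk]:
    """Keep only hits whose provenance matches the requested filters.

    A filter left as None is not applied.
    """
    kept = []
    for hit in hits:
        if hit.source_class not in SOURCE_CLASSES:
            raise ValueError(f"unknown source class: {hit.source_class!r}")
        if allowed_classes is not None and hit.source_class not in allowed_classes:
            continue
        if allowed_tiers is not None and hit.trust_tier not in allowed_tiers:
            continue
        kept.append(hit)
    return kept
```

With a filter like this applied after retrieval, a caller could ask for measured facts only and still cite the slug and source class of every hit that survives.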

Audience. DGX Spark operators running a private, local-first RAG recall layer they drive, not a SaaS.

Measured

qa-eval.jsonl · 44 held-out questions · chunk-recall@5 / slug-recall@5 (cosine-only, GB10)

chunk-recall@5: 0.4091
slug-recall@5: 0.7273
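
The two scores above, and the like-for-like promotion gate, could be computed along these lines. This is a sketch under stated assumptions: each gold row counts as a binary hit when any gold id appears in the top k, the same function is called once with chunk ids and once with article slugs, and the function names and the tolerance parameter are illustrative rather than taken from fieldkit.

```python
from typing import Dict, Sequence, Set


def recall_at_k(ranked: Sequence[Sequence[str]], gold: Sequence[Set[str]], k: int = 5) -> float:
    """Fraction of questions with at least one gold id in the top-k results.

    ranked[i] is the ordered list of retrieved ids for question i;
    gold[i] is the set of acceptable ids for the same question.
    """
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have one entry per question")
    if not gold:
        return 0.0
    hits = sum(1 for r, g in zip(ranked, gold) if g.intersection(r[:k]))
    return hits / len(gold)


def passes_gate(candidate: Dict[str, float], prior: Dict[str, float], tolerance: float = 0.0) -> bool:
    """Like-for-like gate: every metric of the prior index must be matched
    (within tolerance) by the rebuilt index, otherwise promotion is refused."""
    return all(candidate[name] >= prior[name] - tolerance for name in prior)


if __name__ == "__main__":
    prior = {"chunk-recall@5": 0.4091, "slug-recall@5": 0.7273}
    # Made-up candidate scores, for illustration only.
    rebuilt = {"chunk-recall@5": 0.3864, "slug-recall@5": 0.7273}
    print("promote:", passes_gate(rebuilt, prior))  # False: chunk recall regressed
```

Comparing against the prior index on the same gold set, rather than against a fixed threshold, is what keeps a rebuild from quietly lowering recall while still clearing an absolute bar.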

Lane variants

The harness profile records every serving lane that was driven through the same Hermes agent on the same DGX Spark. Throughput is recorded as measured tokens-per-second on the box.

cosine-only · top_k=5 · GB10 measured baseline (recommended)

1 lane

Recommended lane. cosine-only · top_k=5 · GB10 measured baseline — the lane the harness profile points at for an always-on Spark agent.

How to load

License: free · apache-2.0. This is a local-first recall harness — it ships as code inside fieldkit[arena] and builds its index on-box. There is no downloadable Hub bundle; the corpus and queries never leave the machine.

pip install "fieldkit[arena]"

fieldkit arena up   # cockpit + recall layer on 127.0.0.1:7866
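
After running the two commands above, a quick way to confirm that something is listening on the cockpit address is a plain TCP check. The sketch below uses only the Python standard library. It assumes the service accepts TCP connections on 127.0.0.1:7866 as stated, and the helper name cockpit_reachable is made up for this example. It does not verify that the listener is actually the Arena cockpit.

```python
import socket


def cockpit_reachable(host: str = "127.0.0.1", port: int = 7866, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "reachable" if cockpit_reachable() else "not reachable"
    print(f"127.0.0.1:7866 is {state}")
```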

Known drift

Every measurement has a measurement window. These are the bounds the harness profile is honest about.

Reranker absent on GB10: the cosine-only score over top-5 retrieval is the floor, not the reranked ceiling. The 1 reranker lane is unsupported on GB10 (NGC 410-gone, no -dgx-spark profile), so rerank=True hard-raises rather than mislabeling a score (R22).
Generator-side metrics not in this lane: 3 of 3 generator-side scores (faithfulness / correctness / refusal-rate) are left null — they need the generator NIM; this is the retrieval-only recall measurement.
Source-class population: the multi-source provenance schema is live across 5 classes (article · lineage · eval · scout · deep_research) but only the article class is populated today — 313/313 chunks across 49 published articles; the other 4 ingest paths are wired but unpopulated.
Gold-set size: recall is measured over 44 qa-eval rows, not a large-N guarantee.
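
To make the gold-set caveat concrete, the sketch below puts a 95% Wilson score interval around each reported recall. It assumes every one of the 44 rows is scored as a binary hit or miss, in which case the reported values correspond to 18 and 32 hits (18/44 ≈ 0.4091, 32/44 ≈ 0.7273). That scoring rule is an assumption of this example, not something the profile states, and wilson_interval is an illustrative helper name.

```python
import math
from typing import Tuple


def wilson_interval(hits: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    n = 44  # gold-set size from the profile
    for name, score in (("chunk-recall@5", 0.4091), ("slug-recall@5", 0.7273)):
        hits = round(score * n)  # assumes binary per-question scoring
        lo, hi = wilson_interval(hits, n)
        print(f"{name}: {hits}/{n} hits, 95% interval [{lo:.3f}, {hi:.3f}]")
```

At this sample size the intervals span tens of percentage points, so a small change in either score between two index builds may fall within sampling noise.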

Companion field note

The harness profile pairs with the field note products/orionfold-cortex — read the article for the lane-bakeoff narrative, the unified-memory math, and the tool-call reliability gate that decided the recommended lane.
