Tag

#observability

Articles tagged "observability" — 4 entries.

Article №54 · observability · Foundation · 03 Jun 2026 · ~4 hours end-to-end: bring up the cockpit, drive a reindex + two RAG-evals through the control plane, score 44 questions, and ship the artifact

Second Brain

The Machine Manages Its Own Memory — and the Bug the Mocks Slept Through

Driving the Arena recall layer end-to-end on its own corpus: reindex → score → gate, dispatched through the control plane, recall@5 measured against 44 held-out questions. The first real drain caught a bug eight mock-injected unit tests had slept through — the case for operating the thing you built.

uses fieldkit.memory, fieldkit.arena, fieldkit.harness, fieldkit.eval
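For readers unfamiliar with the metric in the summary above, recall@5 can be sketched as follows. This is a minimal illustration using the standard definition, not the article's or fieldkit's actual implementation; the function names and the shape of the inputs (lists of document ids per question) are assumptions.

```python
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the relevant ids that appear among the top-k retrieved ids."""
    if not relevant:
        raise ValueError("relevant must contain at least one id")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs: Iterable[Tuple[Sequence[str], Set[str]]], k: int = 5) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one pair per question."""
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs]
    if not scores:
        raise ValueError("runs must contain at least one question")
    return sum(scores) / len(scores)
```

With 44 held-out questions, `runs` would hold 44 such pairs, and a gate could compare the mean against a chosen threshold.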

Article №26 · observability · NIM Llama 3.1 8B · 01 May 2026 · ~2 hours wall: analysis runs in seconds, the rest is reading + writing

Machine that Builds Machines

Was the Agent Researching, or Flailing? An Observability Pass on the Trajectory

A8 said the LoRA mode-collapsed because the trajectory was thin. This puts numbers on it: 6 of 13 knobs ever touched, 72% of proposals repeated a prior pair, and the proposer's k=5 history window is the structural cause.
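The three numbers in that summary (knobs touched, repeat rate, and the effect of a k=5 history window) can be computed from a proposal log along these lines. This is a hedged sketch, not the article's analysis code: it assumes a "pair" means a (knob, value) tuple, and every name here is hypothetical.

```python
from typing import Hashable, Iterable, Sequence, Tuple

# Assumption: each proposal is a (knob, value) pair.
Proposal = Tuple[str, Hashable]


def knob_coverage(proposals: Sequence[Proposal], all_knobs: Iterable[str]) -> Tuple[int, int]:
    """Return (knobs ever touched, total knobs available)."""
    knobs = set(all_knobs)
    touched = {knob for knob, _ in proposals}
    return len(touched & knobs), len(knobs)


def repeat_rate(proposals: Sequence[Proposal]) -> float:
    """Fraction of proposals that exactly repeat an earlier (knob, value) pair."""
    seen = set()
    repeats = 0
    for proposal in proposals:
        if proposal in seen:
            repeats += 1
        seen.add(proposal)
    return repeats / len(proposals) if proposals else 0.0


def repeats_outside_window(proposals: Sequence[Proposal], k: int = 5) -> int:
    """Count repeats whose earlier occurrences all fall outside the last k proposals.

    These are repeats that a proposer limited to a k-item history could not have seen.
    """
    count = 0
    for i, proposal in enumerate(proposals):
        window = proposals[max(0, i - k):i]
        if proposal in proposals[:i] and proposal not in window:
            count += 1
    return count
```

A high share of repeats landing outside the window would be consistent with the history window being the structural cause, since the proposer cannot avoid pairs it no longer sees.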

Article №15 · observability · NeMo Evaluator · 23 Apr 2026 · ~60 minutes end-to-end: 40 s to ingest the blog into pgvector, 2 min for retrieval, 4 min for generation across three 8B variants, 90 s for the LoRA variant, 9 min for grading

Second Brain

Ragas, Reranked — What 44 Held-Out Questions Say About the Second Brain Stack

A Ragas-style harness written in 200 lines of stdlib Python, run locally on the DGX Spark, against four variants of the Second Brain RAG chain. Naive RAG scores 3.30 / 5. Rerank RAG scores 4.27. LoRA+RAG is a surprise — it does not beat naive. Retrieval is where the points come from.

uses fieldkit.eval

Upcoming · observability · NVIDIA DCGM + Prometheus + Grafana · planned · ~3 hours, mostly dashboard tuning

Watching the GPU — DCGM, Prometheus, and a Local Grafana for the Spark

A planned setup of DCGM Exporter → Prometheus → Grafana entirely on the Spark itself. The goal is a single dashboard that tells the truth about GPU memory, SM occupancy, and per-container utilization for a rig that's running NIMs, pgvector, and an occasional training job at the same time.