Series

Second Brain

Query-time RAG over your own corpus. Cheap to ingest, paid per query — the standard answer to "what do I do with my files and a 128 GB GPU?"

Article №13 deployment TensorRT-LLM + Triton Inference Server ~4 hours including two container pulls and three engine builds
Second Brain

TensorRT-LLM on the Spark — FP8 Isn't the Reason to Drop NIM. NVFP4 Is.

Dropping below NIM to raw TensorRT-LLM on a GB10 Spark. FP8 beats NIM's vLLM by 10-15% — barely worth the rebuild. NVFP4 beats it by 76% on decode, 43% on TTFT, and ships a 34%-smaller engine. The reason to drop NIM is the Blackwell-native 4-bit kernel, not FP8.

Article №14 fine-tuning Hugging Face PEFT + Qwen2.5-3B-Instruct ~45 minutes end-to-end — 5 min corpus via NIM 8B, 69 s training, 3 min benchmark, plus a 6 GB base-model download
Second Brain

LoRA on Your Own Q&A — What 231 Pairs Actually Teach a 3B Model

231 own-voice Q&A pairs, a rank-16 LoRA, 69 s of training on a GB10 Spark. The adapter won't memorize your exact numbers, but it will take a model that refuses 61% of questions about your work and turn it into one that answers all of them in your voice. For facts you still need RAG.

uses fieldkit.eval

Article №15 observability NeMo Evaluator ~60 minutes end-to-end — 40 s to ingest the blog into pgvector, 2 min for retrieval, 4 min for generation across three 8B variants, 90 s for the LoRA variant, 9 min for grading
Second Brain

Ragas, Reranked — What 44 Held-Out Questions Say About the Second Brain Stack

A Ragas-style harness written in 200 lines of stdlib Python, run locally on the DGX Spark, against four variants of the Second Brain RAG chain. Naive RAG scores 3.30 / 5. Rerank RAG scores 4.27. LoRA+RAG is a surprise — it does not beat naive. Retrieval is where the points come from.

uses fieldkit.eval

Article №17 agentic NIM ~90 minutes — 30 min to design the tool surface, 30 min to wire FastMCP + pgvector, 15 min to register with Claude Code, 15 min for the demo and trace
Second Brain

Second Brain as a Tool — Wrapping the RAG Stack in MCP for Claude Code

Closing the Second Brain arc. Four MCP tools wrap the RAG chain — embed, retrieve, optionally rerank, generate — and any Claude Code session anywhere on the box becomes a grounded research client. 200 lines of Python, one launcher, one .mcp.json entry.

Upcoming agentic NIM ~30 min read
Second Brain

Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction — Spark reproduction notes

Reproducing DCI-Agent-Lite on a DGX Spark — NIM-served 8B agent + ripgrep + filesystem corpus, no embedder or vector DB; extracts the operator vocabulary as `fieldkit.rag.operators` and quantifies how much of the existing pgvector + reranker stack DCI lets you delete.