fieldkit

Verified-on-Spark Python patterns, lifted from the AI Native Field Notes into one importable package. Every module is the tested distillate of the articles it appears under.

v0.4.2 · Apache-2.0 · Python 3.11+

Install the package

$ pip install fieldkit
KV-cache math · NIM client · Naive-RAG · Eval harness
The Problem

Ship AI features faster, cheaper, with less glue.

Every AI build pays the glue tax: days lost to retry logic, context-window math, pgvector schemas, and eval rubrics; token bills inflated by context-overflow 400s and missing backoff; brittle copy-paste lifted from half a dozen articles per project. The patterns are right, but assembling them by hand is slow and expensive.

23 articles already distill into fieldkit (/field-notes/)

9 modules, one import each: fieldkit.{capabilities, nim, rag, eval, training, lineage, quant, publish, cli}

8192-token preflight catches NIM 400s before the network call

Each pattern was first verified inside an article — KV-cache arithmetic, the OpenAI-compatible NIM client with its 8192-token preflight, the strict-context RAG pipeline from naive-rag-on-spark, the eval harness behind every evidence file. fieldkit is where those patterns live after they're tested.

Context overflow

NIM endpoints return HTTP 400 when a request quietly overflows the context window; fieldkit catches the overflow on the client, before the network call

Retry gaps

Exponential backoff (0.5 s → 8 s) and cold-start polling are baked in, not bolted on

Schema drift

pgvector tables, indexes, and embedding dimensions stay in sync with one ensure_schema() call

Eval blindness

Bench, Judge, refusal detection, and trajectory analysis ship as one harness

fieldkit is the tested distillate.
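The retry policy from the "Retry gaps" card (exponential backoff from 0.5 s up to an 8 s cap) can be sketched as a small wrapper. This is a minimal illustration, not fieldkit's implementation: the names `backoff_delays` and `call_with_backoff`, the default attempt count, and the choice to treat only `ConnectionError`/`TimeoutError` as transient are assumptions.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 0.5, cap: float = 8.0, attempts: int = 5) -> Iterator[float]:
    """Yield `attempts` delays that double from `base` and saturate at `cap`."""
    for i in range(attempts):
        yield min(base * 2 ** i, cap)


def call_with_backoff(
    fn: Callable[[], T],
    *,
    base: float = 0.5,
    cap: float = 8.0,
    max_attempts: int = 6,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying transient failures with capped exponential backoff.

    Hypothetical helper: only ConnectionError and TimeoutError count as
    transient here; any other exception propagates immediately.
    """
    for delay in backoff_delays(base, cap, max_attempts - 1):
        try:
            return fn()
        except (ConnectionError, TimeoutError):
            sleep(delay)  # wait before the next attempt
    return fn()  # final attempt: its error, if any, reaches the caller
```

With the defaults, a request that keeps failing waits 0.5, 1, 2, 4, and 8 seconds between six attempts. Injecting `sleep` keeps the policy testable without real waiting; adapt the exception tuple to whatever error types your HTTP client raises.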

The Solution

fieldkit in nine imports.

Each module is the public surface of a working article. Read the API reference, drop the import in, ship.

fieldkit.capabilities

Memory and feasibility math

Typed Python facade over the project's Spark capabilities map. Canonical KV-cache and weight arithmetic.

Read the API
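The KV-cache figure in the quickstart follows from the usual accounting: two tensors (K and V) per layer, each holding `ctx × batch × kv_hidden` elements, where `kv_hidden` is KV heads × head dim (8 × 128 for Llama 3.1 70B under grouped-query attention). A minimal sketch, assuming those are the terms fieldkit's `kv_cache_bytes` multiplies; the `_sketch` function names and the dtype byte table are illustrative, not fieldkit's API.

```python
# Bytes per element for the dtypes used in this document (assumed table).
DTYPE_BYTES = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def kv_cache_bytes_sketch(*, hidden: int, n_layers: int, ctx: int, batch: int, dtype: str) -> int:
    """2 (K and V) x layers x tokens per sequence x sequences x KV width x bytes/elem.

    `hidden` is the KV width (n_kv_heads * head_dim), not the model's full hidden size.
    """
    return 2 * n_layers * ctx * batch * hidden * DTYPE_BYTES[dtype]


def weight_bytes_sketch(*, n_params: float, dtype: str) -> float:
    """Dense weight footprint: parameter count x bytes per element."""
    return n_params * DTYPE_BYTES[dtype]


kv = kv_cache_bytes_sketch(hidden=8 * 128, n_layers=80, ctx=16384, batch=32, dtype="fp16")
print(kv, f"{kv / 1e9:.1f} GB", f"{kv / 2**30:.0f} GiB")
```

The quickstart's 171_798_691_840 bytes is 171.8 decimal GB and exactly 160 GiB. The same formula at ctx=4096, batch=32, fp8 gives 21_474_836_480 bytes (≈ 21.5 GB), the KV line in the `fieldkit feasibility` output further down.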
fieldkit.nim

OpenAI-compatible inference client

OpenAI-compatible NIM client with retries, context-overflow preflight, and a chunker that respects the 8192-token ceiling.

Read the API
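The 8192-token preflight and the ceiling-respecting chunker reduce to two checks that run before any HTTP request. A minimal sketch, assuming a pluggable token counter: the whitespace tokenizer is only a stand-in (real counts come from the model's tokenizer), and `ContextOverflowError`, `preflight`, `chunk_tokens`, and the `max_new_tokens` reserve are hypothetical names, not fieldkit's API.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
MAX_CONTEXT = 8192  # NIM context ceiling cited in this document


class ContextOverflowError(ValueError):
    """Raised client-side instead of letting the server answer with HTTP 400."""


def whitespace_tokens(text: str) -> list[str]:
    # Stand-in tokenizer; a real BPE tokenizer usually yields more tokens.
    return text.split()


def preflight(prompt: str, *, max_new_tokens: int = 512, limit: int = MAX_CONTEXT,
              tokenize: Callable[[str], Sequence[object]] = whitespace_tokens) -> int:
    """Return the prompt's token count, or raise if prompt + completion won't fit."""
    n = len(tokenize(prompt))
    if n + max_new_tokens > limit:
        raise ContextOverflowError(
            f"{n} prompt tokens + {max_new_tokens} new tokens exceed {limit}"
        )
    return n


def chunk_tokens(tokens: list[T], *, size: int, overlap: int = 0) -> list[list[T]]:
    """Split tokens into windows of at most `size`, consecutive windows sharing `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not tokens:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]
```

Raising a typed exception before the request means an oversized prompt costs no network round trip; the caller can then chunk the input and retry instead of parsing a 400 body.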
fieldkit.rag most-used

Ingest → retrieve → rerank → fuse

Composable ingest → retrieve → rerank → fuse RAG pipeline backed by pgvector + a NIM embedder + the strict-context grounded prompt from `naive-rag-on-spark`.

Read the API
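The final "fuse" stage merges several ranked result lists into one. The source does not say which fusion rule fieldkit uses; reciprocal rank fusion (RRF) with the commonly used constant k = 60 is one standard choice and is swapped in here purely for illustration. The function name and the document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: each doc scores sum(1 / (k + rank)) over the lists it is in.

    Ranks are 1-based. Ties are broken by doc ID so the output is deterministic.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["d3", "d1", "d7"]     # e.g. pgvector cosine-similarity order (hypothetical)
reranked = ["d1", "d9", "d3"]  # e.g. a reranker's order (hypothetical)
fused = reciprocal_rank_fusion([dense, reranked])
print([doc_id for doc_id, _ in fused])
```

RRF only needs ranks, not comparable scores, which is why it is a common way to combine a vector search with a reranker whose scores live on a different scale.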
fieldkit.eval

Bench, judge, assertion, pass@k

Bench, Judge, Trajectory, the project's refusal detector — plus the v0.2 verifier-loop additions (AssertionGrader, PassAtK, AgentRun, MatchedBaseComparison) for agent + RL benchmarks.

Read the API
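pass@k is usually computed with the unbiased estimator introduced alongside the HumanEval benchmark: for n samples of which c pass, it is the probability that at least one of k samples drawn without replacement passes, 1 − C(n−c, k) / C(n, k). Whether fieldkit's `PassAtK` uses exactly this estimator is an assumption; the functions below are a standalone sketch.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task with n attempts, c of which passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k draw must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, each given as (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

Sampling n > k attempts and applying the estimator gives a lower-variance number than literally drawing k samples per task.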
fieldkit.training

LoRA reference + weight-delta tracker

Fine-tuning primitives for any RL or SFT loop on the Spark: a CPU-resident LoRA reference snapshot that sidesteps peft 0.19's offloader bug, and a pre/post weight-delta tracker for sanity-checking that the optimizer actually moved the weights.

Read the API
fieldkit.lineage

Append-only trial log + prompt rendering

Append-only trial log + deterministic prompt rendering — the portable part of cxcscmu's Auto-Research-Recipes harness. A 17-column TSV per trial, a 10-class status enum, and the Markdown lineage block the next specialist reads at session entry.

Read the API
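An append-only trial log with deterministic Markdown rendering can be sketched with the standard library alone. This is illustrative only: the real format has 17 columns and a 10-class status enum, whereas the columns (`trial_id`, `status`, `metric`, `note`) and the three statuses below are hypothetical placeholders.

```python
import csv
import enum
from pathlib import Path

COLUMNS = ["trial_id", "status", "metric", "note"]  # hypothetical subset of the 17 columns


class Status(enum.Enum):
    """Hypothetical subset of trial statuses."""
    KEEP = "keep"
    DISCARD = "discard"
    CRASH = "crash"


def append_trial(path: Path, row: dict[str, str]) -> None:
    """Append one row; write the header only for a new file. Never rewrites history."""
    Status(row["status"])  # raises ValueError for an unknown status
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS, delimiter="\t")
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def render_lineage(path: Path) -> str:
    """Render the log as a Markdown table sorted by trial_id, so output is stable."""
    with path.open(newline="", encoding="utf-8") as f:
        rows = sorted(csv.DictReader(f, delimiter="\t"), key=lambda r: r["trial_id"])
    lines = ["| " + " | ".join(COLUMNS) + " |", "|" + "---|" * len(COLUMNS)]
    lines += ["| " + " | ".join(r[c] for c in COLUMNS) + " |" for r in rows]
    return "\n".join(lines)
```

Opening in append mode and sorting at render time keeps writes cheap and the rendered block byte-identical regardless of the order in which trials finished.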
fieldkit.quant

GGUF quantize + four-axis measure

GGUF quantize + measure pipeline — wraps llama.cpp's `convert_hf_to_gguf.py` + `llama-quantize` + `llama-perplexity` + `llama-bench`, plus a pure-stdlib `nvidia-smi` thermal probe. Emits the `QuantReport` shape `fieldkit.publish.publish_quant` consumes. Non-GGUF formats (AWQ / GPTQ / EXL3 / MLX / NVFP4) are named stubs reserving the v0.5 API surface.

Read the API
fieldkit.publish

HuggingFace card + manifest + push

HuggingFace push surface — `ModelCard` (frontmatter + body renderer), `ArtifactManifest` (Phase-2 sync record), `HFHubAdapter` (lazy huggingface_hub wrapper, dry-run by default), `publish_quant` orchestrator. Every Orionfold artifact card carries the same Spark-tested measurement quad (perplexity, tok/s, thermal envelope, optional vertical-eval) — this module is what makes that shape deterministic.

Read the API
fieldkit.cli

Smoke checks without writing Python

A thin Typer wrapper over the modules. Quick checks and smoke benchmarks without writing Python.

Read the API
verified-on-Spark · tested distillate · Apache-2.0 · Python 3.11+ · pgvector + NIM
quickstart.py
from fieldkit.capabilities import kv_cache_bytes, weight_bytes
from fieldkit.nim import NIMClient
from fieldkit.rag import Document, Pipeline
from fieldkit.eval import Bench, Judge, is_refusal

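# Llama 3.1 70B KV cache at 32 users × 16K context, FP16.
# hidden is the KV width: 8 KV heads × 128 head dim (grouped-query attention).
kv_cache_bytes(hidden=8 * 128, n_layers=80, ctx=16384, batch=32, dtype="fp16")
# → 171_798_691_840  (≈ 171.8 GB, exactly 160 GiB)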

# Naive RAG end-to-end:
with (
    NIMClient(base_url="http://localhost:8000/v1",
              model="meta/llama-3.1-8b-instruct") as gen,
    Pipeline(embed_url="http://localhost:8001/v1",
             pgvector_dsn="postgresql://spark:spark@localhost:5432/vectors",
             generator=gen) as pipe,
):
    pipe.ensure_schema()
    pipe.ingest([Document(id=1, text="...", label="spark")])
    print(pipe.ask("How much memory does the Spark have?")["answer"])
Quickstart

Four imports. One pipeline.

These four imports replace ~250 lines of glue from across the field notes: embedder setup, retry policy, preflight checks, schema bootstrap, and strict-context prompting. Drop them into a fresh Python file and you have a working RAG pipeline.

  • Retries baked in

    NIMClient handles cold starts, exponential backoff (0.5 s → 8 s), and connect timeouts, so your pipeline doesn't fail under co-resident memory pressure.

  • Preflight context check

    An 8192-token preflight runs before every request, so context overflow surfaces as a Python exception, not a NIM 400.

  • Schema you can trust

    Pipeline.ensure_schema() creates pgvector tables, indexes, and the right embedding dimension. Run it once and forget it.

  • Strict-context prompting

    The RAG prompt is verbatim from naive-rag-on-spark. Refusals are detected; trajectories are inspectable.
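A schema bootstrap you can run on every start needs idempotent DDL: each statement uses `IF NOT EXISTS`, and the embedding column's dimension comes from the embedder. The sketch below only builds the SQL; the table name, the HNSW index with `vector_cosine_ops`, and the 1024-dimension example are assumptions the source does not specify. The statements themselves (`CREATE EXTENSION vector`, the `vector(n)` column type, `USING hnsw`) are standard pgvector, with HNSW requiring pgvector 0.5.0 or later.

```python
import re

_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def schema_sql(table: str, dim: int) -> list[str]:
    """Idempotent pgvector DDL for a documents table; safe to execute on every startup.

    Hypothetical helper: columns mirror Document(id, text, label) from the quickstart.
    """
    if not _IDENT.fullmatch(table):
        # Identifiers cannot be bound as query parameters, so validate them instead.
        raise ValueError(f"unsafe table name: {table!r}")
    if dim <= 0:
        raise ValueError("embedding dimension must be positive")
    return [
        "CREATE EXTENSION IF NOT EXISTS vector",
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id BIGINT PRIMARY KEY, text TEXT NOT NULL, label TEXT, "
        f"embedding vector({dim}) NOT NULL)",
        f"CREATE INDEX IF NOT EXISTS {table}_embedding_hnsw "
        f"ON {table} USING hnsw (embedding vector_cosine_ops)",
    ]
```

Each statement would be executed in order with your Postgres driver. Note that `IF NOT EXISTS` alone will not notice an existing table created with a different dimension; catching that drift needs an extra check against the live column type.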

Without Python

Quick wins from the shell.

Sanity-check inference math, smoke-test a pipeline, or run a benchmark without writing a line of Python. fieldkit is on $PATH after install.

$ fieldkit version
0.4.2

$ fieldkit envelope "70B params fp8"
~70 GB weights; leaves ~50 GB for KV + activations + system; tight but possible

$ fieldkit feasibility llama-3.1-70b --ctx 4096 --batch 32 --dtype fp8
weights (fp8):       70.0 GB
KV cache (fp8):      21.5 GB  (ctx=4096, batch=32)
weights + KV:        91.5 GB

$ fieldkit bench rag --table fieldkit_cli_bench_rag --out /tmp/bench.json
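The `feasibility` output above is plain arithmetic: weights are parameters × bytes per element, and the KV cache is 2 × layers × ctx × batch × (KV heads × head dim) × bytes per element. A minimal thin-wrapper sketch that reproduces those three lines; it uses `argparse` in place of Typer to stay stdlib-only, and the one-entry model table (nominal 70e9 parameters, 80 layers, 8 KV heads × 128 head dim) and the CLI defaults are assumptions, not fieldkit's data.

```python
import argparse

DTYPE_BYTES = {"fp16": 2, "bf16": 2, "fp8": 1}
# Hypothetical model table: (nominal params, layers, KV heads, head dim).
MODELS = {"llama-3.1-70b": (70e9, 80, 8, 128)}


def feasibility(model: str, ctx: int, batch: int, dtype: str) -> str:
    """Format weight, KV-cache, and combined footprints in decimal GB."""
    params, layers, kv_heads, head_dim = MODELS[model]
    nbytes = DTYPE_BYTES[dtype]
    weights_gb = params * nbytes / 1e9
    kv_gb = 2 * layers * ctx * batch * kv_heads * head_dim * nbytes / 1e9
    rows = [
        (f"weights ({dtype}):", f"{weights_gb:.1f} GB"),
        (f"KV cache ({dtype}):", f"{kv_gb:.1f} GB  (ctx={ctx}, batch={batch})"),
        ("weights + KV:", f"{weights_gb + kv_gb:.1f} GB"),
    ]
    return "\n".join(f"{label:<21}{value}" for label, value in rows)


def main(argv: list[str]) -> None:
    """Thin CLI layer: parse flags, call the pure function, print the result."""
    parser = argparse.ArgumentParser(prog="feasibility-sketch")
    parser.add_argument("model", choices=sorted(MODELS))
    parser.add_argument("--ctx", type=int, default=4096)   # hypothetical default
    parser.add_argument("--batch", type=int, default=1)    # hypothetical default
    parser.add_argument("--dtype", choices=sorted(DTYPE_BYTES), default="fp16")
    args = parser.parse_args(argv)
    print(feasibility(args.model, args.ctx, args.batch, args.dtype))


main(["llama-3.1-70b", "--ctx", "4096", "--batch", "32", "--dtype", "fp8"])
```

Keeping the arithmetic in a pure function and the flag parsing in a separate layer is what makes a CLI "thin": the same function can back a shell command, a notebook cell, or a test.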
Full CLI reference

fieldkit · v0.4.2

Build with verified patterns.

Read the API reference, browse the field notes, or grab the source. Apache-2.0; Python 3.11+.
