fieldkit

Verified-on-Spark Python patterns, lifted from the AI Native Field Notes into one importable package. Every module is the tested distillate of the articles it appears under.

v0.2.0.post1 · Apache-2.0 · Python 3.11+

Install the package

Terminal

$ pip install fieldkit

KV-cache math · NIM client · Naive-RAG · Eval harness

The Problem

Ship AI features faster, cheaper, with less glue.

Every AI build pays the glue tax — days lost to retry logic, context-window math, pgvector schemas, and eval rubrics; token bills inflated by overflow 400s and missing backoff; brittle copy-paste lifted from a half-dozen articles per project. The patterns are right; assembling them by hand is slow and expensive.

Articles already distill into fieldkit: /field-notes/

Modules, one import each: fieldkit.{capabilities,nim,rag,eval,cli}

The 8192-token preflight catches NIM 400s before the network call.

Each pattern was first verified inside an article — KV-cache arithmetic, the OpenAI-compatible NIM client with its 8192-token preflight, the strict-context RAG pipeline from naive-rag-on-spark, the eval harness behind every evidence file. fieldkit is where those patterns live after they're tested.

Context overflow

NIM endpoints answer an over-long prompt with a bare 400; fieldkit catches the overflow on the client, before the request is sent

Retry gaps

Exponential backoff (0.5 s → 8 s) and cold-start polling are baked in, not bolted on

Schema drift

pgvector tables, indexes, and dimensions stay in sync — one ensure_schema() call

Eval blindness

Bench, Judge, refusal detection, and trajectory analysis ship as one harness
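The retry behaviour named above can be sketched in a few lines. This is a minimal illustration, not fieldkit's implementation: the source gives only the endpoints of the schedule (0.5 s and 8 s), so the doubling rule, the function names `backoff_delays` and `call_with_retries`, and the choice of retryable exceptions are all assumptions.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 0.5, cap: float = 8.0) -> Iterator[float]:
    """Yield delays that double from `base` up to `cap` (assumed schedule)."""
    delay = base
    while delay <= cap:
        yield delay
        delay *= 2


def call_with_retries(
    fn: Callable[[], T],
    *,
    retry_on: tuple = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, sleeping through the backoff schedule on retryable errors."""
    schedule = list(backoff_delays())
    # One attempt per delay, plus a final attempt that re-raises on failure.
    for delay in schedule + [None]:
        try:
            return fn()
        except retry_on:
            if delay is None:
                raise
            sleep(delay)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; with the defaults it makes at most six attempts.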

fieldkit is the tested distillate.

The Solution

`fieldkit` in five imports.

Each module is the public surface of a working article. Read the API reference, drop the import in, ship.

fieldkit.capabilities

Memory and feasibility math

Typed Python facade over the project's Spark capabilities map. Canonical KV-cache and weight arithmetic.

Read the API
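For readers who want to check the arithmetic by hand, here is a minimal sketch of the usual KV-cache and weight formulas. The `_sketch` functions and the dtype table are hypothetical stand-ins, not fieldkit's code, and reading `hidden` as KV heads × head dimension is an assumption. With the quickstart's arguments the result matches the figure quoted there.

```python
# Hypothetical stand-ins for illustration; not fieldkit's implementation.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def kv_cache_bytes_sketch(hidden: int, n_layers: int, ctx: int, batch: int, dtype: str) -> int:
    """Two tensors (K and V) per layer, each batch x ctx x hidden elements."""
    return 2 * n_layers * hidden * ctx * batch * BYTES_PER_ELEMENT[dtype]


def weight_bytes_sketch(n_params: int, dtype: str) -> int:
    """Weights take one element per parameter."""
    return n_params * BYTES_PER_ELEMENT[dtype]


kv = kv_cache_bytes_sketch(hidden=8 * 128, n_layers=80, ctx=16384, batch=32, dtype="fp16")
print(kv)  # 171798691840
```

The same formula at `ctx=4096`, `batch=32`, `dtype="fp8"` gives 21,474,836,480 bytes, which agrees with the 21.5 GB shown in the CLI feasibility output.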

fieldkit.nim

OpenAI-compatible inference client

OpenAI-compatible NIM client with retries, context-overflow preflight, and a chunker that respects the 8192-token ceiling.

Read the API
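A chunker that respects a token ceiling might look roughly like this. The sketch counts whitespace-separated words as a crude proxy for tokens, which is an assumption made to keep it dependency-free; a real tokenizer will count differently, and `chunk_words` is a hypothetical name rather than part of fieldkit.

```python
MAX_CONTEXT_TOKENS = 8192  # ceiling quoted in the text


def chunk_words(text: str, max_tokens: int = MAX_CONTEXT_TOKENS) -> list[str]:
    """Split text into pieces of at most `max_tokens` words (a token proxy)."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + max_tokens])
        for start in range(0, len(words), max_tokens)
    ]


print(chunk_words("one two three four five", max_tokens=2))
# ['one two', 'three four', 'five']
```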

fieldkit.rag (most-used)

Ingest → retrieve → rerank → fuse

Composable ingest → retrieve → rerank → fuse RAG pipeline backed by pgvector + a NIM embedder + the strict-context grounded prompt from `naive-rag-on-spark`.

Read the API
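The source does not say how the fuse step combines rankings. Reciprocal rank fusion (RRF) is one common choice, swapped in here purely to illustrate what fusing a retriever ranking with a reranker ranking can look like. The function name and the constant `k=60` are assumptions, not fieldkit's API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document ids; each hit adds 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties fall back to id order for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


retrieved = ["a", "b", "c"]   # hypothetical vector-search order
reranked = ["c", "a"]         # hypothetical reranker order
print(reciprocal_rank_fusion([retrieved, reranked]))  # ['a', 'c', 'b']
```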

fieldkit.eval

Bench, judge, refusal, trajectory

Bench, Judge, Trajectory, the project's refusal detector — plus the v0.2 verifier-loop additions (AssertionGrader, PassAtK, AgentRun, MatchedBaseComparison) for agent + RL benchmarks.

Read the API
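As background for PassAtK: pass@k is usually computed with the unbiased estimator 1 - C(n - c, k) / C(n, k), where n samples were drawn and c of them passed. The sketch below implements that standard formula. Whether fieldkit's PassAtK uses exactly this form is an assumption, and `pass_at_k` is a hypothetical name.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples (out of n, c correct) passes."""
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so every k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


print(round(pass_at_k(n=10, c=3, k=1), 3))  # 0.3
```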

fieldkit.training

Fine-tuning primitives for any RL or SFT loop on the Spark — a CPU-resident LoRA reference snapshot that sidesteps peft 0.19's offloader bug, and a pre/post weight-delta tracker for sanity-checking that gradients actually moved.

Read the API
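The pre/post weight-delta check can be illustrated without any training framework. This sketch stores parameters as plain lists of floats and reports the L2 norm of each parameter's change; the function names are hypothetical, and a real tracker would operate on tensors instead.

```python
from math import sqrt

Params = dict[str, list[float]]


def snapshot(params: Params) -> Params:
    """Copy the values so later in-place updates do not alter the snapshot."""
    return {name: list(values) for name, values in params.items()}


def weight_deltas(before: Params, after: Params) -> dict[str, float]:
    """L2 norm of (after - before) for every parameter in `before`."""
    return {
        name: sqrt(sum((a - b) ** 2 for a, b in zip(after[name], before[name])))
        for name in before
    }


def unmoved(deltas: dict[str, float], tol: float = 0.0) -> list[str]:
    """Names of parameters whose change is within `tol`."""
    return sorted(name for name, delta in deltas.items() if delta <= tol)


params = {"lora_A": [0.0, 0.0], "lora_B": [1.0, 1.0]}
before = snapshot(params)
params["lora_A"] = [3.0, 4.0]  # pretend an optimizer step touched only lora_A
print(weight_deltas(before, params))  # {'lora_A': 5.0, 'lora_B': 0.0}
```

A non-empty `unmoved(...)` list after a training step is the signal that gradients did not reach those parameters.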

fieldkit.cli

Smoke checks without writing Python

A thin Typer wrapper over the modules. Quick checks and smoke benchmarks without writing Python.

Read the API

verified-on-Spark · tested distillate · Apache-2.0 · Python 3.11+ · pgvector + NIM

quickstart.py

from fieldkit.capabilities import kv_cache_bytes, weight_bytes
from fieldkit.nim import NIMClient
from fieldkit.rag import Document, Pipeline
from fieldkit.eval import Bench, Judge, is_refusal

# 70B Llama 3.1 KV cache at 32-user × 16K ctx, FP16:
kv_cache_bytes(hidden=8 * 128, n_layers=80, ctx=16384, batch=32, dtype="fp16")
# → 171_798_691_840  (≈ 171.8 GB)

# Naive RAG end-to-end:
with NIMClient(base_url="http://localhost:8000/v1",
               model="meta/llama-3.1-8b-instruct") as gen, \
     Pipeline(embed_url="http://localhost:8001/v1",
              pgvector_dsn="postgresql://spark:spark@localhost:5432/vectors",
              generator=gen) as pipe:
    pipe.ensure_schema()
    pipe.ingest([Document(id=1, text="...", label="spark")])
    print(pipe.ask("How much memory does the Spark have?")["answer"])

Quickstart

Four imports. One pipeline.

These four imports replace ~250 lines of glue from across the field notes: embed setup, retry policy, preflight checks, schema bootstrap, and strict-context prompting. Drop them into a fresh Python file and you have a working RAG pipeline.

Retries baked in

NIMClient handles cold-starts, exponential backoff (0.5 s → 8 s), and connect timeouts so your pipeline doesn't fail under co-resident memory pressure.

Preflight context check

8192-token preflight runs before every request; context overflow surfaces as a Python exception, not a NIM 400.

Schema you can trust

Pipeline.ensure_schema() creates pgvector tables, indexes, and the right embedding dimension. Run it once and forget it.

Strict-context prompting

The RAG prompt is verbatim from naive-rag-on-spark. Refusals are detected; trajectories are inspectable.
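To make the preflight idea concrete, here is a minimal sketch of a client-side check that raises before any request is sent. The exception name, the word-count token proxy, and the decision to budget `max_new_tokens` against the same 8192 limit are all assumptions for illustration, not fieldkit's actual behaviour.

```python
CONTEXT_LIMIT = 8192  # ceiling quoted in the text


class ContextOverflowError(ValueError):
    """Raised locally instead of letting the server answer with a 400."""


def approx_tokens(text: str) -> int:
    """Crude proxy: whitespace-separated words. Real tokenizers differ."""
    return len(text.split())


def preflight(prompt: str, max_new_tokens: int, limit: int = CONTEXT_LIMIT) -> int:
    """Return the estimated token budget, or raise if it exceeds `limit`."""
    needed = approx_tokens(prompt) + max_new_tokens
    if needed > limit:
        raise ContextOverflowError(
            f"request needs ~{needed} tokens, limit is {limit}"
        )
    return needed


print(preflight("How much memory does the Spark have?", max_new_tokens=256))  # 263
```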

Verified-in

Every module ships from a working article.

These are the field notes that exercise one or more fieldkit modules end-to-end on the Spark. Each article runs the math and ships the evidence; the tested abstraction lives on as an importable class.

Browse field notes +10 more

Without Python

Quick wins from the shell.

Sanity-check inference math, smoke-test a pipeline, or run a benchmark without writing a line of Python. fieldkit is on $PATH after install.

Terminal zsh

$ fieldkit version
0.2.0.post1

$ fieldkit envelope "70B params fp8"
~70 GB weights; leaves ~50 GB for KV + activations + system; tight but possible

$ fieldkit feasibility llama-3.1-70b --ctx 4096 --batch 32 --dtype fp8
weights (fp8):       70.0 GB
KV cache (fp8):      21.5 GB  (ctx=4096, batch=32)
weights + KV:        91.5 GB

$ fieldkit bench rag --table fieldkit_cli_bench_rag --out /tmp/bench.json

Full CLI reference

fieldkit · v0.2.0.post1

Build with verified patterns.

Read the API reference, browse the field notes, or grab the source. Apache-2.0; Python 3.11+.

GitHub · API Reference · Field Notes
