finance-chat-notebooks
Build the finance-chat quant — and call the model — on a Spark or a free cloud GPU
What this notebook does
The artifact → card → article loop sells the outcome but offers no runnable on-ramp: a researcher who wants to reproduce the five-variant quant, or a developer who wants to call the model, has to reconstruct the journey from prose. These two notebooks close that gap. The builder notebook walks the feasibility → quantize → measure → publish journey as typed fieldkit API calls; the user notebook calls finance-chat on real financial-Q&A tasks. Both are one-click via Open in Colab / Open in Kaggle and run offline on a DGX Spark — no financials leave the box.
Use cases
- Builder: reproduce the release — feasibility envelope, quantize sweep, the Spark-tested quad + variants table, publish — as fieldkit calls
- User: open-book Q&A over a filing, finance-concept explanation, and FP&A variance commentary
- User: ground answers in 10-K text with fieldkit.rag and gate numeric answers with fieldkit.eval.numeric_match
- Both: run offline on a DGX Spark or on a free Colab / Kaggle GPU (dual-path, runtime-detected)
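To make the numeric-gating idea in the user use case concrete, here is a minimal sketch of what a numeric match check might look like. It is not the implementation of fieldkit.eval.numeric_match, whose signature is not shown on this page; the helper names `extract_numbers` and `numeric_gate` and the 1% tolerance are assumptions made for illustration.

```python
import math
import re

# Hypothetical helpers for illustration only; not fieldkit's API.
# Matches integers and decimals, allowing thousands separators (e.g. 1,234.5).
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_numbers(text: str) -> list[float]:
    """Pull every numeric literal out of free text, dropping thousands separators."""
    return [float(m.replace(",", "")) for m in _NUMBER.findall(text)]


def numeric_gate(answer: str, expected: float, rel_tol: float = 0.01) -> bool:
    """Return True if any number in the answer is within rel_tol of the expected value."""
    return any(
        math.isclose(value, expected, rel_tol=rel_tol)
        for value in extract_numbers(answer)
    )


if __name__ == "__main__":
    print(numeric_gate("Revenue was $1,234.5 million in FY2023.", 1234.5))
    print(numeric_gate("Revenue was roughly 1,100 million.", 1234.5))
```

A gate like this only checks that the right figure appears somewhere in the answer; it does not check units or which line item the figure refers to, so it is best treated as a coarse filter.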
Audience: AI researchers and engineers who want to reproduce the quant, plus FP&A teams, finance-app developers, and analysts who want a private offline finance assistant, on Spark-class hardware (GB10, 128 GB unified memory) or a free cloud GPU.
Choosing the variant
Two variants of the same notebook; pick the one that matches your goal.
- builder: Walks the build journey on Spark, with fieldkit API calls replacing ad-hoc scripts; surfaces speed, feasibility, and viability.
- user: Demonstrates the published model on realistic domain tasks; runtime-detected, runs on Spark or on a free Colab/Kaggle GPU.
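As a rough illustration of what "runtime-detected" could mean in practice, the sketch below guesses the runtime from signals commonly present on each platform. The notebooks' actual detection logic is not shown on this page; the function name `detect_runtime`, the `"local"` fallback standing in for a Spark, and the lookup table are assumptions. The variant names come from the Known drift section (cloud serves Q4_K_M, Spark serves Q5_K_M).

```python
import os
import sys
from typing import Mapping, Optional


def detect_runtime(
    env: Optional[Mapping[str, str]] = None,
    modules: Optional[Mapping[str, object]] = None,
) -> str:
    """Best-effort guess at where the notebook is running (hypothetical helper)."""
    env = os.environ if env is None else env
    modules = sys.modules if modules is None else modules
    # Colab preloads the google.colab package and sets COLAB_RELEASE_TAG.
    if "google.colab" in modules or env.get("COLAB_RELEASE_TAG"):
        return "colab"
    # Kaggle kernels set KAGGLE_KERNEL_RUN_TYPE.
    if env.get("KAGGLE_KERNEL_RUN_TYPE"):
        return "kaggle"
    # Assumption: anything else is treated as the local (Spark) path.
    return "local"


# Variant served per path, as listed under Known drift.
VARIANT_BY_RUNTIME = {"colab": "Q4_K_M", "kaggle": "Q4_K_M", "local": "Q5_K_M"}

if __name__ == "__main__":
    runtime = detect_runtime()
    print(runtime, VARIANT_BY_RUNTIME[runtime])
```

Passing `env` and `modules` explicitly keeps the check easy to exercise without actually being on Colab or Kaggle.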
Methods
Read the field note
Orionfold/finance-chat-GGUF on Spark: five variants, FinanceBench mini-eval, four-axis measurement card
Five GGUF variants of AdaptLLM/finance-chat measured on a DGX Spark: Q8_0 perplexity-matches F16 losslessly, Q4_K_M ships at 31 tok/s. Each card carries perplexity, sustained tok/s, thermal envelope, and FinanceBench accuracy.
Known drift
Bounded limitations — Colab/Kaggle runs use the published quant; reasoning quality may differ from the BF16 weights on Spark. Each entry carries an explicit bound.
- Cloud (Colab / Kaggle) path serves the Q4_K_M quant; the Spark path serves Q5_K_M
  - One quant level apart: Q4_K_M scores 14% on the FinanceBench n=50 mini-eval vs Q5_K_M's 16% (2 points, inside the n=50 noise floor); both run the identical code path. See the sibling GGUF card.
- The builder notebook's quantize + publish steps render the recorded Spark run, not a live re-execution
  - 2 recorded Spark-only cells (the quantize sweep and the publish dry-run); the remaining cells (feasibility envelope, the spark_quad panel, and the variants table) run live on any runtime from the manifest.
- The user notebook's live model-chat cells are not captured in the published marketing snapshot
  - 4 use-case cells call the model live on any runtime; the snapshot captures the deterministic charts + banners and describes the chat output rather than screenshotting it.
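The "inside the n=50 noise floor" bound on the Q4_K_M vs Q5_K_M gap can be sanity-checked with a back-of-envelope calculation. The sketch below treats each score as an independent binomial sample, which is a simplifying assumption (both variants answer the same 50 questions, so a paired test would be more precise); only the 14%, 16%, and n=50 figures come from this page.

```python
import math

n = 50                       # FinanceBench mini-eval size
acc_q4, acc_q5 = 0.14, 0.16  # reported accuracies for Q4_K_M and Q5_K_M

# Convert accuracies back to question counts: 7 and 8 correct out of 50.
correct_q4 = round(acc_q4 * n)
correct_q5 = round(acc_q5 * n)

# Pooled accuracy across both runs, used for the standard errors.
pooled = (correct_q4 + correct_q5) / (2 * n)

# Standard error of one accuracy estimate, and of the difference between
# two estimates (assuming independence, which is a simplification here).
se_single = math.sqrt(pooled * (1 - pooled) / n)
se_diff = math.sqrt(2 * pooled * (1 - pooled) / n)

gap = (correct_q5 - correct_q4) / n
z = gap / se_diff

print(f"gap = {gap:.2f} ({correct_q5 - correct_q4} question out of {n})")
print(f"standard error of the gap ~ {se_diff:.3f}, z ~ {z:.2f}")
```

Under these assumptions the 2-point gap is a single question and sits at well under one standard error, which is consistent with the entry's claim that the difference cannot be separated from noise at n=50.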
Sibling artifacts
The model this notebook targets, plus other variants in the same family.