Name: saul-7b-instruct-v1-gguf
Published: 2026-05-14T15:49:07Z
License: open

What this model does

Equall's Saul-7B-Instruct-v1 is a Mistral-based legal chat model — strong on LegalBench-style classification — but its 13.5 GB checkpoint wants a workstation card. This release ships five GGUF variants (Q4_K_M at 4.1 GB and 29.4 tok/s up to F16) so it runs offline on consumer hardware, each carrying a four-axis Spark-measured card: wikitext-2 perplexity, sustained tok/s, thermal-envelope minutes, and a LegalBench score. Orionfold's contribution is the distribution + measurement layer; Equall did the legal fine-tune.

Use cases

Offline legal-domain chat and clause/issue classification on consumer hardware
Drafting and triage behind your own document-retrieval layer
Picking a quant variant by workload shape, not just RAM budget

Audience — Local-LLM power users and legal-tech builders who want an offline legal chat model on a consumer GPU — for drafting and triage support, not legal advice.

Spec matrix

Ranks within each column drive the heatmap. Lower perplexity, higher throughput, higher vertical eval — the sweet-spot row balances all three.

Vertical bench: LegalBench (n=50, contains)
Variant	Perplexity ↓	Spark tok/s ↑	Vertical eval ↑
Q4_K_M	5.9864	29.43	0.62
Q5_K_M Sweet spot	5.9380	20.19	0.72
Q6_K	5.9250	22.39	0.68
Q8_0	5.9138	7.30	0.66
F16	5.9165	10.88	0.68

Methods

Read the field note Orionfold/Saul-7B-Instruct-v1-GGUF on Spark — five legal variants, LegalBench mini-eval, four-axis measurement card Five GGUF variants of Equall/Saul-7B-Instruct-v1 measured on a DGX Spark — Q5_K_M scores 72% on LegalBench (n=50, contains) at 20 tok/s and 4.8 GB. Each card carries perplexity, sustained tok/s, thermal envelope, and a 5-task LegalBench subset score. Open article

Known drift

Disclosed limitations with explicit bounds — the scope is named, not implied.

LegalBench scored with a lenient "contains" matcher: The LegalBench mini-eval (n=50) scores by substring "contains" match, more forgiving than strict exact-match — read the 62–72% range as an upper bound on that rubric, not a strict-accuracy figure. Q5_K_M tops at 36/50.
Q8_0 sustained-throughput anomaly: Q8_0 generates at 7.3 tok/s — ~33% below F16's 10.9 and slower than every K-quant — the same continued-pretrain-shape Q8_0 slowdown seen on the finance card. Perplexity favors Q8_0 but Q6_K (22.4 tok/s) is the safer throughput pick.
Not legal advice: A 7B model inherited from the upstream Mistral base — for drafting, triage, and classification support, not legal advice or filing decisions. No jurisdiction-specific validation is claimed.