← Quantizations
Quant · GGUF · 5 variants

saul-7b-instruct-v1-gguf

Quantization of Equall/Saul-7B-Instruct-v1 .

HF Orionfold/Saul-7B-Instruct-v1-GGUF License free mit Published

What this model does

Equall's Saul-7B-Instruct-v1 is a Mistral-based legal chat model — strong on LegalBench-style classification — but its 13.5 GB checkpoint wants a workstation card. This release ships five GGUF variants (Q4_K_M at 4.1 GB and 29.4 tok/s up to F16) so it runs offline on consumer hardware, each carrying a four-axis Spark-measured card: wikitext-2 perplexity, sustained tok/s, thermal-envelope minutes, and a LegalBench score. Orionfold's contribution is the distribution + measurement layer; Equall did the legal fine-tune.

Use cases

  • Offline legal-domain chat and clause/issue classification on consumer hardware
  • Drafting and triage behind your own document-retrieval layer
  • Picking a quant variant by workload shape, not just RAM budget

Audience — Local-LLM power users and legal-tech builders who want an offline legal chat model on a consumer GPU — for drafting and triage support, not legal advice.

Spec matrix

Ranks within each column drive the heatmap. Lower perplexity, higher throughput, higher vertical eval — the sweet-spot row balances all three.

Vertical bench: LegalBench (n=50, contains)
Variant Perplexity Spark tok/s Vertical eval
Q4_K_M 5.9864 29.43 0.62
Q5_K_M Sweet spot 5.9380 20.19 0.72
Q6_K 5.9250 22.39 0.68
Q8_0 5.9138 7.30 0.66
F16 5.9165 10.88 0.68

Methods

Read the field note Orionfold/Saul-7B-Instruct-v1-GGUF on Spark — five legal variants, LegalBench mini-eval, four-axis measurement card Five GGUF variants of Equall/Saul-7B-Instruct-v1 measured on a DGX Spark — Q5_K_M scores 72% on LegalBench (n=50, contains) at 20 tok/s and 4.8 GB. Each card carries perplexity, sustained tok/s, thermal envelope, and a 5-task LegalBench subset score. Open article

Known drift

Disclosed limitations with explicit bounds — the scope is named, not implied.

LegalBench scored with a lenient "contains" matcher
The LegalBench mini-eval (n=50) scores by substring "contains" match, more forgiving than strict exact-match — read the 62–72% range as an upper bound on that rubric, not a strict-accuracy figure. Q5_K_M tops at 36/50.
Q8_0 sustained-throughput anomaly
Q8_0 generates at 7.3 tok/s — ~33% below F16's 10.9 and slower than every K-quant — the same continued-pretrain-shape Q8_0 slowdown seen on the finance card. Perplexity favors Q8_0 but Q6_K (22.4 tok/s) is the safer throughput pick.
Not legal advice
A 7B model inherited from the upstream Mistral base — for drafting, triage, and classification support, not legal advice or filing decisions. No jurisdiction-specific validation is claimed.