saul-7b-instruct-v1-gguf
Quantization of Equall/Saul-7B-Instruct-v1.
What this model does
Equall's Saul-7B-Instruct-v1 is a Mistral-based legal chat model that is strong on LegalBench-style classification, but its 13.5 GB checkpoint calls for a workstation-class GPU. This release ships five GGUF variants, from Q4_K_M (4.1 GB, 29.4 tok/s) up to F16, so the model runs offline on consumer hardware. Each variant carries a four-axis measurement card taken on a DGX Spark: wikitext-2 perplexity, sustained tok/s, thermal-envelope minutes, and a LegalBench score. Orionfold's contribution is the distribution and measurement layer; Equall did the legal fine-tune.
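Perplexity is the exponential of the mean per-token negative log-likelihood on held-out text (here wikitext-2); lower means the quantized weights predict the reference text better. A minimal sketch of that definition, assuming you already have per-token natural-log probabilities. How the Spark runs chunked the text and what context length they used is not stated on this card, and the function name `perplexity` is illustrative:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token).

    token_logprobs: natural-log probabilities the model assigned to each
    reference token (hypothetical input; the card does not publish these).
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# A model that gives every reference token probability 1/6 has perplexity ~6.
print(perplexity([math.log(1 / 6)] * 100))
```

Read this way, the 5.91 to 5.99 values in the spec matrix mean each variant is, on average, about as uncertain as a uniform choice among six tokens, which is why the spread between Q4_K_M and F16 is small.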
Use cases
- Offline legal-domain chat and clause/issue classification on consumer hardware
- Drafting and triage behind your own document-retrieval layer
- Picking a quant variant by workload shape, not just RAM budget
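One way to read "workload shape" is a throughput floor plus the axis you care about most. A minimal sketch under that assumption: the selection rule and the name `pick_variant` are hypothetical, not part of this release, and the numbers are copied from the spec matrix on this card:

```python
# Spec-matrix numbers from this card: (wikitext-2 perplexity, Spark tok/s, LegalBench mini-eval)
SPEC = {
    "Q4_K_M": (5.9864, 29.43, 0.62),
    "Q5_K_M": (5.9380, 20.19, 0.72),
    "Q6_K":   (5.9250, 22.39, 0.68),
    "Q8_0":   (5.9138,  7.30, 0.66),
    "F16":    (5.9165, 10.88, 0.68),
}

def pick_variant(min_tok_s=0.0, objective="eval"):
    """Hypothetical rule: drop variants below a throughput floor,
    then return the best remaining one on the chosen objective."""
    candidates = {k: v for k, v in SPEC.items() if v[1] >= min_tok_s}
    if not candidates:
        return None  # no variant meets the floor
    if objective == "eval":
        # Highest LegalBench score; ties broken by higher tok/s.
        return max(candidates, key=lambda k: (candidates[k][2], candidates[k][1]))
    if objective == "perplexity":
        return min(candidates, key=lambda k: candidates[k][0])
    raise ValueError(f"unknown objective: {objective}")

print(pick_variant(min_tok_s=20))           # Q5_K_M
print(pick_variant(min_tok_s=25))           # Q4_K_M
print(pick_variant(objective="perplexity"))  # Q8_0
```

A real deployment would add a memory-budget filter as well; only the Q4_K_M, Q5_K_M, and F16 sizes are stated on this card, so the sketch leaves size out.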
Audience: local-LLM power users and legal-tech builders who want an offline legal chat model on a consumer GPU for drafting and triage support, not legal advice.
Spec matrix
Variants are ranked within each column: lower perplexity, higher throughput, and a higher vertical-eval score are better. The sweet-spot row balances all three.
| Variant | Perplexity (wikitext-2) ↓ | Spark tok/s ↑ | Vertical eval (LegalBench mini-eval, n=50) ↑ |
|---|---|---|---|
| Q4_K_M | 5.9864 | 29.43 | 0.62 |
| Q5_K_M (sweet spot) | 5.9380 | 20.19 | 0.72 |
| Q6_K | 5.9250 | 22.39 | 0.68 |
| Q8_0 | 5.9138 | 7.30 | 0.66 |
| F16 | 5.9165 | 10.88 | 0.68 |
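The per-column ranks can be recomputed from the table. This sketch uses standard competition ranking (ties share the better rank), which is an assumption: the card does not say how ties are handled or how the sweet spot is derived from the ranks. `column_ranks` is an illustrative name:

```python
# Values copied from the spec matrix: (perplexity, Spark tok/s, LegalBench mini-eval)
SPEC = {
    "Q4_K_M": (5.9864, 29.43, 0.62),
    "Q5_K_M": (5.9380, 20.19, 0.72),
    "Q6_K":   (5.9250, 22.39, 0.68),
    "Q8_0":   (5.9138,  7.30, 0.66),
    "F16":    (5.9165, 10.88, 0.68),
}

def column_ranks(col, higher_is_better):
    """Competition ranking (1 = best) of every variant on one column."""
    values = {k: v[col] for k, v in SPEC.items()}
    ranks = {}
    for k, x in values.items():
        # Rank = 1 + number of variants strictly better on this column.
        better = sum(1 for y in values.values() if (y > x if higher_is_better else y < x))
        ranks[k] = better + 1
    return ranks

ppl_rank = column_ranks(0, higher_is_better=False)
tps_rank = column_ranks(1, higher_is_better=True)
eval_rank = column_ranks(2, higher_is_better=True)
for k in SPEC:
    print(f"{k:7s} ppl#{ppl_rank[k]} tok/s#{tps_rank[k]} eval#{eval_rank[k]}")
```

Under this scheme an unweighted rank sum comes to 7 for Q6_K and 8 for Q5_K_M, so the sweet-spot label presumably weights the LegalBench column more heavily than a plain sum would.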
Methods
Read the field note "Orionfold/Saul-7B-Instruct-v1-GGUF on Spark: five legal variants, LegalBench mini-eval, four-axis measurement card". Five GGUF variants of Equall/Saul-7B-Instruct-v1 were measured on a DGX Spark; Q5_K_M scores 72% on LegalBench (n=50, "contains" match) at 20 tok/s and 4.8 GB. Each card carries perplexity, sustained tok/s, thermal envelope, and a score on a 5-task LegalBench subset.
Known drift
Disclosed limitations with explicit bounds: the scope is named, not implied.
- LegalBench scored with a lenient "contains" matcher
  - The LegalBench mini-eval (n=50) scores by substring ("contains") match, which is more forgiving than strict exact-match. Read the 62–72% range as an upper bound on strict exact-match accuracy, not as a strict-accuracy figure. Q5_K_M tops the set at 36/50.
- Q8_0 sustained-throughput anomaly
  - Q8_0 generates at 7.3 tok/s, about 33% below F16's 10.9 tok/s and slower than every K-quant, matching the Q8_0 slowdown seen on the finance card for a model of the same continued-pretrain shape. Perplexity favors Q8_0, but Q6_K (22.4 tok/s) is the safer pick when throughput matters.
- Not legal advice
  - A 7B model built on the upstream Mistral base, intended for drafting, triage, and classification support, not for legal advice or filing decisions. No jurisdiction-specific validation is claimed.
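The gap between the two LegalBench rubrics is easy to see in code. A minimal sketch, assuming case-insensitive comparison with surrounding whitespace stripped (the card does not say whether the Spark matcher normalizes case); `exact_match`, `contains_match`, and `score` are illustrative names, not the evaluation harness's API:

```python
def exact_match(pred: str, gold: str) -> bool:
    # Strict rubric: the whole normalized answer must equal the gold label.
    return pred.strip().lower() == gold.strip().lower()

def contains_match(pred: str, gold: str) -> bool:
    # Lenient rubric: correct if the gold label appears anywhere in the answer.
    return gold.strip().lower() in pred.lower()

def score(preds, golds, matcher):
    """Fraction of items the matcher accepts (e.g. 36/50 = 0.72)."""
    hits = sum(matcher(p, g) for p, g in zip(preds, golds))
    return hits / len(golds)

preds = ["Yes", "Yes, the clause is enforceable.", "No."]
golds = ["Yes", "Yes", "No"]
print(score(preds, golds, exact_match))     # 0.3333333333333333
print(score(preds, golds, contains_match))  # 1.0
# Leniency cuts both ways: "no" is found inside "not".
print(contains_match("It is not clear whether the answer is yes", "No"))  # True
```

Every exact match is also a contains match, so the lenient score can only be equal or higher; the last line shows it can also credit a hedged answer that merely contains the label as a substring.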