Tag

#gguf

Articles tagged "gguf" — 5 entries.

Article №43 fine-tuning Foundation 19 May 2026 ~1 hour (one container, six gates, two GGUFs)

Unsloth on the Spark — When the Train-Time Peak Equals the Base-Load Peak

Six gates clear in one container against the v1 reset: pip install --no-deps preserves the s40 stack, FastLanguageModel loads at 16.94 GB peak, a 100-step LoRA train holds the same envelope, and save_pretrained_gguf() emits both quants in 207 seconds end-to-end.

Article №40 deployment llama.cpp 16 May 2026 ~5 hours end-to-end on a DGX Spark

Machine that Builds Machines

Orionfold/II-Medical-8B-GGUF on Spark — five medical-reasoning variants, MedMCQA mini-eval, ChatML reasoning format

Five GGUF variants of Intelligent-Internet/II-Medical-8B (Qwen3-8B + DAPO reasoning recipe) measured on a DGX Spark. Q5_K_M lands at 36.4 tok/s and 5.45 GB, and scores 52% on a MedMCQA n=50 mini-eval, above F16. First reasoning recipe in the series.

uses fieldkit.quant, fieldkit.publish, fieldkit.eval, fieldkit.lineage

Article №39 deployment llama.cpp 15 May 2026 ~5 hours end-to-end on a DGX Spark

Machine that Builds Machines

Orionfold/SecurityLLM-GGUF on Spark — five cyber variants, CyberMetric mini-eval, MCQ letter scoring

Five GGUF variants of ZySec-AI/SecurityLLM measured on a DGX Spark — Q4_K_M scores 40% on CyberMetric MCQ at 47.7 tok/s and 4.1 GB; the smaller variants matched or beat F16's 34%. Third vertical card; zero fieldkit source changes.

uses fieldkit.quant, fieldkit.publish, fieldkit.eval, fieldkit.lineage

Article №38 deployment llama.cpp 14 May 2026 ~5 hours end-to-end on a DGX Spark

Machine that Builds Machines

Orionfold/Saul-7B-Instruct-v1-GGUF on Spark — five legal variants, LegalBench mini-eval, four-axis measurement card

Five GGUF variants of Equall/Saul-7B-Instruct-v1 measured on a DGX Spark — Q5_K_M scores 72% on LegalBench (n=50, contains) at 20 tok/s and 4.8 GB. Each card carries perplexity, sustained tok/s, thermal envelope, and a 5-task LegalBench subset score.

uses fieldkit.quant, fieldkit.publish, fieldkit.eval, fieldkit.lineage

Article №37 deployment llama.cpp 14 May 2026 ~6 hours end-to-end on a DGX Spark

Machine that Builds Machines

Orionfold/finance-chat-GGUF on Spark — five variants, FinanceBench mini-eval, four-axis measurement card

Five GGUF variants of AdaptLLM/finance-chat measured on a DGX Spark: Q8_0 matches F16 on perplexity, and Q4_K_M ships at 31 tok/s. Each card carries perplexity, sustained tok/s, thermal envelope, and FinanceBench accuracy.

uses fieldkit.quant, fieldkit.publish, fieldkit.eval, fieldkit.lineage