What it is
The publishing side of the Orionfold production line. fieldkit.quant produces a QuantReport; fieldkit.publish turns it into a HuggingFace repo with a deterministic model card and a per-artifact YAML manifest the source repo and destination site both read.
Three surfaces.

ModelCard renders the canonical card shape: frontmatter (license, library_name, base_model, tags, model_creator), a ## Spark-tested block (perplexity + tok/s + thermal envelope + optional vertical-eval table), a ## Variants table, an auto-generated ## How to run body (huggingface-cli download + llama-server + llama-cpp-python snippets templated from the HF repo path), an optional ## Lineage block (rendered from a fieldkit.lineage.LineageStore if provided), a ## Methods backlink to the anchor article, and an Orionfold LLC footer.

ArtifactManifest is the frozen dataclass for src/content/artifacts/<slug>.yaml: the Phase-2 sync record per project_artifact_manifests_phase2; the destination renders catalog pages from getCollection('artifacts').

HFHubAdapter is a lazy wrapper around huggingface_hub. It defaults to dry_run=True (stages files + logs the would-be calls; no network, no token); flip dry_run=False to push via HfApi().upload_folder(...).
The module exists because manual card authoring at MTBM’s 3–5-day cadence is the bottleneck. Every quant needs a tags list, a perplexity table, a tok/s number, a thermal envelope note, a lineage backlink — and getting any of those wrong on the customer-facing HF page is a trust hit. fieldkit.publish makes the card the deterministic output of the quant+lineage run, not a hand-edit, so the only knobs the operator sets are the ones that genuinely require human judgement (the upstream license, the chat format, the featured variant).
Public API
from fieldkit.publish import (
ARTIFACT_KINDS, ArtifactKind, ArtifactManifest,
HFHubAdapter, HFHubNotAvailable, HFAuthError,
ModelCard, PublishError, PublishResult,
publish_quant, write_artifact_manifest,
ORIONFOLD_BRAND, ORIONFOLD_HF_HANDLE, ORIONFOLD_HF_ORG,
)
ORIONFOLD_BRAND + ORIONFOLD_HF_HANDLE
ORIONFOLD_BRAND = "Orionfold LLC"
ORIONFOLD_HF_HANDLE = "Orionfold"
The brand stamped on every card footer, and the HuggingFace user handle every repo lands under (Orionfold/<model>-GGUF, Bartowski-shape). ORIONFOLD_HF_ORG is a back-compat alias for ORIONFOLD_HF_HANDLE, kept importable for out-of-tree callers and slated for removal in a future cut.
ARTIFACT_KINDS
ARTIFACT_KINDS = (
"quant", "lora", "adapter", "embed",
"reranker", "dataset", "space", "bench",
)
The manifest kind enum. Mirrored by src/content.config.ts’s ARTIFACT_KINDS so Astro Zod validation and the Python writer stay in lockstep.
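A minimal sketch of the guard a manifest writer might run before emitting YAML, so a kind the Astro Zod schema would reject never reaches disk (the validate_kind helper is illustrative, not part of the public API):

```python
# Mirrors the tuple exported by fieldkit.publish.
ARTIFACT_KINDS = (
    "quant", "lora", "adapter", "embed",
    "reranker", "dataset", "space", "bench",
)

def validate_kind(kind: str) -> str:
    """Fail fast on a kind the destination's Zod enum would also reject."""
    if kind not in ARTIFACT_KINDS:
        raise ValueError(
            f"unknown artifact kind {kind!r}; expected one of {ARTIFACT_KINDS}"
        )
    return kind
```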
ModelCard(...)
Frozen dataclass + render() → str. Constructed by publish_quant from a QuantReport-shaped object plus the resolved license / chat_format / recommended_variant triple. Renders to a single README.md-style string.
Key fields:
ModelCard(
title="finance chat GGUF",
one_liner="...",
base_model="AdaptLLM/finance-chat",
license="llama2", # ← HF frontmatter scalar; reflects upstream model's license
library_name="gguf",
pipeline_tag="text-generation",
tags=("gguf", "spark-tested", "orionfold", "base_model:AdaptLLM/finance-chat"),
quant_format="gguf",
variants=({"name": "Q4_K_M", "size": "3.8 GB", "recommended": "..."}, ...),
perplexity={"Q4_K_M": 6.221, "Q8_0": 6.137, ...},
tokens_per_sec={"Q4_K_M": 31.1, "Q8_0": 8.9, ...},
sustained_load_minutes=2.18,
vertical_eval={"Q4_K_M": 0.14, ...}, # optional 5th column
vertical_eval_name="FinanceBench (n=50, numeric_match)",
hf_repo="Orionfold/finance-chat-GGUF", # drives default `## How to run` body
chat_format="llama-2", # → llama_cpp.Llama(chat_format=...)
recommended_variant="Q5_K_M", # featured in default snippets
llama_cpp_example_prompt="Explain working capital.", # user-message in the default `llama-cpp-python` snippet; falls back to a neutral placeholder when omitted
ollama_pull_handle=None, # opt-in override; default body wins otherwise
transformers_snippet=None,
lineage_prompt=None, # injected by publish_quant if a LineageStore is supplied
article_slug="becoming-a-gguf-publisher-on-spark",
article_title="...",
model_creator=ORIONFOLD_BRAND,
)
render() emits sections in canonical order: YAML frontmatter → title + elevator → ## Spark-tested (omitted if no measurements) → ## Variants → ## How to run (auto-rendered defaults when no explicit handle/snippet is given; omitted entirely when no default can be templated) → ## Lineage (if lineage_prompt supplied) → ## Methods link → footer.
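The conditional-section behavior can be sketched like this (not the real render(); function names are illustrative). The key rule: an empty section is dropped header and all, never rendered as a bare heading.

```python
from typing import Optional

def render_sections(*sections: Optional[str]) -> str:
    """Join non-empty sections with blank lines; a None section
    (e.g. ## Spark-tested with no measurements) is omitted entirely."""
    return "\n\n".join(s for s in sections if s)

def spark_tested(perplexity: dict) -> Optional[str]:
    # Omit the whole block, not just the table, when nothing was measured.
    if not perplexity:
        return None
    rows = "\n".join(f"| {k} | {v} |" for k, v in perplexity.items())
    return "## Spark-tested\n\n| Variant | Perplexity |\n|---|---|\n" + rows

card = render_sections(
    "# finance chat GGUF",
    spark_tested({"Q4_K_M": 6.221}),
    spark_tested({}),          # dropped entirely
    "Orionfold LLC",
)
```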
ArtifactManifest(...)
Frozen dataclass for src/content/artifacts/<slug>.yaml. Flat-by-design (primitive types + dicts of primitives) so the YAML emitter is hand-rolled stdlib.
m = ArtifactManifest(
slug="finance-chat-gguf",
kind="quant",
artifact_class="gguf", # serialized as `class:` in YAML
base_model="AdaptLLM/finance-chat",
hf_repo="Orionfold/finance-chat-GGUF",
variants=("Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0", "F16"),
perplexity={"Q4_K_M": 6.221, ...},
spark_tokens_per_sec={"Q4_K_M": 31.09, ...},
sustained_load_minutes=2.18,
vertical_eval={"Q4_K_M": 0.14, ...},
vertical_eval_name="FinanceBench (n=50, numeric_match)",
recommended_variant="Q5_K_M", # article-narrative pick; destination pins the "Sweet spot" badge to this variant
lineage_run_id=None,
license_tier="free", # Orionfold commercial tier (free / pro)
license_commercial_tier=None,
model_license="llama2", # upstream model license (HF frontmatter shape)
article="articles/becoming-a-gguf-publisher-on-spark/",
civitai_id=None,
download_count=None,
published_at="2026-05-14T04:46:11Z",
)
print(m.to_yaml())
The license_tier / license_commercial_tier fields live alongside model_license under a nested license: block in YAML output. Mac destination’s Zod schema mirrors this shape.
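A sketch of how a hand-rolled stdlib emitter could produce that nested license: block. The nested key names (tier, commercial_tier, model) are assumptions for illustration; the shipped to_yaml() may spell them differently.

```python
def license_block(license_tier, license_commercial_tier, model_license):
    """Emit the nested license: mapping; None serializes as `null` so
    a downstream schema can keep the keys present-but-nullable."""
    def scalar(v):
        return "null" if v is None else str(v)
    return "\n".join([
        "license:",
        f"  tier: {scalar(license_tier)}",
        f"  commercial_tier: {scalar(license_commercial_tier)}",
        f"  model: {scalar(model_license)}",
    ])
```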
recommended_variant (v0.4.2+) lets the article’s narrative pick — the variant the writeup recommends — override the destination’s rank-avg picker. Cyber’s Q4_K_M topped CyberMetric but its worst-in-class perplexity dragged its rank-avg down, so without this field the catalog page would pin Q5_K_M as the “Sweet spot” instead. Same value flows into both ModelCard (HF README’s How-to-run snippets) and ArtifactManifest (destination catalog) so the badge and the snippet stay in sync.
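The override semantics can be illustrated with a toy rank-average picker (the destination's real picker lives in the Astro site; this sketch just shows why a metric win plus a perplexity loss can cost a variant the badge, and how the narrative pick short-circuits that):

```python
def rank_avg_pick(metrics, lower_is_better=frozenset({"perplexity"})):
    """Sum each variant's rank across metrics; best (lowest) total wins."""
    variants = list(next(iter(metrics.values())))
    totals = {v: 0 for v in variants}
    for name, scores in metrics.items():
        ordered = sorted(variants, key=lambda v: scores[v],
                         reverse=name not in lower_is_better)
        for rank, v in enumerate(ordered):
            totals[v] += rank
    return min(totals, key=totals.get)

def sweet_spot(metrics, recommended_variant=None):
    # The article's narrative pick, when set, beats the rank average.
    return recommended_variant or rank_avg_pick(metrics)
```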
write_artifact_manifest(manifest, *, artifacts_dir)
Writes the manifest to <artifacts_dir>/<slug>.yaml. Creates the directory if missing. Returns the absolute path of the written file — callers can stage it alongside the article for the next git commit.
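The contract described above is roughly the following (a sketch, not the shipped implementation):

```python
from pathlib import Path

def write_manifest_sketch(slug: str, yaml_text: str, *, artifacts_dir) -> Path:
    """Write <artifacts_dir>/<slug>.yaml, creating the directory if
    missing, and return the absolute path for git staging."""
    out_dir = Path(artifacts_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{slug}.yaml"
    path.write_text(yaml_text, encoding="utf-8")
    return path.resolve()
```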
HFHubAdapter(staging_dir, *, dry_run=True, token=None, org=ORIONFOLD_HF_HANDLE)
Thin wrapper around huggingface_hub. Dry-run by default: lays out the upload set on disk under staging_dir, logs the would-be calls. No HF imports required, no token required. Flip dry_run=False to push; the lazy import of huggingface_hub fires only then.
adapter = HFHubAdapter(staging_dir="/tmp/orionfold-stage/finance-chat", dry_run=True)
adapter.stage_text(card.render(), "README.md") # stages from a string
adapter.stage_file(gguf_path, "model-Q4_K_M.gguf") # stages by copying a file
result = adapter.push_folder(repo_name="finance-chat-GGUF")
result.dry_run # True
result.files_uploaded # ('README.md', 'model-Q4_K_M.gguf', ...)
result.logged_calls # the upload_folder kwargs that would have fired
Token resolution order: explicit token= arg → HF_TOKEN env → HUGGING_FACE_HUB_TOKEN env → huggingface_hub’s cached login. If all four are absent and dry_run=False, HFAuthError raises before the network call.
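The resolution chain can be sketched as below. The cached parameter stands in for huggingface_hub's cached-login lookup (the real adapter queries the hub library for it; the error class here is a stand-in for fieldkit.publish.HFAuthError):

```python
import os

class HFAuthErrorSketch(RuntimeError):
    """Stand-in for fieldkit.publish.HFAuthError."""

def resolve_token(explicit=None, cached=None):
    """Precedence: explicit arg -> HF_TOKEN -> HUGGING_FACE_HUB_TOKEN
    -> cached login; raise before any network call if all are absent."""
    token = (explicit
             or os.environ.get("HF_TOKEN")
             or os.environ.get("HUGGING_FACE_HUB_TOKEN")
             or cached)
    if token is None:
        raise HFAuthErrorSketch("no HF token found; refusing live push")
    return token
```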
publish_quant(*, quant_report, base_model, repo_name, staging_dir, ...) → PublishResult
The one-line orchestrator. Reads the duck-typed quant_report fields (.format, .variants, .perplexity, .tokens_per_sec, .sustained_load_minutes, .variant_files, .vertical_eval, .vertical_eval_name, .model_license, .chat_format, .recommended_variant, .llama_cpp_example_prompt), builds a ModelCard, stages the README + variant files, writes the ArtifactManifest (if artifacts_dir supplied), and invokes HFHubAdapter.push_folder(). Explicit kwargs override duck-typed report attrs.
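The duck-typing plus override rule reduces to a small pattern (the helper is hypothetical; field names are from the list above). A SimpleNamespace is enough to stand in for a QuantReport:

```python
from types import SimpleNamespace

def resolve_field(report, name, override=None):
    """Explicit kwarg wins; otherwise fall back to the report attr;
    a missing attr resolves to None rather than raising."""
    if override is not None:
        return override
    return getattr(report, name, None)

# Any report-shaped object works; no import of fieldkit.quant needed.
report = SimpleNamespace(model_license="llama2", chat_format="llama-2")
```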
result = publish_quant(
quant_report=report,
base_model="AdaptLLM/finance-chat",
repo_name="finance-chat-GGUF",
staging_dir="/tmp/orionfold-stage/finance-chat",
artifacts_dir="/home/nvidia/ai-field-notes/src/content/artifacts",
article_slug="becoming-a-gguf-publisher-on-spark",
article_title="...",
vertical_eval={"Q4_K_M": 0.14, "Q5_K_M": 0.16, ...},
vertical_eval_name="FinanceBench (n=50, numeric_match)",
model_license="llama2", # critical — never default silently to apache-2.0
chat_format="llama-2",
recommended_variant="Q5_K_M",
llama_cpp_example_prompt="Explain working capital.", # mirror the article's example user-message
lineage_store=store, # optional; injects ## Lineage block
dry_run=True, # flip to False for the actual push
)
result.hf_repo # 'Orionfold/finance-chat-GGUF'
result.card_path # Path('/tmp/orionfold-stage/.../README.md')
result.manifest_path # Path('.../src/content/artifacts/finance-chat-gguf.yaml')
result.hf_url # None in dry-run; set after live push
The model_license / chat_format / recommended_variant kwargs landed in v0.4.x after the Orionfold/finance-chat-GGUF dry-run surfaced two card-rendering bugs: a hardcoded license: apache-2.0 (wrong for the Llama-2 lineage AdaptLLM base) and an empty ## How to run section (when no ollama handle or transformers snippet was supplied, the section header rendered with no body). Both are now caller-controlled with sane defaults.
Why this surface
Three things to notice. First, HFHubAdapter defaults to dry-run because the right workflow is dry-run → human review → live push. Library users who want a one-shot live push pass dry_run=False explicitly; library users who want the staging artifact for review (the common case during development) get it for free. The hf-publisher skill (/home/nvidia/.claude/skills/hf-publisher/) wraps this workflow as a triggered Claude Code surface.
Second, publish_quant duck-types its report rather than importing fieldkit.quant.QuantReport directly. This avoids a circular import (quant doesn’t depend on publish; publish doesn’t depend on quant) and lets non-quant callers — a LoRA pipeline, an embedding pipeline — supply their own report-shaped objects without subclassing.
Third, ArtifactManifest is structurally distinct from ModelCard even though they overlap. The card is for HuggingFace; the manifest is for the destination Astro catalog. Both encode the same artifact, but the consumers are different and have different schemas. Keeping them separate dataclasses lets each evolve independently — and lets write_artifact_manifest write the manifest even when the HF push is dry-run, which is what the source repo commits look like during article-only iterations.
Samples
scripts/g3_build_first_quant.sh: the publish-dryrun step assembles a QuantReport-shaped SimpleNamespace from the measurement JSON and calls publish_quant(..., dry_run=True).
scripts/g3_push_first_quant.py: the live-push one-shot. Reuses the existing dry-run stage; calls HFHubAdapter(staging_dir=..., dry_run=False).push_folder() directly so the 32 GB of GGUF bytes don't get re-staged.
articles/becoming-a-gguf-publisher-on-spark/: the anchor article. Walks the v0.4.x publish surface end-to-end against Orionfold/finance-chat-GGUF and narrates the two bugs that v0.4.0 fixed before tagging.