Tag

#nemotron

Articles tagged "nemotron" — 12 entries.

Article №56 · fine-tuning · NeMo · ~16 min read (synthesis of a two-day advisor build on one Spark)
Machine that Builds Machines

The Refusal Floor Is Trainable — What a Frozen Curveball Proved About Prompts vs Weights

A 30B model with a hand-tuned prompt contract refused 3 of 9 adversarial pretexts and fabricated private-looking state 3 times. A 4B trained for 21 minutes refused 9 of 9. The bench that saw the difference was frozen before training — and that discipline is the whole method.

uses fieldkit.arena, fieldkit.eval

Article №45 · agentic · NIM · ~1 hour, most of it the NIM's first cold start
Harnesses

The Hermes Harness on a DGX Spark — A Local Cockpit That Holds Tools, With No API Key

Installing the Hermes agent harness on a DGX Spark and running the first local agent turn against the cached Nemotron-Nano-9B-v2 NIM: reliable tool calls, no API key, no cloud hop. The defensible angle is NIM-first; other Spark Hermes write-ups lead with Ollama.

uses fieldkit.nim, fieldkit.capabilities, fieldkit.harness

Article №43 · fine-tuning · Foundation · ~1 hour (one container, six gates, two GGUFs)
Machine that Builds Machines

Unsloth on the Spark — When the Train-Time Peak Equals the Base-Load Peak

Six gates clear in one container against the v1 reset: pip install --no-deps preserves the s40 stack, FastLanguageModel loads at a 16.94 GB peak, a 100-step LoRA training run stays within the same memory envelope, and save_pretrained_gguf() emits both quants in 207 seconds end-to-end.

Article №16 · foundations · Foundation · ~25 minute read
Looking Beyond Spark

Looking Beyond Spark — Fine-Tuning a 100B Nemotron

A working answer to the question of how many GPUs it takes to fine-tune a 100B Nemotron. Three methods, three memory footprints: full FT ≈ 1.6 TB needs 24× H100; LoRA ≈ 250 GB fits on 8× H100; QLoRA ≈ 65 GB fits on 1× H200. The Spark's 3B LoRA teaches the math.

uses fieldkit.capabilities
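The 1.6 TB full-FT figure is consistent with the common rule of thumb of 16 bytes per parameter for mixed-precision Adam: bf16 weights and gradients, plus an fp32 master copy and two fp32 Adam moments. A minimal sketch of that arithmetic, assuming those byte counts (they are not stated in the summary), decimal gigabytes, and 80 GB per H100. It computes floors only and leaves out activations, adapter and optimizer state for LoRA/QLoRA, and framework overhead, so it lands below the article's 24× H100, ≈250 GB, and ≈65 GB totals.

```python
import math

PARAMS = 100e9  # 100B parameters
GB = 1e9        # decimal gigabytes


def floor_gb(bytes_per_param: float, params: float = PARAMS) -> float:
    """Memory floor in GB for holding `params` at `bytes_per_param` each."""
    return params * bytes_per_param / GB


def gpus_needed(total_gb: float, gpu_gb: float = 80.0) -> int:
    """Smallest whole number of GPUs whose combined memory covers total_gb."""
    return math.ceil(total_gb / gpu_gb)


# Assumed byte counts per parameter (rule of thumb, not taken from the article):
# full FT = bf16 weights (2) + bf16 grads (2) + fp32 master (4) + Adam m (4) + Adam v (4)
full_ft_gb = floor_gb(2 + 2 + 4 + 4 + 4)
lora_base_gb = floor_gb(2)     # frozen base weights kept in bf16
qlora_base_gb = floor_gb(0.5)  # frozen base weights quantized to 4 bits

print(f"full FT floor:      {full_ft_gb:.0f} GB (>= {gpus_needed(full_ft_gb)} x 80 GB before activations)")
print(f"LoRA base weights:  {lora_base_gb:.0f} GB")
print(f"QLoRA base weights: {qlora_base_gb:.0f} GB")
```

Under these assumptions the full-FT floor needs 20 H100s before a single activation is stored; the gap to the article's 24× is where activations and sharding overhead would have to fit, which this sketch does not model.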

Article №10 · inference · Llama 3.3 70B + Nemotron-Super-49B + Llama 3.1 8B NIM · ~30 minutes on top of the rerank-and-fusion chain
Foundations

Bigger Generator, Same Grounding — 8B vs 49B vs 70B on One Retrieval Chain

The rerank-and-fusion article bet that a bigger generator would fix the 8B model's Google-IPO refusal. I ran the A/B across three sizes on one retrieval chain. The bet lost: Nemotron-Super-49B refuses even more often than the 8B baseline, and Llama 3.3 70B narrows the gap but does not close it. The refusal was the scaffold working.

uses fieldkit.rag

Article №09 · inference · Nemotron Reranker + pgvector full-text + Llama 3.1 8B NIM · ~45 minutes on top of the naive-RAG chain
Foundations

Hybrid Retrieval on the Spark — BM25, Dense, Fusion, Rerank

Four retrieval modes on one corpus: naive dense, BM25, Reciprocal Rank Fusion, and Nemotron rerank. Dense retrieval already reaches 92% recall@5; reranking adds one point of recall at K=10 and reorders the top results. The 8B generator still refuses where retrieval is perfect, so grounding, not retrieval, is the new bottleneck.

uses fieldkit.rag
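Reciprocal Rank Fusion, one of the four modes above, merges ranked lists by giving each document 1/(k + rank) for every list it appears in and summing. A minimal sketch, assuming the commonly used constant k = 60 (the article's actual k is not stated here) and hypothetical document IDs. It fuses rank positions only and ignores the raw BM25 and cosine scores, which is the appeal of RRF: the two score scales never need calibrating against each other.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc IDs (best first) into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists containing it,
    with rank starting at 1. Ties break on doc ID for a stable order.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical top-3 lists from a dense retriever and a BM25 retriever.
dense = ["doc-a", "doc-b", "doc-c"]
bm25 = ["doc-b", "doc-d", "doc-a"]
print(reciprocal_rank_fusion([dense, bm25]))
```

Documents that both retrievers rank highly rise to the top, while a document seen by only one list keeps a smaller share; a reranker such as the Nemotron one would then reorder the fused head.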

Article №08 · inference · Llama 3.1 8B NIM + Nemotron Retriever + pgvector · ~30 minutes if the three endpoints are already warm
Foundations

Three Endpoints, One Answer — Naive RAG on a DGX Spark

Three endpoints in one curl chain: a query is embedded by Nemotron, pgvector returns the top-5 chunks in under 80 ms, and a Llama 3.1 8B NIM stuffs them into a strict-context prompt. The chain works, but the 8B generator still refuses questions that its own retrieved context answers.

uses fieldkit.rag, fieldkit.eval

Article №07 · inference · pgvector · ~15 minutes first install, re-runs in seconds
Foundations

Where Your Vectors Live — pgvector on a DGX Spark

The substrate between the embed call and the retrieve call — pgvector 0.8.2 running as a Postgres 16 container on GB10, with 1000 Nemotron vectors, HNSW and ivfflat both indexed, and a planner that prefers seq scan until you tell it otherwise.

uses fieldkit.rag

Article №06 · inference · NeMo · ~30 minutes first install, ~1 minute every restart after
Foundations

Your Own Semantic Space — a Nemotron Embedding NIM on a DGX Spark

The embedding endpoint that every downstream RAG, wiki, and agent piece will reuse — a 2048-dim Nemotron Retriever NIM running locally on GB10, ready 52 seconds after docker run and holding 28 docs/s under batched load.

uses fieldkit.rag

Article №04 · agentic · NemoClaw · ~2 hours after prerequisites

The Sandbox Tax That Wasn't — NemoClaw vs OpenClaw on One DGX Spark

I ran NemoClaw's sandboxed agent stack and the host Ollama-OpenClaw CLI side by side on one DGX Spark with the same 123B Nemotron model. The sandbox overhead I went looking for is real but modest (~2× raw inference); the real tax is onboarding, and NemoClaw paid it at install time.

Upcoming · agentic · Foundation · planned ~2 hours
Harnesses

Field-Fixing the Hermes Harness on a DGX Spark — When the NIM Won't Stream Tool Calls, and Other Rough Edges

Fifth in the Harnesses series: the field fixes that take a fresh Hermes agent on a local NIM from 'mostly works' to 'just works.' Leads with the one that bit hardest — the Spark NIM ships a non-streaming tool parser, fixed by bind-mounting NVIDIA's own streaming parser.

uses fieldkit.harness

Upcoming · fine-tuning · NeMo Customizer + Nemotron Nano 9B v2 · planned ~4 hours per sweep
LLM Wiki

LoRA on Nemotron Nano — Fine-tuning a 9B Without Blowing Unified Memory

A planned walkthrough of LoRA fine-tuning on Nemotron Nano 9B with NeMo Customizer: rank and alpha sweeps, a tiny domain corpus, and the memory accounting that keeps a PEFT run from hitting the Spark's 128 GB unified-memory wall.
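One piece of the memory accounting a rank sweep needs is the adapter size itself. For a frozen weight of shape d_out × d_in, LoRA trains A (r × d_in) and B (d_out × r), so it adds r·(d_in + d_out) parameters, and the update B·A is scaled by alpha/r. A minimal sketch of that arithmetic; the hidden size, layer count, and target modules are hypothetical placeholders, not Nemotron Nano 9B v2's real architecture, and 16 bytes per trainable parameter is an assumed mixed-precision Adam budget.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable params LoRA adds to one d_out x d_in weight: A is r x d_in, B is d_out x r."""
    return rank * (d_in + d_out)


def lora_scale(alpha: float, rank: int) -> float:
    """Scaling factor applied to the low-rank update B @ A: alpha / r."""
    return alpha / rank


def adapter_train_bytes(n_params: int, bytes_per_param: int = 16) -> int:
    """Rough training footprint of the adapters alone (weights, grads, Adam state).

    16 bytes/param is an assumption, not a measured NeMo Customizer figure.
    """
    return n_params * bytes_per_param


# Hypothetical shapes: 4096-wide layers, 40 adapted blocks, 2 square target projections each.
HIDDEN, N_LAYERS, TARGETS = 4096, 40, 2

for rank in (8, 16, 32):
    n = N_LAYERS * TARGETS * lora_params(HIDDEN, HIDDEN, rank)
    gb = adapter_train_bytes(n) / 1e9
    # alpha = 2 * rank here is an illustrative choice, not a recommendation.
    print(f"r={rank:>2}: {n / 1e6:.1f}M adapter params, ~{gb:.2f} GB to train, scale={lora_scale(2 * rank, rank)}")
```

Under these assumptions the adapters stay in the millions to tens of millions of parameters even at r = 32, while 9B frozen parameters in bf16 already take about 18 GB before activations, so the base weights and activations, not the adapters, decide how close a run gets to the unified-memory wall.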