Tag

#nemotron

Articles tagged "nemotron" — 8 entries.

Article №16 foundations ~25 minute read
Foundations

Looking Beyond Spark — Fine-Tuning a 100B Nemotron

A working answer to: how many GPUs to fine-tune a 100B Nemotron? Three methods, three memory footprints — full FT ≈ 1.6 TB needs 24× H100; LoRA ≈ 250 GB fits 8× H100; QLoRA ≈ 65 GB fits 1× H200. The Spark's 3B LoRA teaches the math.

uses fieldkit.capabilities
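
The three footprints in the summary follow from standard bytes-per-parameter rules of thumb. A minimal sketch of that arithmetic, assuming bf16 weights with Adam (≈16 B/param for full FT) and round overhead factors for the LoRA and QLoRA cases — the factors here are my assumptions chosen to reproduce the article's headline numbers, not its exact accounting:

```python
# Back-of-envelope memory math for fine-tuning a 100B-parameter model.
# Overhead multipliers are illustrative assumptions, not measured values.

def full_ft_gb(params_b: float) -> float:
    """Full FT in bf16 with Adam: 2 B/param weights + 2 B/param grads
    + ~12 B/param fp32 optimizer states (master copy, m, v) ≈ 16 B/param."""
    return params_b * 16

def lora_gb(params_b: float) -> float:
    """LoRA: frozen bf16 base (2 B/param); adapter states are tiny, so
    the extra factor stands in for activations and runtime overhead."""
    return params_b * 2 * 1.25

def qlora_gb(params_b: float) -> float:
    """QLoRA: 4-bit base (~0.5 B/param) plus adapter/runtime overhead."""
    return params_b * 0.5 * 1.3

print(full_ft_gb(100), lora_gb(100), qlora_gb(100))
```

With 100 B parameters this lands at ≈1600 GB, ≈250 GB, and ≈65 GB — matching the 24× H100 / 8× H100 / 1× H200 sizing above.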

Article №10 inference Llama 3.3 70B + Nemotron-Super-49B + Llama 3.1 8B NIM ~30 minutes on top of the rerank-and-fusion chain
Foundations

Bigger Generator, Same Grounding — 8B vs 49B vs 70B on One Retrieval Chain

The rerank-and-fusion article bet that a bigger generator would heal the 8B Google-IPO refusal. Ran the A/B across three sizes on one retrieval chain. Bet lost: Nemotron-Super-49B over-refuses relative to the 8B baseline; Llama 3.3 70B narrows the gap, not closes it. The refusal was the scaffold working.

uses fieldkit.rag

Article №09 inference Nemotron Reranker + pgvector full-text + Llama 3.1 8B NIM ~45 minutes on top of the naive-RAG chain
Foundations

Hybrid Retrieval on the Spark — BM25, Dense, Fusion, Rerank

Four retrieval modes on one corpus — naive dense, BM25, Reciprocal Rank Fusion, Nemotron rerank. Dense is already 92% recall@5; rerank adds a point at K=10 and reorders the top. The 8B generator still refuses where retrieval is perfect — grounding, not retrieval, is the new bottleneck.

uses fieldkit.rag
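
Of the four modes, Reciprocal Rank Fusion is the one that combines the other two lists. A minimal sketch of the standard RRF formula — score(d) = Σ 1/(k + rank_d) over the input rankings, with the conventional k = 60; the toy document IDs are illustrative:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: sum 1/(k + rank) for each doc across
    all input rankings, then sort by fused score, highest first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["a", "b", "c"]   # e.g. dense top-3
bm25 = ["c", "a", "d"]    # e.g. BM25 top-3
print(rrf([dense, bm25]))  # docs appearing in both lists rise to the top
```

Documents ranked by both retrievers ("a", "c") outscore documents seen by only one, which is the whole point of fusion over either list alone.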

Article №08 inference Llama 3.1 8B NIM + Nemotron Retriever + pgvector ~30 minutes if the three endpoints are already warm
Foundations

Three Endpoints, One Answer — Naive RAG on a DGX Spark

Three endpoints in one curl chain — a query embeds through Nemotron, pgvector returns top-5 chunks in under 80 ms, and a Llama 3.1 8B NIM stuffs them into a strict-context prompt. The chain works; the 8B generator still refuses on questions its own context answers.

uses fieldkit.rag fieldkit.eval
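
The last link of the chain — stuffing retrieved chunks into a strict-context prompt — can be sketched as a pure function. The prompt wording and chunk labels below are my assumptions, not the article's exact template:

```python
def build_strict_context_prompt(question: str, chunks: list[str]) -> str:
    """Stuff the top-k retrieved chunks into a context-only prompt that
    forbids the generator from using outside knowledge."""
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, 1))
    return (
        "Answer only from the context below. If the context does not "
        'contain the answer, reply "I don\'t know."\n\n'
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_strict_context_prompt(
    "What is the retrieval latency?",
    ["pgvector returns top-5 chunks in under 80 ms.", "The NIM is local."],
)
print(prompt)
```

In the chain this function sits between the pgvector top-5 response and the Llama 3.1 8B completion call — and, per the summary, it is exactly this strictness that makes the 8B refuse even when its context holds the answer.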

Article №07 inference pgvector ~15 minutes first install, re-runs in seconds
Foundations

Where Your Vectors Live — pgvector on a DGX Spark

The substrate between the embed call and the retrieve call — pgvector 0.8.2 running as a Postgres 16 container on GB10, with 1000 Nemotron vectors, HNSW and ivfflat both indexed, and a planner that prefers seq scan until you tell it otherwise.

uses fieldkit.rag
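
The planner behavior the summary mentions — Postgres preferring a sequential scan over the HNSW index on a small table — has a standard check. A sketch of the relevant SQL, held as strings; the table and column names are illustrative, and the vector literal is elided:

```python
# Build an HNSW index with the cosine-distance operator class.
CREATE_INDEX = """
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);
"""

# On ~1000 rows the planner often costs a seq scan as cheaper than the
# index. Disable seq scans for the session to confirm the index is
# actually usable, then EXPLAIN the nearest-neighbour query
# (<=> is pgvector's cosine-distance operator).
FORCE_INDEX = """
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT id FROM chunks
ORDER BY embedding <=> '[...]'::vector
LIMIT 5;
"""

print(CREATE_INDEX, FORCE_INDEX)
```

`SET enable_seqscan = off` is a session-level planner hint, not a hard rule — it just makes the seq-scan plan look prohibitively expensive, which is enough to see the Index Scan appear in the EXPLAIN output.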

Article №06 inference NeMo ~30 minutes first install, ~1 minute every restart after
Foundations

Your Own Semantic Space — a Nemotron Embedding NIM on a DGX Spark

The embedding endpoint that every downstream RAG, wiki, and agent piece will reuse — a 2048-dim Nemotron Retriever NIM running locally on GB10, ready 52 seconds after docker run and holding 28 docs/s under batched load.

uses fieldkit.rag
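
Downstream pieces hit this endpoint over HTTP. A minimal client sketch, assuming the NIM exposes an OpenAI-style `/v1/embeddings` route on localhost — the port, the model name, and the `input_type` field are all my assumptions, not values from the article:

```python
import json
import urllib.request

def embed_request(texts: list[str], input_type: str = "query") -> dict:
    """Build an OpenAI-style embeddings payload. The model name is a
    hypothetical placeholder; retriever NIMs distinguish query vs
    passage embeddings via an input-type field (assumed here)."""
    return {"model": "nemotron-retriever", "input": texts,
            "input_type": input_type}

def embed(texts: list[str],
          url: str = "http://localhost:8000/v1/embeddings") -> list:
    """POST the payload and return one 2048-dim vector per input text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(embed_request(texts)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return [item["embedding"] for item in body["data"]]

if __name__ == "__main__":
    print(embed_request(["hello spark"]))
```

Batching several texts per call is what gets the 28 docs/s figure — one text per request leaves the GB10 mostly idle.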

Article №04 agentic NemoClaw ~2 hours after prerequisites

The Sandbox Tax That Wasn't — NemoClaw vs OpenClaw on One DGX Spark

I ran NemoClaw's sandboxed agent stack and the host Ollama-OpenClaw CLI side by side on one DGX Spark with the same 123B Nemotron model. The sandbox overhead I went looking for is real but modest (~2× raw inference); the real tax is onboarding, and NemoClaw paid it at install time.

Upcoming fine-tuning NeMo Customizer + Nemotron Nano 9B v2 planned ~4 hours per sweep
LLM Wiki

LoRA on Nemotron Nano — Fine-tuning a 9B Without Blowing Unified Memory

A planned walk through LoRA fine-tuning on Nemotron Nano 9B with NeMo Customizer: rank and alpha sweeps, a tiny domain corpus, and the memory accounting that keeps a PEFT run from tripping the Spark's 128 GB unified-memory wall.
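
The memory accounting the planned article promises can be previewed with a sketch. Everything here is a rule-of-thumb estimate under assumed model dimensions (layer count, hidden size, and targeted projections are illustrative, not Nemotron Nano 9B's actual architecture), not NeMo Customizer output:

```python
# Rough unified-memory budget for a LoRA run on a ~9B model, bf16 base.
# Layer/hidden/target counts are illustrative assumptions.

def lora_budget_gb(params_b: float = 9.0, rank: int = 16,
                   layers: int = 36, hidden: int = 4096,
                   target_matrices: int = 4) -> tuple[float, float]:
    base = params_b * 2  # frozen bf16 base weights, GB
    # Adapter: two (hidden x rank) matrices per targeted projection.
    adapter_params = layers * target_matrices * 2 * hidden * rank
    # Trainable adapter carries full state: bf16 weights + grads
    # + fp32 Adam states ≈ 16 B/param.
    adapter = adapter_params * 16 / 1e9
    return base, adapter

base, adapter = lora_budget_gb()
print(f"base ≈ {base:.0f} GB, adapter ≈ {adapter:.2f} GB")
```

Under these assumptions the frozen base is ~18 GB and the full adapter state well under 1 GB — which is why the sweeps' real variable against the 128 GB wall is activations and batch size, not the adapter rank itself.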