Heterogeneous Scientific Foundation Model Collaboration — Spark reproduction notes
Wrap a domain foundation model (Pangu-Weather) as a Triton tool, drive it from a NIM-served Llama 3.1 8B planner via NemoClaw, and show when specialist routing beats language-only reasoning — all inside the Spark 128 GB envelope.
Series: Machine that Builds Machines

Source paper
- arXiv: 2604.27351 — Heterogeneous Scientific Foundation Model Collaboration
- Repo: (none at promotion time — paper introduces “Eywa” framework but no code released)
- Popularity: 41/100 · 181 HF upvotes · 0 citations
Frontier Scout verdict
spark-feasible — every component fits comfortably in the 128 GB envelope (8B planner + ≤3 specialists + Guardrails ≤ 50 GB), and the agent shell maps directly onto NemoClaw’s existing tool-routing protocol; the only friction is reconstructing the framework from the paper’s prose since code isn’t out yet.
Proposed Spark recipe
- Pick a concrete domain for the first walkthrough — weather forecasting via Pangu-Weather (~3B) is the cleanest demo because input/output is tensor-on-disk rather than tokens.
- Serve the planner LLM via NIM: the `llama-3.1-8b-instruct` container exposes the OpenAI-compatible chat endpoint already documented in `nim-first-inference-dgx-spark`. Set `NIM_GPU_MEM_FRACTION=0.4` to reserve room for the specialist.
- Serve the specialist FM via Triton with a Python backend wrapping the model's native inference (`trtllm-and-triton-on-spark` shows the pattern). Expose it as `predict_weather(state_tensor) -> forecast_tensor`; a wrapper sketch follows this list.
- Build the EywaAgent loop inside NemoClaw as a custom skill: the planner receives a question, decides whether to call the specialist, marshals the structured inputs, gets back a tensor, and renders a natural-language summary (also sketched below).
- Add NeMo Guardrails at the planner boundary so off-domain prompts route to a refusal rather than calling the specialist with garbage inputs.
- Measure two things: planner-only vs. Eywa accuracy on a held-out domain question set (the paper claims improvement on structured-data tasks), and the end-to-end latency budget. A small harness sketch closes out the sketches below.
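A minimal sketch of the Triton Python-backend wrapper, assuming the publicly released Pangu-Weather ONNX checkpoints run through onnxruntime. The checkpoint path and the `STATE`/`FORECAST` tensor names are placeholders, not anything from the paper:

```python
# model.py -- Triton Python-backend wrapper for the specialist.
# Sketch only: path and tensor names are assumptions, not from the paper.
import numpy as np
import onnxruntime as ort
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Pangu-Weather ships ONNX checkpoints; pin the session to the GPU
        # slice left over after NIM_GPU_MEM_FRACTION=0.4.
        self.session = ort.InferenceSession(
            "/models/pangu_weather/pangu_weather_24h.onnx",  # assumed path
            providers=["CUDAExecutionProvider"],
        )

    def execute(self, requests):
        responses = []
        for request in requests:
            state = pb_utils.get_input_tensor_by_name(request, "STATE").as_numpy()
            # Single forward pass: atmospheric state in, forecast out.
            # Assumes one named input and one output; the real checkpoint
            # may split surface and upper-air tensors.
            (forecast,) = self.session.run(None, {"input": state})
            responses.append(pb_utils.InferenceResponse(
                output_tensors=[
                    pb_utils.Tensor("FORECAST", forecast.astype(np.float32))
                ]
            ))
        return responses
```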
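And a sketch of the agent loop itself, reconstructed from the paper's prose since there's no code to crib from. It uses the OpenAI-compatible tool-calling interface the NIM container exposes plus `tritonclient` for the specialist hop; the ports, the tool schema, and the tensor-summary handoff back into the planner's context are all assumptions (the NemoClaw skill would wrap `answer()`, which is shown here as a bare function):

```python
# eywa_agent.py -- minimal planner -> specialist loop.
# Assumes NIM on :8000 and Triton's HTTP endpoint remapped to :8001.
import json

import numpy as np
import tritonclient.http as triton
from openai import OpenAI

planner = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
specialist = triton.InferenceServerClient(url="localhost:8001")

TOOLS = [{
    "type": "function",
    "function": {
        "name": "predict_weather",
        "description": "Run Pangu-Weather on an atmospheric state tensor "
                       "and return a 24h forecast tensor.",
        "parameters": {
            "type": "object",
            "properties": {
                "state_path": {"type": "string",
                               "description": "on-disk state tensor (.npy)"},
            },
            "required": ["state_path"],
        },
    },
}]


def predict_weather(state_path: str) -> np.ndarray:
    state = np.load(state_path).astype(np.float32)
    inp = triton.InferInput("STATE", list(state.shape), "FP32")
    inp.set_data_from_numpy(state)
    result = specialist.infer("pangu_weather", inputs=[inp])
    return result.as_numpy("FORECAST")


def answer(question: str) -> str:
    msgs = [{"role": "user", "content": question}]
    first = planner.chat.completions.create(
        model="meta/llama-3.1-8b-instruct", messages=msgs, tools=TOOLS)
    calls = first.choices[0].message.tool_calls
    if not calls:  # planner decided language-only reasoning is enough
        return first.choices[0].message.content
    args = json.loads(calls[0].function.arguments)
    forecast = predict_weather(**args)
    # Hand a compact summary of the tensor back, not the raw numbers.
    msgs += [first.choices[0].message,
             {"role": "tool", "tool_call_id": calls[0].id,
              "content": json.dumps({"mean": float(forecast.mean()),
                                     "max": float(forecast.max())})}]
    final = planner.chat.completions.create(
        model="meta/llama-3.1-8b-instruct", messages=msgs)
    return final.choices[0].message.content
```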
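The Guardrails boundary can stay thin: wrap the agent entry point with `LLMRails` so input rails screen the question before anything touches the planner. A sketch, assuming a `./eywa_rails` config directory (colang flows plus YAML) that's left as a placeholder:

```python
# guardrails_boundary.py -- refusal boundary in front of the planner.
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./eywa_rails")  # assumed config directory
rails = LLMRails(config)
# In the full build, register the Eywa loop as an action so on-topic
# questions flow through to it, e.g. rails.register_action(answer, "answer").


def guarded_answer(question: str) -> str:
    # Input rails run first: off-domain prompts hit the refusal flow and
    # never reach the planner or the specialist with garbage inputs.
    result = rails.generate(messages=[{"role": "user", "content": question}])
    return result["content"]
```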
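For the measurement bullet, a crude harness is enough to start. This assumes the `planner` client and `answer()` from the agent sketch above, a hypothetical `heldout_weather.jsonl` question set, and exact-substring scoring (both of the latter are placeholders until a real eval set exists):

```python
# eval_harness.py -- planner-only vs. Eywa accuracy plus wall-clock latency.
import json
import time

from eywa_agent import answer, planner  # module from the sketch above


def planner_only(question: str) -> str:
    # Same model, no tools: the language-only baseline.
    resp = planner.chat.completions.create(
        model="meta/llama-3.1-8b-instruct",
        messages=[{"role": "user", "content": question}])
    return resp.choices[0].message.content


def run(system, questions):
    correct, latencies = 0, []
    for q in questions:
        t0 = time.perf_counter()
        pred = system(q["question"])
        latencies.append(time.perf_counter() - t0)
        correct += int(q["answer"].lower() in pred.lower())  # crude scoring
    return correct / len(questions), sum(latencies) / len(questions)


questions = [json.loads(line) for line in open("heldout_weather.jsonl")]
for name, system in [("planner-only", planner_only), ("eywa", answer)]:
    acc, lat = run(system, questions)
    print(f"{name}: accuracy={acc:.1%}  mean latency={lat:.1f}s")
```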
Full recipe with stack-map references in evidence/spark-recipe.md.
Open questions for the experiment
- No public repo: the Eywa framework must be reimplemented from the paper's prose. Expect ~200–400 lines of Python.
- The specific scientific FMs the paper evaluates aren't enumerated. Pick Pangu-Weather (or ESM-2 for proteins) as the canonical specialist and call out the deviation.
- The abstract gives no accuracy numbers, so there's nothing to anchor the "is orchestration helping?" claim against until the full paper or a code release arrives.
Suggested article shape
- Stage: agentic
- Series: Autoresearch
- Tags: agentic, tool-use, foundation-models, multimodal, nemoclaw, nim, triton, guardrails
- Voice: essay on when language-only reasoning hits a structural wall and what specialist routing buys you. Tie back to the Autoresearch arc — Eywa is one pattern for an autoresearch agent that reaches beyond text.