Tag
#evals
Articles tagged "evals" — 1 entry.
Machine that Builds Machines
Claw-Eval-Live on Spark — Spark reproduction notes
Stand up Claw-Eval-Live sandboxed-workflow protocol on Spark via NemoClaw + OpenShell, mock the business-service backends, run Llama 8B vs Nemotron 49B with deterministic-trace + LLM-judge grading, and chart where local agents land vs the paper 66.7 percent ceiling.