Nvidia releases Nemotron 3 Ultra for long-running open agents
The 550B-parameter mixture-of-experts model ships with open checkpoints, a million-token context window, and efficiency claims aimed at production agent workloads.
Nvidia has released Nemotron 3 Ultra, the largest model in its Nemotron 3 family, built for long-running reasoning and agentic workloads. The 550-billion-parameter mixture-of-experts model activates about 55 billion parameters per token, roughly a tenth of its total, and supports a one-million-token context window. Nvidia published multiple checkpoints, training data, recipes, and reasoning-budget controls under an open license. The checkpoints include an NVFP4-quantized version intended to lower inference costs. The company reports substantially higher throughput than several large open competitors in long-output tests, but those vendor benchmarks still need independent validation on real workloads. The release matters for two reasons. Nvidia is using open models to pull developers toward its broader inference stack, and it is addressing a practical agent problem: sustaining capable reasoning without making every long-running task prohibitively expensive.
Key details: June 4, 2026; 550B total parameters; about 55B active parameters; 1M-token context window; NVFP4 checkpoint; open checkpoints, data, and recipes.