AI Brief

Nvidia releases Nemotron 3 Ultra for long-running open agents

The 550B-parameter mixture-of-experts model ships with open checkpoints, a million-token context window, and efficiency claims aimed at production agent workloads.

Nvidia has released Nemotron 3 Ultra, the largest model in its Nemotron 3 family, aimed at long-running reasoning and agentic workloads. The 550-billion-parameter mixture-of-experts model activates about 55 billion parameters per token and supports a one-million-token context window. Nvidia published multiple checkpoints, training data, and recipes under an open license, along with reasoning-budget controls; the checkpoints include an NVFP4-quantized version intended to improve inference economics. The company reports substantially higher throughput than several large open competitors in long-output tests, but these are vendor benchmarks that still need independent validation on real workloads. The release matters for two reasons: Nvidia is using open models to pull developers toward its broader inference stack, and the model targets a practical agent problem, keeping reasoning capable without making every long-running task prohibitively expensive.

Key details: June 4, 2026; 550B total parameters; about 55B active parameters; 1M-token context window; NVFP4 checkpoint; open checkpoints, data, and recipes.
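The parameter figures lend themselves to quick back-of-the-envelope arithmetic. The Python sketch below is illustrative only: the two parameter counts come from the reported specs, while the precision list and the choice to count weights alone (ignoring the KV cache, activations, and the scale factors that block-scaled 4-bit formats such as NVFP4 store) are assumptions for this example, not figures Nvidia has published.

```python
# Rough arithmetic on the reported Nemotron 3 Ultra figures.
# Assumptions (not from Nvidia): weight storage only; KV cache, activations,
# and NVFP4 block-scale overhead are ignored; 1 GB = 1e9 bytes.

TOTAL_PARAMS = 550e9   # total parameters reported
ACTIVE_PARAMS = 55e9   # approximate parameters active per token


def active_fraction(total: float, active: float) -> float:
    """Share of the model's weights used to process a single token."""
    return active / total


def weight_gb(params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes at a given precision."""
    return params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.0%}")
    # Illustrative precisions; real checkpoints may mix formats per layer.
    for name, bits in [("BF16", 16), ("FP8", 8), ("4-bit (NVFP4, no scale overhead)", 4)]:
        print(f"{name}: ~{weight_gb(TOTAL_PARAMS, bits):,.0f} GB of weights")
```

Running it prints a 10% active fraction and roughly 1,100 GB, 550 GB, and 275 GB of weights at 16, 8, and 4 bits. The contrast is the point: sparse activation cuts per-token compute to roughly a tenth of a dense 550B model, but every expert's weights still have to be stored, which is why a lower-precision checkpoint matters for serving cost.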
