Nous Research has joined NVIDIA's Nemotron Coalition and is celebrating by giving everyone free access to NVIDIA's most powerful open model yet. The deal is simple: sign up for the Nous Portal, connect Hermes Agent, and you get two weeks of Nemotron 3 Ultra at zero cost. For a model of this caliber, that's a meaningful on-ramp.

The model behind the headline

Nemotron 3 Ultra is a 550 billion parameter (55B active) open model from NVIDIA built for long-running, agentic workflows with fast and affordable performance across hundreds of tool calls. The architecture is a hybrid Mamba-Transformer Mixture-of-Experts (MoE) design, meaning only 55B of those 550B parameters fire on any given token, keeping inference costs manageable even at this scale.
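To make the "only a fraction of the parameters fire" idea concrete, here is a minimal sketch of top-k expert routing in plain Python. The expert count, the value of k, and the random router scores are illustrative assumptions, not Nemotron 3 Ultra's actual configuration; only the 550B / 55B figures come from the text above.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so the selected experts' weights sum to 1.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def active_fraction(total_params, active_params):
    """Share of parameters used per token."""
    return active_params / total_params

random.seed(0)
NUM_EXPERTS = 16  # hypothetical value, for illustration only
TOP_K = 2         # hypothetical value, for illustration only

# Stand-in for the router's per-expert scores for a single token.
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
chosen = route_top_k(logits, TOP_K)
print("experts used for this token:", chosen)

# Figures quoted in the article: 550B total, 55B active.
print(f"active share per token: {active_fraction(550e9, 55e9):.0%}")
```

Every token still sees the full model's capacity in principle, but only the selected experts run, which is where the inference savings come from.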

The context window is one of the standout specs: 1 million tokens, alongside native support for multi-token prediction. Mamba-2 layers handle long-range sequence dependencies in linear time, replacing a subset of the attention layers that would otherwise make 1M-token context inference prohibitively expensive. In practice, that means you can feed entire codebases, long tool histories, or research trails into a single context without losing the thread.
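A rough way to see why linear-time layers matter at this length is to compare how cost grows when the context is extended. The sketch below is a back-of-the-envelope scaling model only: it ignores constant factors, treats 262k as 262,000 tokens, and does not account for the attention layers a hybrid model still keeps.

```python
def growth_factor(n_from, n_to, exponent):
    """How much a cost proportional to n**exponent grows from n_from to n_to."""
    return (n_to / n_from) ** exponent

SHORT_CONTEXT = 262_000    # approximation of the 262k figure
LONG_CONTEXT = 1_000_000   # the 1M-token window

# Linear-time sequence layers: cost proportional to n.
linear = growth_factor(SHORT_CONTEXT, LONG_CONTEXT, 1)

# Full self-attention: cost proportional to n squared.
quadratic = growth_factor(SHORT_CONTEXT, LONG_CONTEXT, 2)

print(f"linear-time layers: {linear:.1f}x more work")
print(f"quadratic attention: {quadratic:.1f}x more work")
```

Under these assumptions, stretching the context roughly fourfold costs about 3.8x more for a linear-time layer and about 14.6x more for full attention, and the gap keeps widening as the context grows.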

The numbers that matter

Nemotron 3 Ultra went live on Hugging Face on June 4, 2026. It scores 48 on the Artificial Analysis Intelligence Index, the highest of any US-built open-weight model ever released, and runs at over 300 tokens per second, three to six times faster than comparable Chinese models available through commercial APIs today.

  • 550B total / 55B active parameters via MoE routing
  • 1M token context (NVFP4 quantized on Blackwell), 262k in BF16
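One plausible reason precision and context length appear together in that last bullet is memory: fewer bits per weight leave more room for long-context state. The arithmetic below is a sketch under loud assumptions: it counts weights only, treats every parameter as stored at the stated width, and models the quantized case as a flat 4 bits per weight with no scale-factor overhead.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

TOTAL_PARAMS = 550e9  # total parameter count quoted in the article

bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)  # BF16: 16 bits per weight
fp4_gb = weight_memory_gb(TOTAL_PARAMS, 4)    # assumed flat 4 bits per weight

print(f"BF16 weights:  {bf16_gb:,.0f} GB")
print(f"4-bit weights: {fp4_gb:,.0f} GB")
```

Real deployments will differ, since quantization formats carry scaling metadata and some layers are typically kept at higher precision, but the order of magnitude shows why the 4-bit path frees up capacity.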