Mixture-of-Experts (MoE) models, architectures in which only a small subset of specialized sub-networks called "experts" activates for each token, have taken over the frontier model landscape. But training them efficiently is a different beast from training dense transformers. Routing tokens across hundreds of experts, fusing their computations, and keeping GPUs from stalling on communication overhead require infrastructure that general-purpose libraries simply weren't built for. NVIDIA's answer is NeMo AutoModel, and it just got a major upgrade by building directly on top of Hugging Face Transformers v5.
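To make "only a small subset of experts activates for each token" concrete, here is a minimal, framework-free sketch of top-k routing. Everything in it is illustrative: the helper names (route_top_k, moe_forward), the scalar toy experts, and the choice to renormalize the top-k weights are assumptions for the example, not how NeMo AutoModel or Transformers v5 implement routing.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of router scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts for one token.

    Returns [(expert_index, weight), ...] with weights renormalized to sum
    to 1 (one common convention; real routers differ in the details).
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(token, router_logits, experts, k=2):
    # Only the selected experts run; the remaining experts are skipped.
    return sum(w * experts[i](token) for i, w in route_top_k(router_logits, k))

# Toy "experts": each one just scales its scalar input (hypothetical).
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]

router_logits = [0.1, 2.0, -1.0, 1.5]   # one score per expert for this token
selected = route_top_k(router_logits, k=2)
output = moe_forward(1.0, router_logits, experts, k=2)
```

With four experts and k=2, only experts 1 and 3 run for this token. At frontier scale the same idea applies with hundreds of experts, which is why where each expert lives, and how tokens reach it, starts to dominate training cost.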

The Problem With Training MoE at Scale

Hugging Face Transformers has become the foundation of the open-source AI ecosystem, and the recent Transformers v5 release strengthened it with first-class support for MoE models. v5 ships the MoE foundations: expert backends, dynamic weight loading, and distributed execution. But v5 still leaves performance on the table. NeMo AutoModel builds on top of v5 by subclassing AutoModelForCausalLM and adding Expert Parallelism (EP), DeepEP fused all-to-all dispatch, and TransformerEngine kernels. DeepEP is the piece v5 doesn't have yet: it overlaps communication with expert compute.
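For readers new to Expert Parallelism, the sketch below shows the bookkeeping behind an all-to-all dispatch: experts are sharded across ranks, and each token is bucketed by the rank that owns its chosen expert. It is a single-process illustration under stated assumptions (contiguous sharding of experts, hypothetical helper names expert_to_rank and build_send_buffers). It does not use or imitate the DeepEP API, and it does not model the communication/compute overlap that DeepEP provides.

```python
def expert_to_rank(expert_id, num_experts, world_size):
    # Assumption: experts are sharded contiguously and evenly across ranks,
    # e.g. 8 experts on 4 ranks -> experts 0-1 on rank 0, 2-3 on rank 1, ...
    per_rank = num_experts // world_size
    return expert_id // per_rank

def build_send_buffers(assignments, num_experts, world_size):
    """Group (token_id, expert_id) pairs by the rank that owns the expert.

    The returned dict plays the role of the per-destination send buffers
    that an all-to-all exchange would ship to each rank.
    """
    buffers = {rank: [] for rank in range(world_size)}
    for token_id, expert_id in assignments:
        rank = expert_to_rank(expert_id, num_experts, world_size)
        buffers[rank].append((token_id, expert_id))
    return buffers

# Router output for five tokens on one rank: (token_id, chosen expert).
assignments = [(0, 5), (1, 0), (2, 7), (3, 2), (4, 5)]
buffers = build_send_buffers(assignments, num_experts=8, world_size=4)

for rank, items in buffers.items():
    print(f"rank {rank} receives {items}")
```

In a real training step every rank builds buffers like these, exchanges them with all other ranks, runs its local experts, and then sends the results back. The GPU has nothing to compute while it waits on that exchange unless the communication is overlapped with expert compute, which is the gap the article attributes to DeepEP.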

Training state-of-the-art MoE models has traditionally required specialists with deep distributed systems knowledge and access to high-end infrastructure. The goal of NeMo AutoModel is to collapse that complexity into a single import swap.

One Import, 3.7x Faster

The payoff when fine-tuning MoE models is 3.4–3.7x higher training throughput and 29–32% lower GPU memory use than native Transformers v5, using the same from_pretrained() API: one import line changes, with no other code changes. The API compatibility is intentional:
