The transformer has ruled language modeling for years, but a new class of architectures is mounting a serious challenge: hybrid models that mix traditional attention layers with linear recurrent layers (modern RNNs). The question is no longer whether hybrids can match transformers on leaderboards (they can) but why, and on which specific tasks. Ai2 just published a technical report that answers this at the finest possible granularity: the individual token level.

The field is already moving

Hybrid language models have been gaining momentum across the field, with recent efforts including Samba, Nemotron-H, Qwen3-Next, Kimi Linear, and Qwen 3.5. These models have been trained at scales up to 9B active parameters and 36T tokens with encouraging results. Yet despite the momentum, a fundamental question has gone unanswered: what, exactly, does each architectural component contribute to the model's predictions?

Recent work has demonstrated the potential of non-transformer language models, especially linear recurrent neural networks and hybrid models that mix recurrence and attention, yet there is no consensus on whether the potential benefits of these new architectures justify the risk and effort of scaling them up. Ai2's new study is a direct attempt to resolve that.

A controlled experiment, token by token

The key to this study's credibility is the experimental setup. Ai2 compared OLMo 3 (a pure transformer) and OLMo Hybrid (which swaps 75% of attention layers for Gated DeltaNet recurrent layers) in a head-to-head evaluation. The hybrid uses a 3:1 hybridization ratio, meaning three recurrent layers for every remaining attention layer, and it gets there by replacing the sliding-window attention layers from OLMo 3 with Gated DeltaNet layers. Because both models were built to be as alike as possible outside their architectures (matched on data, tokenizer, and training recipe), any difference in their predictions mostly reflects the architecture itself.
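To make the 3:1 ratio concrete, here is a minimal sketch of how such a layer stack could be laid out. It is an illustration only, not Ai2's implementation: the helper name `hybrid_layer_pattern`, the depth of 32 layers, and the choice to place the attention layer last in each group of four are all assumptions made for the example.

```python
from collections import Counter


def hybrid_layer_pattern(num_layers: int, recurrent_per_attention: int = 3) -> list[str]:
    """Build a list of layer types with a fixed recurrent-to-attention ratio.

    Hypothetical helper: the attention layer is placed last in each group,
    which is an assumption, not something the report excerpt specifies.
    """
    period = recurrent_per_attention + 1
    return [
        "attention" if (i + 1) % period == 0 else "gated_deltanet"
        for i in range(num_layers)
    ]


# 32 layers is an assumed depth chosen only to make the arithmetic easy to read.
layers = hybrid_layer_pattern(num_layers=32)
counts = Counter(layers)
recurrent_fraction = counts["gated_deltanet"] / len(layers)

print(layers[:4])
print(dict(counts), recurrent_fraction)
```

With three recurrent layers per attention layer, three out of every four layers are Gated DeltaNet, which is where the 75% figure comes from.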
