Sakana AI's DiffusionBlocks Trains Giant Neural Networks One Block at a Time

Sakana AI

May 27, 2026

1 min read

TRAINING_INFRA

distributed_training pretraining

OPEN_SOURCE

Training a large neural network has long meant one thing: hold the whole thing in memory at once. Every layer, every activation, every gradient -- all of it resident during the backward pass. DiffusionBlocks, a new framework from Sakana AI and the University of Tokyo presented at ICLR 2026, breaks that assumption. It trains networks one block at a time, so peak training memory is set by the size of a single block rather than by the whole network.

Standard neural network training optimizes all parameters jointly, so the memory it requires grows with model size. Block-wise training instead trains each block of the network independently, using memory for just one block at a time. The catch has always been: how do you give each block a meaningful training signal without the rest of the network? That is exactly what DiffusionBlocks solves.

Why memory is the real wall

End-to-end backpropagation requires storing the activations of every layer until the backward pass reaches it, which creates a memory bottleneck that limits how far a model can scale. Because Transformers are scaled primarily by adding more layers, depth directly drives that growing memory cost. Today's frontier models typically have hundreds of billions of parameters or more and require thousands of GPUs to train, and only a small number of organizations have the resources to develop them.
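
To make the depth argument concrete, here is a back-of-envelope sketch in Python. It is not taken from the DiffusionBlocks paper or code: the function names, the model shape (48 layers, batch 8, sequence length 2048, hidden size 4096), and the choice of 4 blocks are all illustrative assumptions. It counts a single 16-bit activation tensor per layer and ignores parameters, gradients, and optimizer state, so real figures will be larger. The point is only the ratio between the two numbers.

```python
def activation_bytes(num_layers, batch, seq_len, hidden,
                     bytes_per_value=2, tensors_per_layer=1):
    """Rough bytes of activations held for the backward pass.

    Assumption: each layer keeps `tensors_per_layer` tensors of shape
    (batch, seq_len, hidden). Real Transformer layers keep several more.
    """
    per_tensor = batch * seq_len * hidden * bytes_per_value
    return num_layers * tensors_per_layer * per_tensor


def blockwise_peak_bytes(num_layers, num_blocks, **shape):
    """Peak activation bytes when only one block is trained at a time.

    Assumption: layers are split as evenly as possible, so the largest
    block has ceil(num_layers / num_blocks) layers.
    """
    layers_per_block = -(-num_layers // num_blocks)  # ceiling division
    return activation_bytes(layers_per_block, **shape)


# Hypothetical model shape, chosen only for illustration.
shape = dict(batch=8, seq_len=2048, hidden=4096)

end_to_end = activation_bytes(48, **shape)
blockwise = blockwise_peak_bytes(48, 4, **shape)

print(f"end-to-end:            {end_to_end / 2**30:.1f} GiB")  # 6.0 GiB
print(f"block-wise (4 blocks): {blockwise / 2**30:.1f} GiB")   # 1.5 GiB
```

Under these assumptions, doubling the number of layers doubles the end-to-end figure, while the block-wise figure stays fixed as long as the number of layers per block does.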
