

Google DeepMind just released Quantization-Aware Training (QAT) checkpoints for the entire Gemma 4 family, and the numbers are hard to ignore: the flagship 26B model now fits on a 16GB laptop, and the smallest E2B variant runs in under 1GB on a phone. This is not a minor tweak. It is a rethinking of how compression should be done for large language models targeting edge hardware.
The problem with shrinking models the old way
Every time you run a large model locally, you are fighting memory. The standard fix has been Post-Training Quantization (PTQ): take a fully trained model and round its weights down from 16-bit floats to 4-bit integers after the fact. Accuracy often degrades because the model never learned to compensate for the quantization noise. The quality loss is real and measurable: traditional PTQ can lose 10-15% quality compared to the full-precision baseline.
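To make the rounding step concrete, here is a minimal sketch of round-to-nearest 4-bit quantization applied to a handful of weights. It assumes a simple symmetric scheme with a single scale derived from the largest absolute weight. That is an illustrative choice, not the exact layout any particular checkpoint format uses, and the weight values are made up.

```python
def quantize_4bit(weights):
    """Symmetric round-to-nearest quantization to the signed 4-bit range [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    # Assumption for this sketch: one scale for the whole group, mapping max_abs to code 7.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


# Made-up weights standing in for one small slice of a trained layer.
weights = [0.82, -0.41, 0.07, -0.95, 0.33, 0.58, -0.12, 0.01]

codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)

# The gap between original and restored weights is the quantization noise
# that a PTQ'd model was never trained to tolerate.
errors = [abs(w - r) for w, r in zip(weights, restored)]
max_error = max(errors)

print("codes:    ", codes)
print("restored: ", [round(r, 3) for r in restored])
print("max error:", round(max_error, 4))
```

When nothing is clipped, each weight moves by at most half a quantization step. PTQ applies this perturbation to every weight at once, with no chance for the model to adapt.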
QAT takes a fundamentally different approach. Instead of quantizing the model after training, QAT integrates the quantization process directly into training. Concretely, the model simulates the quantization step in its forward pass during training, so it learns weights that are robust to the rounding error introduced by lower precision. The model is not surprised by compression at deployment time because it was trained to expect it.
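The idea of simulating quantization in the forward pass can be sketched on a toy linear model. This sketch uses a straight-through estimator, a common way to implement QAT in which the gradient computed for the quantized weights is applied directly to the underlying float weights. The article does not say which technique the Gemma 4 recipe uses, so treat the fixed step size `SCALE`, the data, and the learning rate as assumptions made for illustration.

```python
SCALE = 0.1  # hypothetical fixed quantization step, chosen for this sketch


def fake_quant(w):
    """Snap a float weight to the signed 4-bit grid but keep it as a float."""
    code = max(-8, min(7, round(w / SCALE)))
    return code * SCALE


def loss_and_grad(q, xs, ys):
    """Mean squared error of a linear model and its gradient w.r.t. the weights q."""
    n = len(xs)
    loss = 0.0
    grad = [0.0] * len(q)
    for x, y in zip(xs, ys):
        err = sum(qi * xi for qi, xi in zip(q, x)) - y
        loss += err * err / n
        for i, xi in enumerate(x):
            grad[i] += 2 * err * xi / n
    return loss, grad


# Toy regression data generated by the weights (0.3, -0.5).
xs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
ys = [0.3, -0.5, -0.2]

latent = [0.0, 0.0]  # full-precision weights kept only during training
initial_loss, _ = loss_and_grad([fake_quant(w) for w in latent], xs, ys)

for _ in range(300):
    # Forward pass sees quantized weights, so the loss reflects rounding error.
    q = [fake_quant(w) for w in latent]
    _, grad = loss_and_grad(q, xs, ys)
    # Straight-through estimator: treat rounding as identity in the backward pass.
    latent = [w - 0.1 * g for w, g in zip(latent, grad)]

deployed = [fake_quant(w) for w in latent]  # what would ship after training
final_loss, _ = loss_and_grad(deployed, xs, ys)

print("deployed weights:", [round(d, 2) for d in deployed])
print("loss before / after:", round(initial_loss, 4), round(final_loss, 6))
```

The weights that ship are the quantized ones, and the loss being minimized was always measured on those quantized values. That is the sense in which the model has already seen its own compression.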
What's actually in the release
The Gemma 4 models optimized with QAT are available in five sizes: Gemma 4 E2B, Gemma 4 E4B, Gemma 4 12B, Gemma 4 26B A4B, and Gemma 4 31B. Two distinct formats are being shipped:
- Q4_0 GGUF checkpoints for all five sizes, ready for use with llama.cpp, Ollama, and LM Studio. These are the go-to format for consumer GPU and laptop deployments.
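A rough back-of-the-envelope estimate shows why a Q4_0 checkpoint of the 26B model can fit in 16GB. Q4_0 stores weights in blocks of 32, each with 32 four-bit codes plus one 16-bit scale, which works out to 4.5 bits per weight. The sketch below assumes a nominal parameter count of exactly 26 billion and counts quantized weights only. Real GGUF files may keep some tensors at higher precision, and the runtime also needs memory for activations and the KV cache.

```python
def q4_0_weight_bytes(n_params, block_size=32, code_bits=4, scale_bits=16):
    """Approximate bytes needed to store n_params weights in Q4_0 blocks."""
    bits_per_block = block_size * code_bits + scale_bits  # 144 bits = 18 bytes
    n_blocks = -(-n_params // block_size)  # ceiling division
    return n_blocks * bits_per_block // 8


# Assumption: nominal parameter count taken from the model name.
params_26b = 26_000_000_000

q4_gb = q4_0_weight_bytes(params_26b) / 1e9
bf16_gb = params_26b * 2 / 1e9  # 16-bit weights use 2 bytes each

print(f"16-bit weights: {bf16_gb:.1f} GB")
print(f"Q4_0 weights:   {q4_gb:.3f} GB")
```

Under these assumptions the weights drop from about 52 GB at 16-bit to roughly 14.6 GB in Q4_0, which is why a 16GB machine becomes plausible, though with little headroom.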
