Cerebras Runs Google's Gemma 4 31B at 1,800 Tokens per Second

Cerebras

2H AGO

2 min read

Cerebras just crossed two milestones at once. Gemma 4 31B is the first Google DeepMind model Cerebras has brought to its platform, and the first model on the platform to accept image input, so developers can now send images to a model running at wafer-scale speed. The result is something the inference market hasn't seen before: a capable multimodal model that responds fast enough to feel interactive.

What just shipped

Gemma 4 31B is available today in public preview on the Cerebras Inference Cloud, where it runs at over 1,800 tokens per second, making it the world's fastest multimodal model. For context, Claude Haiku 4.5 runs at roughly 100 tokens per second, so this is an 18x speedup over one of the most popular production-grade models on the market, at roughly equivalent quality.

Gemma 4 31B is comparable to Claude Haiku 4.5 in intelligence, scoring 29 and 30, respectively, on the Artificial Analysis Intelligence Index. The key differences are that Gemma 4 is open-weight under Apache 2.0, and that on Cerebras it runs 18x faster than Haiku.
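The speed comparison reduces to simple arithmetic: the speedup is the ratio of the two decode throughputs, and the time to stream a response is its token count divided by the throughput. The minimal sketch below uses the article's figures of 1,800 and roughly 100 tokens per second. The response lengths of 500 and 2,000 tokens are arbitrary illustrative values, and the sketch assumes a constant decode rate while ignoring time to first token and network overhead.

```python
def speedup(fast_tps: float, slow_tps: float) -> float:
    """Ratio of two decode throughputs, both in tokens per second."""
    return fast_tps / slow_tps


def generation_seconds(num_tokens: int, tps: float) -> float:
    """Time to stream num_tokens at a constant decode rate.

    Ignores time to first token and network overhead.
    """
    return num_tokens / tps


GEMMA_CEREBRAS_TPS = 1_800  # figure quoted in the article
HAIKU_TPS = 100             # approximate figure quoted in the article

print(f"speedup: {speedup(GEMMA_CEREBRAS_TPS, HAIKU_TPS):.0f}x")  # speedup: 18x

# Illustrative response lengths (arbitrary choices, not from the article).
for n in (500, 2_000):
    fast = generation_seconds(n, GEMMA_CEREBRAS_TPS)
    slow = generation_seconds(n, HAIKU_TPS)
    print(f"{n} tokens: {fast:.2f} s vs {slow:.1f} s")
# 500 tokens: 0.28 s vs 5.0 s
# 2000 tokens: 1.11 s vs 20.0 s
```

At these rates a 2,000-token answer streams in about 1.1 seconds rather than about 20 seconds. In practice, that difference is what makes the model feel interactive.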

The model itself

Gemma 4 is Google DeepMind's most intelligent open model family, built from Gemini 3 research. The 31B is the flagship and most capable model in the family: a dense multimodal model built for quality and efficiency rather than raw parameter count. Dense models reach high intelligence without the large memory footprint of MoE (Mixture-of-Experts) models. MoE is an architecture in which a router activates only a subset of the model's parameters (a few "experts") for each token. This cuts the compute per token, but every expert still has to be held in memory.
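The dense-versus-MoE trade-off comes down to two different parameter counts: how many parameters must sit in memory, and how many actually do work for each token. The minimal sketch below models both. The 31B dense configuration follows the article. The MoE configuration (10B shared parameters, 64 experts of 3B each, 4 experts routed per token) is purely hypothetical and does not describe any real model. The memory figure assumes bf16 weights at 2 bytes per parameter and counts weights only, leaving out the KV cache and activations.

```python
from dataclasses import dataclass

BYTES_PER_PARAM_BF16 = 2  # bf16 stores each weight in 2 bytes


@dataclass
class ModelShape:
    """Parameter counts in billions. A dense model has no experts."""
    shared_params_b: float    # used by every token (attention, embeddings, dense layers)
    expert_params_b: float    # size of one expert (0 for a dense model)
    num_experts: int          # experts stored in memory
    experts_per_token: int    # experts the router selects for each token

    def resident_params_b(self) -> float:
        # Every expert must be loaded, whether or not a given token uses it.
        return self.shared_params_b + self.num_experts * self.expert_params_b

    def active_params_b(self) -> float:
        # Only the routed experts do work for a single token.
        return self.shared_params_b + self.experts_per_token * self.expert_params_b

    def weights_gb(self) -> float:
        # Weight memory only; ignores KV cache and activations.
        return self.resident_params_b() * BYTES_PER_PARAM_BF16


dense = ModelShape(shared_params_b=31, expert_params_b=0, num_experts=0, experts_per_token=0)
# Hypothetical MoE configuration for illustration only; not a real model.
moe = ModelShape(shared_params_b=10, expert_params_b=3, num_experts=64, experts_per_token=4)

for name, m in [("dense 31B", dense), ("hypothetical MoE", moe)]:
    print(f"{name}: {m.resident_params_b():.0f}B resident, "
          f"{m.active_params_b():.0f}B active per token, "
          f"~{m.weights_gb():.0f} GB of bf16 weights")
# dense 31B: 31B resident, 31B active per token, ~62 GB of bf16 weights
# hypothetical MoE: 202B resident, 22B active per token, ~404 GB of bf16 weights
```

Under these made-up numbers, the MoE model does less work per token than the dense 31B model but needs more than six times the weight memory. That is the trade-off that lets a dense model stay compact.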

The 31B model currently ranks as the #3 open model in the world on the industry-standard Arena AI text leaderboard, and it outperforms models 20x its size. On reasoning benchmarks, it scores 89.2% on AIME 2026, 85.2% on MMLU Pro, 80.0% on LiveCodeBench v6, and 84.3% on GPQA Diamond.
