Cerebras Runs Kimi K2.6 at 981 Tokens/s, Crushing Gemini 3.5 Flash

Cerebras

Jun 05, 2026

2 min read

Google made an unusual bet at I/O 2026. Instead of launching a new flagship model centered on intelligence, it launched Gemini 3.5 Flash, a model designed first and foremost for speed. That framing is a tell: inference speed is now a first-class product dimension. Earlier this year, both OpenAI and Anthropic launched high-speed variants of their leading models, each priced 3x higher than its base model. Google has now joined them, making speed the headline feature rather than an afterthought. Cerebras decided to run a head-to-head.

Two near-frontier models, one clear winner on speed

Kimi K2.6 is a one-trillion-parameter Mixture-of-Experts (MoE) model from Moonshot AI, with 32 billion parameters active per token. It is the leading open-weight model among highly capable peers including MiMo V2.5, DeepSeek V4, and GLM-5.1. It is especially popular for coding and notably serves as the base model for Cursor's Composer 2.5. In an MoE model, each token is routed through a small subset of specialized sub-networks ("experts") rather than through the full parameter count, which keeps compute costs manageable at trillion-parameter scale.
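The routing idea described above can be illustrated with a toy top-k router. This is a minimal sketch, not Moonshot AI's implementation: the expert count, the value of `k`, the scalar "experts", and the choice to renormalize weights over only the selected experts are all assumptions made for illustration. The one number taken from the article is the active fraction, 32 billion of one trillion parameters.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the outputs of only the selected experts."""
    out = 0.0
    for idx, w in route_token(router_logits, k):
        out += w * experts[idx](x)
    return out


# Hypothetical toy setup: 8 experts, 2 active per token (not K2.6's real config).
NUM_EXPERTS, TOP_K = 8, 2
experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]
rng = random.Random(0)
router_logits = [rng.gauss(0, 1) for _ in range(NUM_EXPERTS)]

selected = route_token(router_logits, TOP_K)
y = moe_forward(1.0, experts, router_logits, TOP_K)

# Active fraction quoted in the article: 32B of 1T parameters per token.
active_fraction = 32e9 / 1e12
print(f"experts used: {[i for i, _ in selected]}, "
      f"active fraction: {active_fraction:.1%}")
```

Only `TOP_K` of the `NUM_EXPERTS` functions are ever called for a given token, which is the sense in which per-token compute tracks the active parameters rather than the total.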

Gemini 3.5 Flash and Kimi K2.6 make an ideal comparison pair because both belong to the class of near-frontier models. On the Artificial Analysis Intelligence Index, a composite of ten benchmarks, the two models are neck and neck, scoring 53.9 (Kimi K2.6) and 55.3 (Gemini 3.5 Flash). On coding specifically, K2.6 tops SWE-Bench Pro at 58.6, outperforming Claude Opus 4.6 and matching GPT-5.4, while leading the field on agentic benchmarks like Humanity's Last Exam and DeepSearchQA.

The numbers that matter

Gemini 3.5 Flash achieves 181 tokens/s on Artificial Analysis's standard benchmark (10,000-token input), significantly faster than Claude Opus 4.8 and GPT-5.5, which both run in the 60 tokens/s range. That already makes it the fastest closed model on the market. Then there's Cerebras.
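To put the throughput figures side by side, here is a small back-of-the-envelope calculation. It uses the 981 tokens/s from the headline and the 181 tokens/s quoted above; the 2,000-token response length is a hypothetical chosen for illustration, and the estimate ignores time to first token and assumes a constant output rate.

```python
CEREBRAS_KIMI_TPS = 981   # tokens/s, from the headline
GEMINI_FLASH_TPS = 181    # tokens/s, Artificial Analysis figure quoted above
OUTPUT_TOKENS = 2000      # hypothetical response length, for illustration only


def generation_seconds(tokens, tokens_per_second):
    """Time to stream `tokens` output tokens at a constant rate."""
    return tokens / tokens_per_second


speedup = CEREBRAS_KIMI_TPS / GEMINI_FLASH_TPS
kimi_s = generation_seconds(OUTPUT_TOKENS, CEREBRAS_KIMI_TPS)
gemini_s = generation_seconds(OUTPUT_TOKENS, GEMINI_FLASH_TPS)

print(f"speedup: {speedup:.2f}x")
print(f"{OUTPUT_TOKENS} tokens: {kimi_s:.2f} s vs {gemini_s:.2f} s")
```

On these figures the ratio works out to roughly 5.4x, so a response that takes about 11 seconds at 181 tokens/s would take about 2 seconds at 981 tokens/s.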
