Moonshot AI just shipped Kimi K2.7 Code HighSpeed, a high-throughput inference mode for their latest open-source coding model that pushes generation up to 6x faster than the standard version. The headline numbers: around 180 tokens per second on typical coding tasks, and up to 260 tok/s on shorter-context inputs. For a trillion-parameter model, that is genuinely fast.

The model underneath

To understand why the speed jump matters, you need to know what K2.7 Code is. It is a 1-trillion-parameter Mixture-of-Experts model with 32B active parameters per token, 384 experts, and a 256K-token context window, released under a Modified MIT license. MoE (Mixture-of-Experts) means only a fraction of the model's parameters activate for any given token, which is how a trillion-parameter model stays economically feasible to serve.

Kimi K2.7 Code is a coding-focused agentic model built upon Kimi K2.6, with substantial improvements on real-world long-horizon coding tasks, while reducing thinking-token usage by approximately 30% compared with Kimi K2.6. The architecture also includes a MoonViT vision encoder, native INT4 quantization, and Multi-head Latent Attention (MLA).

One behavioral quirk worth knowing: K2.7 Code forces "thinking" and preserve_thinking on, and you cannot turn them off. The model always reasons before answering, and it keeps its full reasoning chain across multi-turn conversations. Moonshot says this "preserve thinking" mode is what boosts performance in coding-agent scenarios where context builds up over many steps.

Better, cheaper, and now faster

The base K2.7 Code model already came with a meaningful efficiency story. Kimi K2.7 Code cuts thinking-token usage by approximately 30% on average compared with K2.6, while achieving higher scores across Kimi Code Bench v2, Program Bench, and MLS Bench Lite. Because thinking tokens bill as output, that 30% reduction is a real cost cut on every agentic task, not just a capability claim.

Alpha Signal

Don't miss what's next in AI

Join 300,000+ engineers and researchers who get the signal, not the noise.

  • Full access to in-depth AI research breakdowns
  • Be the first to know what's trending before it hits mainstream
  • Daily curated papers, repos, and industry moves