Moonshot AI's Kimi-K2.7-Code Slashes Reasoning Costs by 30% on 1T Open-Weight Model

Kimi Developers

Moonshot AI's Kimi-K2.7-Code Slashes Reasoning Costs by 30% on 1T Open-Weight Model

3D AGO

2 min read

3 days ago

2 min read

Moonshot AI just shipped Kimi-K2.7-Code, and the headline isn't the benchmark numbers -- it's the token efficiency. The Beijing-based company claims the model cuts reasoning token usage by 30% compared to its predecessor, meaning developers burn through fewer compute resources while getting better results. For teams running coding agents at scale, that's a cost story, not just a capability story.

Kimi-K2.7-Code is one of the largest open-weight coding models you can download right now, packing 1 trillion total parameters, activating 32 billion of them per token, running a 256K-token context window, and shipping with open weights on Hugging Face under a Modified MIT license. The model is live on Moonshot AI's Kimi platform APIs and hosted on Hugging Face.

A model built for the long haul

This is a coding-first, agentic model, built to plan, edit files, run tools, and debug across many steps rather than to chat. The distinction matters. Most LLMs are optimized for single-turn quality. K2.7-Code is optimized for finishing things -- whole features, whole refactors, whole debugging sessions.

Real-world software engineering rarely ends in a single step. Tasks like refactoring a codebase, implementing a feature across multiple files, or debugging over long agent sessions require a model to follow instructions reliably across extended contexts, and to carry a task through to completion. That's the problem K2.7-Code is explicitly designed to solve.

The architecture under the hood

K2.7-Code is a Mixture-of-Experts model -- an architecture where a large pool of specialized sub-networks ("experts") exists, but only a small subset activates for any given input. It holds 1T total parameters and activates 32B per token, using 384 experts with 8 selected per token and 1 shared. This keeps inference costs manageable despite the massive parameter count.

Attention uses MLA (Multi-head Latent Attention), and the feed-forward path uses SwiGLU. A MoonViT vision encoder adds 400M parameters for image and video input. Unusually for a coding-first model, K2.7-Code accepts text, image, and video through that vision encoder -- so you can drop in a screenshot of a UI bug, or hand it a recorded reproduction and a stack trace in one prompt.

Don't miss what's next in AI

Join 300,000+ engineers and researchers who get the signal, not the noise.

Full access to in-depth AI research breakdowns
Be the first to know what's trending before it hits mainstream
Daily curated papers, repos, and industry moves

Moonshot AI's Kimi-K2.7-Code Slashes Reasoning Costs by 30% on 1T Open-Weight Model

Takeaways

A model built for the long haul

The architecture under the hood

Don't miss what's next in AI