
Moonshot AI just gave its flagship open-source coding model a serious shot of espresso. Kimi K2.7 Code, released on June 12, 2026, was already turning heads as a 1-trillion-parameter coding specialist. Three days later, Moonshot dropped a HighSpeed variant that pushes throughput to around 180 tokens per second on typical coding tasks, and up to 260 tokens per second on shorter inputs. That is roughly six times faster than the standard release, and it changes the economics of running large agentic coding pipelines in a meaningful way.
The model underneath the speed
K2.7 Code uses a Mixture-of-Experts (MoE) architecture with 1 trillion total parameters and 32 billion active parameters per token. It is available on Hugging Face under a Modified MIT License that permits commercial use with attribution. MoE is worth unpacking: instead of one massive dense network, the model is divided into 384 specialized sub-networks called "experts." For each token, a learned router picks a small subset of them to activate. Because only about 3% of the parameters (32 billion out of 1 trillion) are engaged for any given token, you get the knowledge capacity of a trillion-parameter model with per-token compute closer to that of a 32-billion-parameter dense model.
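To make the routing idea concrete, here is a minimal sketch of top-k expert selection in plain Python. It is an illustration, not Moonshot's implementation: the article only says the router picks "a small subset," so `TOP_K = 8` is a hypothetical value, the toy experts are simple scaling functions, and renormalizing the selected scores with a softmax is one common convention among several.

```python
import math
import random

NUM_EXPERTS = 384  # expert count stated in the article
TOP_K = 8          # hypothetical: the article does not give the number of active experts


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    # Renormalize over the chosen experts only, so the weights sum to 1.
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(token, router_logits, experts, top_k=TOP_K):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    return sum(w * experts[i](token) for i, w in route(router_logits, top_k))


# Toy experts: each one just scales its input by a different factor.
experts = [lambda x, s=i: x * (s + 1) for i in range(NUM_EXPERTS)]

# Stand-in for the learned router's scores on a single token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route(logits)
output = moe_forward(1.0, logits, experts)

# Share of parameters active per token, using the article's figures.
active_fraction = 32e9 / 1e12

print(f"experts run for this token: {len(selected)} of {NUM_EXPERTS}")
print(f"active parameter share: {active_fraction:.1%}")
```

Note that the active parameter share is not simply `TOP_K / NUM_EXPERTS`: components such as attention layers and embeddings run for every token regardless of routing, so the 32-billion figure includes more than the selected experts.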
The model supports a 256K context length and uses Multi-head Latent Attention (MLA). MLA, originally developed by DeepSeek, compresses the key-value cache (the memory structure that grows with conversation length) through learned latent projections, which means you can actually fit that 256K context window in memory without needing an absurd amount of VRAM. The window is large enough to hold multiple source files, their tests, configuration, and a long back-and-forth conversation all at once.
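A rough back-of-the-envelope calculation shows why cache compression matters at this context length. The sketch below compares a conventional per-head key-value cache with a cache that stores one compressed latent vector per token per layer. Every dimension in it (`NUM_LAYERS`, `NUM_HEADS`, `HEAD_DIM`, `LATENT_DIM`, 16-bit values) is a hypothetical placeholder rather than K2.7 Code's published configuration, "256K" is assumed to mean 256 × 1024 tokens, and the extra positional key component that real MLA designs also cache is ignored for simplicity.

```python
# All dimensions below are hypothetical placeholders, NOT K2.7 Code's real config.
NUM_LAYERS = 60
NUM_HEADS = 64
HEAD_DIM = 128
LATENT_DIM = 512           # size of the compressed per-token latent vector
BYTES_PER_VALUE = 2        # assumes 16-bit cache entries
CONTEXT_TOKENS = 256 * 1024  # assumes "256K" means 262,144 tokens

GIB = 1024 ** 3


def standard_kv_bytes(tokens):
    """Conventional cache: one key and one value vector per head, per layer, per token."""
    per_token = 2 * NUM_LAYERS * NUM_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * tokens


def latent_kv_bytes(tokens):
    """Latent cache: a single compressed vector per layer, per token."""
    per_token = NUM_LAYERS * LATENT_DIM * BYTES_PER_VALUE
    return per_token * tokens


standard_gib = standard_kv_bytes(CONTEXT_TOKENS) / GIB
latent_gib = latent_kv_bytes(CONTEXT_TOKENS) / GIB
ratio = standard_gib / latent_gib

print(f"standard cache: {standard_gib:.0f} GiB")
print(f"latent cache:   {latent_gib:.0f} GiB")
print(f"reduction:      {ratio:.0f}x")
```

With these placeholder numbers the reduction factor is simply `2 * NUM_HEADS * HEAD_DIM / LATENT_DIM`, so the saving depends entirely on how small the latent vector is relative to the full set of per-head keys and values.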
Kimi K2.7 Code is also natively multimodal: in addition to its coding and agentic capabilities, it accepts text, image, and video input. The vision side is a 400M-parameter MoonViT encoder that processes images and video in the same pipeline as text. In practice, this means you can drop a screenshot of a UI mockup or a recorded bug repro directly into the prompt and get back working code.
