
GitHub just made a quiet but consequential change to its Copilot model picker: Kimi K2.7 Code, an open-weight model from Beijing-based Moonshot AI, is now generally available as a selectable option. It is the first time a model whose full weights are publicly downloadable has appeared in Copilot's picker alongside Claude, GPT, and Gemini. For most developers, the practical change is simple: a new entry in a dropdown. But the implications run deeper than that.
A new kind of model in a familiar place
Every other model in Copilot's picker is proprietary: you cannot download it, audit its weights, or run it yourself. Kimi K2.7 Code is different. It is MIT-licensed, the full 1-trillion-parameter weights are public on Hugging Face, and GitHub runs a hosted copy on Azure for Copilot users who prefer not to manage infrastructure. GitHub's contribution is distribution: it puts an open-weight MoE coding model into the same picker where developers already choose among Claude, GPT, and Gemini variants.
The Copilot model picker now spans five independent AI labs: OpenAI, Anthropic, Google, Microsoft, and Moonshot AI. That makes Copilot the only major coding tool on the market that currently routes across five separate AI providers under a single subscription. The rollout was fast. Moonshot AI published the weights on Hugging Face just 19 days before GitHub shipped the model into general availability. In under three weeks, a Chinese open-weight coding model went from "download and self-host" to "available on the world's largest developer platform under the same subscription."
What the model actually is
Kimi K2.7 Code is built on a Mixture-of-Experts (MoE) architecture with 1 trillion total parameters and 32 billion activated parameters per token. The model supports a 256K context length and uses Multi-head Latent Attention (MLA). It also includes MoonViT, a 400M-parameter vision encoder.
MoE is the key architectural idea here. In a standard dense model, every parameter is used for every token processed. MoE breaks that pattern: a small router sends each token through only a few specialized "expert" sub-networks and skips the rest. Kimi K2.7 Code has 1 trillion total parameters but only 32 billion active per token, or about 3.2% of the total. That means you get the knowledge capacity of a massive model at a fraction of the compute cost per token.
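To make the routing idea concrete, here is a minimal toy sketch of top-k expert routing in plain Python. It is not Moonshot AI's implementation: the expert count, the top-k value, the hidden size, and every name in it (`make_expert`, `router`, `moe_layer`) are hypothetical choices made for illustration. It only shows the general pattern of scoring all experts, running a few, and mixing their outputs.

```python
import math
import random

# All sizes below are illustrative assumptions, not Kimi K2.7 Code's real config.
NUM_EXPERTS = 8   # toy number of experts
TOP_K = 2         # toy number of experts run per token
DIM = 4           # toy hidden size

calls = []  # records which experts actually ran


def make_expert(i):
    """Build a toy expert that scales its input by (i + 1)."""
    def expert(x):
        calls.append(i)
        return [(i + 1) * v for v in x]
    return expert


experts = [make_expert(i) for i in range(NUM_EXPERTS)]

# Fixed-seed random router weights: one row of scores per expert.
rng = random.Random(0)
router_weights = [[rng.uniform(-1, 1) for _ in range(DIM)]
                  for _ in range(NUM_EXPERTS)]


def router(x):
    """Score every expert for this token (one dot product per expert)."""
    return [sum(w * v for w, v in zip(row, x)) for row in router_weights]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(logits, k):
    """Pick the k highest-scoring experts and normalize their weights."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_layer(x, k=TOP_K):
    """Run only the selected experts and return their weighted sum."""
    routes = top_k_route(router(x), k)
    out = [0.0] * len(x)
    for idx, weight in routes:
        y = experts[idx](x)  # experts that were not chosen are never called
        out = [o + weight * v for o, v in zip(out, y)]
    return out, routes


token = [0.5, -1.0, 2.0, 0.1]
output, routes = moe_layer(token)
print("experts used:", [i for i, _ in routes])
print("experts run:", len(calls), "of", NUM_EXPERTS)
print("active share from the article's figures:", 32e9 / 1e12)
```

The router still scores every expert, but that step is cheap compared with running the experts themselves. In this toy, only `TOP_K` of the `NUM_EXPERTS` experts execute for a token, which is the same principle behind the 32 billion active parameters out of 1 trillion.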
