Cohere just made its first open-source coding model significantly easier to run. North Mini Code 1.0 launched a week ago as a 30B-parameter Mixture-of-Experts model, and today Cohere is shipping three updates that push it onto consumer hardware: a 4-bit quantized build that fits on a Mac, free access through the OpenRouter API, and native support on Ollama.

What North Mini Code actually is

North Mini Code is the first model in Cohere's new "North" family, specifically designed and trained for agentic software engineering tasks. It routes each query to a small subset of specialized "expert" networks within the larger model, with 30 billion parameters total but only 3 billion active at any given time, keeping inference costs dramatically lower than a dense 30B model would require.

It is a meaningful departure: Cohere's prior models, Command R and Aya, were not coding-focused, and the company had not previously shipped an open-weight coding model. The Apache 2.0 license is about as permissive as open-source gets, meaning companies can modify, deploy, and even commercialize the model without licensing headaches.

Three new ways to run it

The headline update is the W4A16 quantized checkpoint. This uses 4-bit weights with 16-bit activations, shrinking the memory footprint to roughly 18-20GB , small enough to run on a Mac Studio via MLX. Cohere co-founder Nick Frosst demoed it on a Mac Studio via MLX at roughly 20 GB of RAM. The two other updates are:

  • OpenRouter: North Mini Code is available on OpenRouter for free, letting you call it via a standard OpenAI-compatible API without managing any infrastructure.
Alpha Signal

Don't miss what's next in AI

Join 300,000+ engineers and researchers who get the signal, not the noise.

  • Full access to in-depth AI research breakdowns
  • Be the first to know what's trending before it hits mainstream
  • Daily curated papers, repos, and industry moves