Microsoft's MAI-Thinking-1 Beats Claude Without Copying a Single Reasoning Trace

Microsoft AI

1D AGO

2 min read

REASONING

math_reasoning test_time_compute

OPEN_SOURCE

Microsoft has shipped its first reasoning model built entirely from scratch inside the company. MAI-Thinking-1 is a sparse Mixture-of-Experts model that the MAI team trained without copying chain-of-thought traces from OpenAI, Anthropic, or anyone else, and the numbers suggest the gamble paid off. It matches Claude Opus 4.6 on a hard software engineering benchmark, scores 97% on AIME 2025, and beats Claude Sonnet 4.6 in blind human comparisons.

This is a notable departure from the standard playbook of distilling a stronger teacher model into a smaller student. Because MAI-Thinking-1 was trained without distillation from third-party models, it had to learn its reasoning behavior from its own training signal rather than by imitating another model's outputs. For a company that has spent years routing its consumer and enterprise AI through OpenAI, that is a statement of independence.

A medium model punching at the top weight class

MAI-Thinking-1 is a sparse Mixture-of-Experts model with 35B active parameters and roughly 1T total. In an MoE, a router sends each token to only a few of the network's experts, so per-token compute tracks the active parameter count rather than the total. MAI-Thinking-1 should therefore behave closer to a 35B-class model in compute and latency while drawing on a trillion-parameter knowledge base, although all of those parameters still have to be held in memory to serve it.
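To make the active-versus-total distinction concrete, here is a minimal top-k routing sketch in plain Python. The expert count, top-k value, dimensions, and toy experts are all hypothetical; the article does not describe MAI-Thinking-1's router or say how many experts it has. The sketch only illustrates the general mechanism: every token is scored against every expert, but only the top-k experts actually run, while every expert's weights must still exist.

```python
import math
import random

# Hypothetical sizes for illustration; not MAI-Thinking-1's real configuration.
NUM_EXPERTS = 8
TOP_K = 2
DIM = 4

calls = [0] * NUM_EXPERTS  # how many times each expert actually ran


def make_expert(idx, scale):
    """Toy expert that scales its input. Real experts are feed-forward networks."""
    def expert(x):
        calls[idx] += 1
        return [scale * v for v in x]
    return expert


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


rng = random.Random(0)
experts = [make_expert(i, float(i + 1)) for i in range(NUM_EXPERTS)]
router = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]


def moe_forward(token):
    # The router scores the token against every expert (one dot product each).
    logits = [sum(w * x for w, x in zip(row, token)) for row in router]
    # Only the TOP_K highest-scoring experts are evaluated; the rest are skipped.
    top = sorted(range(NUM_EXPERTS), key=lambda i: logits[i], reverse=True)[:TOP_K]
    gates = softmax([logits[i] for i in top])
    out = [0.0] * DIM
    for gate, i in zip(gates, top):
        out = [o + gate * v for o, v in zip(out, experts[i](token))]
    return out


for _ in range(10):
    moe_forward([rng.uniform(-1, 1) for _ in range(DIM)])
print(sum(calls))  # 20 expert calls for 10 tokens, not 80

# Back-of-envelope with the article's figures (2 bytes per weight is an assumption).
active_params, total_params = 35e9, 1e12
print(f"active fraction: {active_params / total_params:.1%}")       # 3.5%
print(f"weights at 2 bytes/param: {total_params * 2 / 1e12:.0f} TB")  # 2 TB
```

Production MoE layers also batch tokens per expert and add load-balancing terms during training; the sketch omits both to keep the routing step visible.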

The headline benchmark is SWE-Bench Pro, which tests whether a model can resolve real GitHub issues end to end. MAI-Thinking-1 scores 52.8%, which the team says is competitive with Claude Opus 4.6, a frontier model presumed to be much larger (Anthropic does not disclose parameter counts). That matters for developers and enterprises because model size determines where advanced coding assistance can be deployed, how often it can be used, and whether it can move from occasional use into daily workflows.
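The resolution criterion is what makes SWE-Bench-style scores strict. The sketch below follows the rule used by the original SWE-bench harness, which we assume SWE-Bench Pro shares: after the model's patch is applied, the tests the issue should fix (fail-to-pass) must pass and the previously passing tests (pass-to-pass) must still pass. The `Instance` type, its field names, and the test data are hypothetical, not the benchmark's actual schema or tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One benchmark task (hypothetical schema for illustration)."""
    instance_id: str
    fail_to_pass: list = field(default_factory=list)  # tests the fix should make pass
    pass_to_pass: list = field(default_factory=list)  # tests that must not regress


def is_resolved(inst, test_results):
    """test_results maps test name -> True if it passed after the model's patch.
    A test missing from the results is counted as failed."""
    fixed = all(test_results.get(t, False) for t in inst.fail_to_pass)
    no_regressions = all(test_results.get(t, False) for t in inst.pass_to_pass)
    return fixed and no_regressions


def resolve_rate(instances, results_by_id):
    resolved = sum(is_resolved(i, results_by_id.get(i.instance_id, {})) for i in instances)
    return resolved / len(instances)


instances = [
    Instance("repo_a-1", ["test_bug"], ["test_ok"]),
    Instance("repo_b-2", ["test_fix"], ["test_old"]),
]
results = {
    "repo_a-1": {"test_bug": True, "test_ok": True},
    "repo_b-2": {"test_fix": True, "test_old": False},  # fixed the issue but broke a test
}
print(resolve_rate(instances, results))  # 0.5
```

Under this rule a patch that fixes the reported bug but breaks an existing test earns nothing, which is why a 52.8% resolve rate is harder to reach than a raw "tests passed" count would suggest.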

On the math side, MAI-Thinking-1 scores 97.0% on AIME 2025 and 94.5% on AIME 2026, putting it in the same tier as the dedicated math-reasoning models from the major labs. The training curve below shows the pass rate climbing from a near-random baseline to near the ceiling over the course of reinforcement learning.
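The article does not say how the AIME percentages were computed. AIME 2025 is commonly evaluated as a 30-problem set (AIME I and II combined), and 97.0% is not a multiple of 1/30 (29/30 is about 96.7%), which is consistent with averaging over several sampled answers per problem. A common way to do that is the unbiased pass@k estimator introduced with OpenAI's Codex evaluation; the sketch below assumes that protocol, and the sample counts in the example are invented.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k for one problem, given n sampled answers of which c are correct."""
    if n - c < k:
        return 1.0  # every size-k subset of the samples contains a correct answer
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(correct_counts, n, k=1):
    """Mean pass@k across problems; correct_counts[i] is c for problem i."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Invented example: 30 problems, 8 samples each.
counts = [8] * 28 + [5, 0]
print(f"pass@1: {benchmark_score(counts, n=8):.1%}")       # 95.4%
print(f"pass@8: {benchmark_score(counts, n=8, k=8):.1%}")  # 96.7%
```

With k=1 the estimator reduces to c/n, the plain fraction of correct samples, so a problem solved in 5 of 8 samples contributes 0.625 rather than a hard 0 or 1.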
