

Microsoft's in-house image model just cracked the global top three. MAI-Image-2.5, built by the MAI Superintelligence Team, debuted at No. 3 on Arena's text-to-image leaderboard -- the human-preference ranking where real users vote in blind head-to-head comparisons. That puts Microsoft directly behind OpenAI and Google, and well ahead of where the MAI-Image series started.
From ninth to the podium
MAI-Image-2.5 is the third generation of Microsoft's in-house text-to-image model, taking the series from a mid-table position to a spot among the top three on Arena's global leaderboard, behind only OpenAI's gpt-image-2 at rank one and one other model at rank two. That trajectory matters: Microsoft's image generation journey started slowly, with MAI-Image-1 debuting at ninth position on Arena's leaderboard. MAI-Image-2 then debuted at third place, trailing only Google's gemini-3.1-flash-image-preview and OpenAI's gpt-image-1.5-high-fidelity.
MAI-Image-2.5 now holds that same third-place position, but with substantially higher scores across every style category Arena tracks. The overall Arena score climbed from 1182 (MAI-Image-2) to 1254 -- a 72-point improvement. OpenAI's gpt-image-2 currently leads the pack with a score of 1388.
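To make those score gaps more concrete, here is a minimal Python sketch that computes the differences between the overall scores quoted above and converts each gap into an expected head-to-head win rate. The conversion is an assumption, not something the article states: it treats Arena scores as if they sat on a standard Elo-style scale where a 400-point gap means 10-to-1 odds. The helper names `score_gap` and `expected_win_rate` are illustrative only.

```python
# Overall Arena scores as quoted in the article.
SCORES = {
    "MAI-Image-2": 1182,
    "MAI-Image-2.5": 1254,
    "gpt-image-2": 1388,
}


def score_gap(leader: str, trailer: str) -> int:
    """Return how many points `leader` scores above `trailer`."""
    return SCORES[leader] - SCORES[trailer]


def expected_win_rate(gap: float, scale: float = 400.0) -> float:
    """Expected win rate for the model that is `gap` points ahead.

    ASSUMPTION: scores behave like Elo ratings with a 400-point scale.
    This is an illustration, not Arena's published methodology.
    """
    return 1.0 / (1.0 + 10.0 ** (-gap / scale))


# Generation-over-generation gain for the MAI-Image series.
gain = score_gap("MAI-Image-2.5", "MAI-Image-2")
# Remaining distance to the current leader.
deficit = score_gap("gpt-image-2", "MAI-Image-2.5")

print(f"MAI-Image-2.5 vs MAI-Image-2: +{gain} points, "
      f"implied win rate {expected_win_rate(gain):.1%}")
print(f"gpt-image-2 vs MAI-Image-2.5: +{deficit} points, "
      f"implied win rate {expected_win_rate(deficit):.1%}")
```

Under that Elo-style assumption, the gap between the two MAI generations would correspond to the newer model being preferred in roughly 60% of blind comparisons against its predecessor, while gpt-image-2 would be preferred over MAI-Image-2.5 in roughly 68%. Treat these as rough intuition only, since the actual relationship depends on how Arena fits and scales its scores.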
Where the scores actually moved
Arena breaks scores down across eight style categories. Here is how all three generations of the MAI-Image series compare:
| Category | MAI-Image-1 | MAI-Image-2 | MAI-Image-2.5 |
|---|---|---|---|
| Overall | 1093 | 1182 | 1254 |
| Text Rendering | 1070 |
