Microsoft just launched MAI-Voice-2, its most expressive text-to-speech model to date, and the numbers make a strong case. In side-by-side listening tests, 45.5% of listeners preferred the generated speech, 44% preferred the real human recording, and 10.5% called it a tie. That is effectively a coin flip between synthetic and human voice, and not a result you see often in TTS.
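
To make the "coin flip" reading concrete, here is a small arithmetic sketch in Python that uses only the three percentages quoted above. The variable names are illustrative. The article gives no sample size, so this says nothing about statistical significance.

```python
# Listener preference shares quoted in the article, in percent.
preferred_synthetic = 45.5
preferred_human = 44.0
tie = 10.5

# The three shares should account for every listener.
total = preferred_synthetic + preferred_human + tie

# Among listeners who expressed a preference (ties excluded),
# what share picked the synthetic voice?
decided = preferred_synthetic + preferred_human
synthetic_share_of_decided = preferred_synthetic / decided * 100

print(f"total: {total:.1f}%")
print(f"synthetic share among decided listeners: {synthetic_share_of_decided:.1f}%")
```

With ties set aside, the synthetic voice wins just under 51% of the decided votes, which is why the split reads as close to even.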

From English-only to a global voice stack

MAI-Voice-2 is described as the most expressive, natural-sounding text-to-speech model Microsoft has built to date, representing a significant leap from its predecessor across fidelity, language coverage, speaker consistency, and emotional range. The previous version was English-only. MAI-Voice-2 now supports 15 languages and 18 locales, including Arabic, Chinese, English, French, German, Hindi, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Thai, and Vietnamese.

Critically, Microsoft is not just bolting on language support as an afterthought. The company prioritized depth across the 15 languages, ensuring that they span a range of prosodic systems: tonal, pitch-accent, stress-timed, and syllable-timed. That means the model was built to handle the structural differences between, say, Mandarin (tonal) and German (stress-timed) rather than forcing all languages through an English-shaped prosody mold.

The controls developers actually want

The biggest practical upgrade is granular emotion control. MAI-Voice-2 exposes emotion through tags (sad, whispered, excited, and others), so tone becomes an input you set rather than a property you hope the model infers. This matters more than it sounds: most production voice failures are not pronunciation errors but tone mismatches. A support assistant that sounds cheerful while delivering bad news comes across worse than a flat text reply.
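
A minimal sketch of what "tone as an input" could look like on the application side, assuming the app already knows what kind of message it is about to speak. Only the tag names sad, whispered, and excited come from the article. The message categories, the `neutral` fallback, and the `SpeechRequest` shape are hypothetical, and nothing here reflects the actual MAI-Voice-2 request format.

```python
from dataclasses import dataclass

# Assumption: "neutral" is a placeholder meaning "no emotion tag";
# it is not a tag named in the article.
NO_TAG = "neutral"

# Hypothetical application-side message kinds mapped to the
# emotion tags the article mentions.
TONE_BY_KIND = {
    "bad_news": "sad",
    "good_news": "excited",
    "private_detail": "whispered",
}


@dataclass(frozen=True)
class SpeechRequest:
    """Illustrative container, not the real API payload."""
    text: str
    emotion: str


def build_request(text: str, kind: str) -> SpeechRequest:
    # Choose the tone explicitly from the message kind instead of
    # hoping the model infers it from the text alone.
    emotion = TONE_BY_KIND.get(kind, NO_TAG)
    return SpeechRequest(text=text, emotion=emotion)


request = build_request("Your refund request was declined.", "bad_news")
print(request)
```

The point of the design is that the tone decision lives in application logic, where the intent of the message is known, so a bad-news reply is never delivered in a cheerful voice by accident.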
