On-device text-to-speech has long been a game of painful tradeoffs: shrink the model enough to fit on a phone, and quality craters. Gradium's latest Phonon update is making a strong case that this tradeoff is no longer inevitable. The company's 100M-parameter model now reaches 1.00% Word Error Rate (WER) on the Seed-TTS English benchmark, outperforming every on-device competitor it was tested against, all of which are significantly larger.

A benchmark that actually matters for edge deployment

WER, in this context, measures how accurately synthesized speech can be transcribed back to the original text. Lower is better. Results are reported on the Seed-TTS English test set, with generated audio transcribed using Whisper large-v3 and compared to the input text using jiwer with text normalization. Speaker similarity is measured as the cosine similarity between WavLM-large embeddings of the reference and generated audio, so higher is better. This is the same evaluation protocol used across the broader TTS research community, making comparisons meaningful.
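To make the two metrics concrete, here is a minimal sketch in plain Python. It is not the benchmark's actual pipeline: the transcription step (Whisper large-v3) and the embedding step (WavLM-large) are left out, the `normalize` function is a simplified stand-in for jiwer's text normalization, and all function names are hypothetical. It only illustrates the arithmetic: WER as word-level edit distance divided by the number of reference words, and speaker similarity as the cosine of the angle between two embedding vectors.

```python
import math
import re


def normalize(text: str) -> list[str]:
    """Simplified normalizer (an assumption, not jiwer's exact rules):
    lowercase, replace punctuation with spaces, split on whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("The cat sat on the mat.", "the cat sat on a mat"))
    # Toy 2-D vectors standing in for real speaker embeddings.
    print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```

In a real evaluation, `hypothesis` would be the ASR transcript of the generated audio, and the two vectors would be speaker embeddings extracted from the reference and generated clips.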

Seed-TTS-Eval is an objective benchmark for zero-shot TTS and voice conversion evaluation, using out-of-domain English and Mandarin samples from Common Voice and DiDiSpeech-2, with the official evaluation focusing on intelligibility and speaker consistency. Scoring well on it with 100M parameters is a different kind of achievement than doing so with a 1.5B model running on a data center GPU.

The numbers, and why size is the real story

Phonon now reaches 1.00% WER on the Seed-TTS English benchmark with voice cloning enabled, outperforming NeuTTS Air (552M), KaniTTS2 (450M), and NeuTTS Nano (229M). With voice cloning disabled and a fixed high-quality voice, Phonon drops to 0.83% WER, ahead of Kokoro and Magpie.

Here is how the full comparison looks with voice cloning enabled:

Model | Parameters | WER | Speaker Similarity