Google Ships Gemini 3.5 Live Translate Across 70 Languages to Developers

1D AGO

2 min read

Real-time voice translation has been a demo-stage promise for years. Google just made it a developer primitive. Gemini 3.5 Live Translate is now available in the Gemini Live API in public preview, letting you wire continuous speech-to-speech translation directly into your own applications across more than 70 languages and 2,000+ language pair combinations.

One model, not a pipeline

The key architectural shift is that Gemini 3.5 Live Translate is a single audio model, built on Gemini 3 Pro, that streams speech-to-speech translation in over 70 languages. It auto-detects the source language and generates output that preserves the speaker's intonation, pacing, and pitch. That last part matters more than it sounds.

Traditional translation pipelines chain three separate models: a speech-to-text (STT) engine, a machine translation model, and a text-to-speech (TTS) synthesizer. Each hop adds latency, and errors made at one stage carry into the next. Gemini 3.5 Live Translate collapses all three into a single audio-to-audio model. Unlike older systems that wait for a speaker to finish before translating, it processes speech as it arrives and stays just a few seconds behind, carrying over intonation, tempo, and vocal pitch so the translated voice sounds closer to the original speaker.

On the technical side, the model accepts audio input with up to a 128K-token context window and produces audio and text output up to 64K tokens. The developer API ingests raw 16-bit PCM audio at 16 kHz mono, chunked at 100-millisecond intervals, and returns 24 kHz mono PCM output.
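To make the input format concrete, here is a minimal Python sketch of the chunking arithmetic: at 16 kHz, 16-bit mono, a 100-millisecond chunk is 1,600 samples, or 3,200 bytes. The helper names (`bytes_per_chunk`, `iter_chunks`) are hypothetical and are not part of any Google SDK; the sketch only slices a raw PCM buffer and does not call the Live API.

```python
from typing import Iterator

# Input format as described in the article.
INPUT_SAMPLE_RATE_HZ = 16_000  # 16 kHz
BYTES_PER_SAMPLE = 2           # 16-bit PCM
CHANNELS = 1                   # mono
CHUNK_MS = 100                 # 100-millisecond chunks


def bytes_per_chunk(sample_rate_hz: int = INPUT_SAMPLE_RATE_HZ,
                    chunk_ms: int = CHUNK_MS) -> int:
    """Return the size in bytes of one chunk of 16-bit mono PCM."""
    samples = sample_rate_hz * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def iter_chunks(pcm: bytes, chunk_bytes: int) -> Iterator[bytes]:
    """Yield consecutive slices of a raw PCM buffer.

    The final slice may be shorter than chunk_bytes if the buffer
    length is not an exact multiple of the chunk size.
    """
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


if __name__ == "__main__":
    # One second of silence stands in for captured microphone audio.
    one_second = bytes(INPUT_SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)
    chunks = list(iter_chunks(one_second, bytes_per_chunk()))
    print(f"{len(chunks)} chunks of {len(chunks[0])} bytes each")
```

In a real client, each chunk would be handed to whatever streaming send call the SDK provides as soon as it is captured, rather than collected into a list first.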

What the numbers actually say

Google describes the lag as "a few seconds behind the speaker." Independent benchmarking by LiveLingo Research, which measured the raw API endpoint across 120 test utterances and four language pairs, recorded a median first-audio latency of 2,947 ms. That is a delay of about 3 seconds before translated audio begins, which matches Google's framing. Comprehension fidelity scored 4.93 out of 5 across the English-to-Spanish, English-to-Chinese, English-to-Japanese, and English-to-German pairs, the strongest result among the competing systems tested.
