Google's Gemini 3.5 Live Translate Replaces 3-Model Pipelines With One Audio Model

Google for Developers

7D AGO

2 min read

AUDIO

realtime_voice speech_to_text

API

Real-time voice translation has been a demo-tier feature for years: clunky, turn-based, and robotic-sounding. Gemini 3.5 Live Translate is Google's attempt to make it production-ready. The model is now in public preview via the Gemini API, and it replaces the usual chain of separate models with a single audio-to-audio model.

One model, no pipeline

The old approach to live translation was a chain of three separate systems: a speech-to-text model, a translation model, and a text-to-speech synthesizer. Each hop adds latency and a place for errors to compound: a mistranscription becomes a mistranslation, which becomes a confidently wrong spoken sentence.
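To make the compounding concrete, here is a minimal Python sketch of a three-stage cascade. The `Stage` class, the function names, and every number are hypothetical and chosen only for illustration; they are not measurements of any real system. The sketch also assumes that stages run strictly in sequence and that their errors are independent, which is a simplification.

```python
from dataclasses import dataclass
from math import prod
from typing import List


@dataclass
class Stage:
    name: str
    latency_ms: float  # hypothetical per-stage processing delay
    accuracy: float    # hypothetical probability that the stage output is correct


def cascade_latency_ms(stages: List[Stage]) -> float:
    """Stages run one after another, so their delays add up."""
    return sum(s.latency_ms for s in stages)


def cascade_accuracy(stages: List[Stage]) -> float:
    """With independent errors, a sentence is right only if every stage is right."""
    return prod(s.accuracy for s in stages)


# Illustrative numbers only.
pipeline = [
    Stage("speech-to-text", 300.0, 0.95),
    Stage("translation", 200.0, 0.95),
    Stage("text-to-speech", 250.0, 0.99),
]

print(cascade_latency_ms(pipeline))          # 750.0
print(round(cascade_accuracy(pipeline), 3))  # 0.893
```

Under these assumptions, the end-to-end accuracy is lower than that of any single stage, and the delay is the sum of all three.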

Gemini 3.5 Live Translate folds these steps into one audio model. It is a low-latency, audio-to-audio model optimized for real-time translation of spoken conversations, enabling seamless, bidirectional translation with high accuracy and natural voice output. The result is a system that can translate continuously while the speaker is still talking, rather than waiting for them to finish.

Google Meet interface showing a live translation session between two participants

Unlike turn-by-turn systems that wait for the speaker to finish before responding, 3.5 Live Translate generates speech continuously, balancing the trade-off between waiting for more context to improve quality and translating immediately to stay in sync with the speaker. It preserves intonation, tempo, and vocal pitch, so the translated voice sounds closer to the original speaker than flat synthesized audio does.
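Google has not published how the model decides when to speak. As an illustration only, the wait-k policy from the simultaneous-translation research literature is one simple way to express the same trade-off: stay k source units behind the speaker, then emit one output unit for each new input unit. The function name `wait_k_schedule` and the notion of discrete "units" are assumptions made for this sketch, not a description of Gemini's internals.

```python
from typing import List, Tuple


def wait_k_schedule(num_source: int, num_target: int, k: int) -> List[Tuple[str, int]]:
    """Return the READ/WRITE action order under a wait-k policy.

    A small k keeps the output close to the speaker but gives the
    translator little context; a large k does the opposite.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    actions: List[Tuple[str, int]] = []
    read = written = 0
    while written < num_target:
        # Read until we are k units ahead of the output, or the source ends.
        while read < min(written + k, num_source):
            actions.append(("READ", read))
            read += 1
        actions.append(("WRITE", written))
        written += 1
    return actions


def show(actions: List[Tuple[str, int]]) -> str:
    """Compact rendering, e.g. 'R0 R1 W0'."""
    return " ".join(f"{kind[0]}{i}" for kind, i in actions)


print(show(wait_k_schedule(4, 4, 2)))  # R0 R1 W0 R2 W1 R3 W2 W3
print(show(wait_k_schedule(4, 4, 4)))  # R0 R1 R2 R3 W0 W1 W2 W3
```

With k equal to the full input length, the schedule degenerates into the turn-by-turn behavior described above: everything is read before anything is spoken.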

What's available, and where

Gemini 3.5 Live Translate is rolling out across Google products: for developers in public preview via the Gemini Live API and Google AI Studio; for enterprises in private preview starting this month in Google Meet; and for everyone via Google Translate on Android and iOS.

The scope of the Google Meet upgrade is notable. Speech translation in Google Meet will offer 70+ languages, up from the previous limit of five, and will support more than 2,000 language combinations in one meeting, where previously it could only translate to and from English.
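The two figures can be sanity-checked with a few lines of Python. The source does not say how combinations are counted, so this sketch assumes a "combination" is an unordered pair of distinct languages and takes exactly 70 languages as the lower bound; the function names are hypothetical.

```python
from math import comb


def unordered_pairs(n: int) -> int:
    """Number of distinct language pairs among n languages."""
    return comb(n, 2)


def directed_pairs(n: int) -> int:
    """Number of (source, target) directions among n languages."""
    return n * (n - 1)


def english_only_pairs(n: int) -> int:
    """Pairs available when every translation must involve English."""
    return n - 1


print(unordered_pairs(70))     # 2415
print(directed_pairs(70))      # 4830
print(english_only_pairs(70))  # 69
```

Under that assumption, 70 languages give 2,415 pairs, which is consistent with the "more than 2,000" figure. An English-only scheme over the same 70 languages would cover just 69 of them.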
