Google Ships Gemini 3.5 Live Translate Across 70 Languages in Near Real-Time

Google DeepMind

3H AGO

2 min read

AUDIO

realtime_voice speech_to_text

API

Google just shipped Gemini 3.5 Live Translate, a dedicated audio model that converts live speech into over 70 languages in near real-time. It is available right now in the Google Translate app on Android and iOS, in public preview via the Gemini Live API and Google AI Studio, and rolling out in private preview for enterprise Google Meet customers.

This is not a wrapper around a general-purpose model. Google designed 3.5 Live Translate as a purpose-built audio pipeline, separate from the conversational Gemini Live agents and optimized for one thing: getting translated speech out as quickly and naturally as possible.

The problem with every other translator

Traditional speech translation systems work in three sequential steps: transcribe the audio, translate the text, then synthesize new speech. Each step adds latency, and the whole pipeline waits for the speaker to finish a sentence before doing anything. The result is the familiar awkward pause-and-burst pattern that makes real conversations feel robotic.
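To make the cost of that design concrete, here is a minimal back-of-the-envelope sketch. The stage delays are assumed, illustrative values, not measurements of any real system, and the names `CascadeLatency` and `first_audio_delay` are hypothetical. It only shows that the wait before any translated audio is heard grows with the length of the sentence.

```python
from dataclasses import dataclass


@dataclass
class CascadeLatency:
    """Per-stage delays in seconds. Assumed values for illustration only."""
    transcribe: float = 0.4
    translate: float = 0.3
    synthesize: float = 0.5


def first_audio_delay(sentence_seconds: float, stages: CascadeLatency) -> float:
    """Seconds from the start of a sentence until translated audio begins.

    A cascaded system does nothing until the sentence is complete, so the
    sentence duration is added to the sum of the three stage delays.
    """
    return (
        sentence_seconds
        + stages.transcribe
        + stages.translate
        + stages.synthesize
    )


stages = CascadeLatency()
for seconds in (2.0, 6.0, 12.0):
    delay = first_audio_delay(seconds, stages)
    print(f"{seconds:>4.1f}s sentence -> first translated audio after {delay:.1f}s")
```

Under these assumptions the fixed stage cost is small; the dominant term is the time spent waiting for the speaker to finish.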

Unlike turn-by-turn systems that wait for the speaker to finish before responding, 3.5 Live Translate generates speech continuously, balancing two pressures: waiting for more context improves quality, while translating immediately keeps the output in sync with the speaker. The model stays just a few seconds behind the speaker throughout the session rather than accumulating a growing lag.
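The trade-off can be illustrated with a toy simulation. Google has not published the scheduling policy, so this sketch swaps in a fixed-lookahead ("wait-k") rule, a well-known heuristic from simultaneous translation research, purely as a stand-in. All timings, the value of `k`, and the function names are assumptions made for illustration.

```python
def turn_based_emit_times(word_times, sentence_ends, processing=1.0):
    """Emit every word's translation only after its sentence has ended."""
    emit = []
    for t in word_times:
        # First sentence boundary at or after this word.
        end = next(e for e in sentence_ends if e >= t)
        emit.append(end + processing)
    return emit


def wait_k_emit_times(word_times, k=3, processing=0.2):
    """Emit word i once word i+k has been heard (or the stream has ended).

    A larger k gives the translator more context but increases the lag.
    """
    last = len(word_times) - 1
    return [
        word_times[min(i + k, last)] + processing
        for i in range(len(word_times))
    ]


def max_lag(word_times, emit_times):
    """Worst-case delay between hearing a word and emitting its translation."""
    return max(e - t for t, e in zip(word_times, emit_times))


# Assumed input: 24 words spoken every 0.5 s, in two 12-word sentences.
word_times = [0.5 * i for i in range(24)]
sentence_ends = [5.5, 11.5]

turn_lag = max_lag(word_times, turn_based_emit_times(word_times, sentence_ends))
stream_lag = max_lag(word_times, wait_k_emit_times(word_times, k=3))
print(f"turn-based worst lag: {turn_lag:.1f}s")
print(f"wait-3 worst lag:     {stream_lag:.1f}s")
```

In this model the turn-based lag depends on sentence length, while the fixed-lookahead lag is bounded by `k` no matter how long the speaker talks, which matches the "stays a few seconds behind" behavior described above.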

The other big problem is voice identity. Most translation systems produce a generic synthetic voice regardless of who is speaking. 3.5 Live Translate instead automatically detects 70+ languages and generates smooth, natural-sounding translated speech that preserves each speaker's intonation, pacing, and pitch. That means a speaker's excitement, hesitation, or emphasis carries through into the translated output.

How it actually works

Under the hood, the model operates as a streaming audio-to-audio pipeline. The Gemini Live API supports low-latency, real-time speech-to-speech translation between 70+ languages using the
