Gemini 3.5 Live Translate
Google’s end-to-end speech-to-speech translation audio model (announced 2026-06), part of the gemini family (llm-providers-wiki, cross-wiki). It listens to spoken input, translates it, and speaks the translation as natural speech in the target language; the output is audio, not translated text. It is a proprietary, closed-frontier entrant that fuses the spoke’s STT and TTS branches into one real-time pipeline.
Capabilities
- Language coverage: 70+ languages and “2000+ language combinations.”
- Voice preservation: maintains the speaker’s intonation, pacing, and pitch across languages.
- Near real-time, simultaneous: processes streamed speech continuously (not turn-based) and stays “just a few seconds behind the speaker.” It has to balance waiting for more context against translating immediately, which is the classic simultaneous-interpretation latency/quality trade-off (cf. tts-benchmarks latency axis).
- Multilingual & robust: handles multilingual input without manual language configuration and is robust to background noise.
- Safety/provenance: all generated audio carries an imperceptible SynthID watermark, so synthetic speech can be detected as a measure against misinformation. This is a provenance mechanism new to this wiki.
Availability (2026-06)
- Developers: public preview via the Gemini Live API + Google AI Studio.
- Enterprise: private preview in Google Meet.
- Consumers: global rollout in Google Translate (Android/iOS); Android “listening mode” delivers translation through the phone earpiece.
Significance
The clearest instance yet of the spoke’s “LLM eating audio from both ends” thesis, with both ends consumed at once: STT, machine translation, and TTS in one streamed model. It puts Google/gemini, already a TTS leader (Gemini 3.x Flash TTS) and an STT player (Chirp), at the closed frontier of speech-to-speech translation too.
Caveat
This is a product announcement (vendor framing): the language counts, latency (“few seconds”), and voice-preservation quality are Google-stated and have not been independently benchmarked. Dated snapshot, 2026-06.
Related
speech-to-speech-translation · speech-to-text · text-to-speech · speech-audio-ai · gemini