
speech-audio-wiki

Software Application · updated 2026-06-09

Gemini 3.5 Live Translate

Google’s end-to-end speech-to-speech translation audio model (announced 2026-06), part of the gemini family (llm-providers-wiki, cross-wiki). It listens to spoken input, translates it, and generates natural speech in the target language; the output is speech, not translated text. A proprietary, closed-frontier entrant that fuses the spoke’s STT and TTS branches into one real-time pipeline.

Capabilities

Availability (2026-06)

Significance

The clearest instance yet of the spoke’s “LLM eating audio from both ends” thesis, with both ends handled at once: STT, machine translation, and TTS in one streamed model. It puts Google/gemini, already a TTS leader (Gemini 3.x Flash TTS) and an STT player (Chirp), at the closed frontier of speech-to-speech-translation too.

Caveat

A product announcement (vendor framing); language counts, latency (“few seconds”), and voice-preservation quality are Google-stated, not independently benchmarked. Dated snapshot, 2026-06.

speech-to-speech-translation · speech-to-text · text-to-speech · speech-audio-ai · gemini