Defined Term · domain · updated 2026-06-09

Speech-to-speech translation (S2ST)

Speech-to-speech translation is the applied speech-AI task of turning spoken input in one language into spoken output in another. It composes the spoke’s two recognition/synthesis branches, speech-to-text and text-to-speech, with machine translation between them into a single pipeline (speech-to-text, then machine translation, then text-to-speech), and is increasingly built end-to-end and streaming. It is a composite, applied task that sits across the three primitive branches of speech-audio-ai rather than beside them.
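The cascaded form of this pipeline can be sketched as three composed stages. This is a minimal illustration, not any vendor's implementation: the `CascadedS2ST` class, the stage type aliases, and the toy stand-in functions are all hypothetical, and the "audio" here is just UTF-8 bytes so the example runs without any model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; a real system would wrap actual models here.
Recognizer = Callable[[bytes], str]    # speech-to-text: source audio -> source text
Translator = Callable[[str], str]      # machine translation: source text -> target text
Synthesizer = Callable[[str], bytes]   # text-to-speech: target text -> target audio


@dataclass
class CascadedS2ST:
    """Cascaded speech-to-speech translation: STT, then MT, then TTS."""
    recognize: Recognizer
    translate: Translator
    synthesize: Synthesizer

    def __call__(self, audio: bytes) -> bytes:
        source_text = self.recognize(audio)
        target_text = self.translate(source_text)
        return self.synthesize(target_text)


# Toy stand-ins (assumptions for illustration only): audio is UTF-8 text.
def toy_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


TOY_LEXICON = {"hello": "hola", "world": "mundo"}


def toy_mt(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_LEXICON.get(word, word) for word in text.lower().split())


def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadedS2ST(toy_stt, toy_mt, toy_tts)
result = pipeline(b"Hello world")
print(result)
```

Because each stage is an ordinary callable, any one of them can be swapped without touching the others. An end-to-end model would instead replace the whole `__call__` body with a single audio-to-audio function.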

What makes it hard (the axes it adds)

Where it sits

The 2026 exemplar is gemini-live-3-5-translate (Google; 70+ languages, near-real-time, voice-preserving, SynthID-watermarked), a closed-frontier model that fuses STT and TTS in a single model. It is the purest case of the speech-audio-ai thesis that the LLM is eating audio from both ends: here, both ends sit in one model.

speech-audio-ai · speech-to-text · text-to-speech · gemini-live-3-5-translate · voice-cloning · gemini