Organization · updated Wed Jun 10 2026 (UTC)

ElevenLabs

The leading commercial voice-AI company, referenced across the spoke (Scribe STT WER, TTS Elo) but until now without a page of its own. It is the clearest single embodiment of the synthesis’s closed-frontier pole, and unusual in that one proprietary provider spans all three branches of speech-audio-ai: TTS, STT, and now music. Founded in 2022 by Piotr Dąbkowski (ex-Google ML) and Mati Staniszewski (ex-Palantir), reportedly out of frustration with poor film dubbing.

Products — across all three branches

text-to-speech — context-aware, emotive synthesis; a perennial leader on the Elo board.
voice-cloning — “Voice Design,” custom voices from samples (and the capability whose misuse drives the audio-deepfake axis).
AI dubbing — speech translated into 20+ languages while preserving the source voice (a speech-to-speech-translation-adjacent product).
Scribe — its STT model with “industry-leading word error rate” per third-party tests (the ~3.3% EN figure the stt-apis-comparison thread cites).
Eleven Music (Aug 2025) — its entry into audio-music-generation.
Conversational AI / Expressive Mode (2026) — real-time voice agents that pair Eleven V3 Conversational TTS with Scribe v2 Realtime STT, with emotion inferred from prosody (“tone cue cards”) and coverage of 70+ languages. A composite product: it bundles the company’s branches into one low-latency dialogue loop.
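The Scribe entry above is stated as a word error rate (WER), so a minimal sketch of how that metric is usually computed may help in reading the ~3.3% figure: word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The transcripts below are made up for illustration, and the normalization (lowercasing, whitespace split) is an assumption; published benchmarks typically normalize text more thoroughly, and this is not the methodology of the third-party tests cited.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words consumed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: two substitutions against nine reference words.
reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"
print(f"WER = {word_error_rate(reference, hypothesis):.1%}")  # WER = 22.2%
```

On this definition, a WER of about 3.3% corresponds to roughly 3.3 word errors per 100 reference words. Because insertions count as errors, WER can exceed 100% on very poor transcripts.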

Why it matters

ElevenLabs is the commercial archetype the open wedge competes against in every branch: proprietary, polished, premium, and closed. It is the foil to kokoro/fish-audio-s2-pro (TTS), whisper/canary-qwen (STT), and stable-audio/musicgen (music). Its trajectory also marks how commercially central voice AI has become: an $11B valuation (Series D, Feb 2026), up from an early-stage valuation of about $100M in 2023; 1M+ users by mid-2023; a place on the Forbes AI 50. On the rights axis the synthesis tracks, it sits on both sides: its cloning powers the deepfake-fraud risk, and it ships an AI Speech Classifier to detect AI-generated audio (the SynthID/ASVspoof detection thread of audio-deepfake).

Related

text-to-speech · speech-to-text · voice-cloning · audio-deepfake · tts-arena-leaderboard · stt-apis-comparison · audio-music-generation · elevenlabs-expressive-mode
