Open-weight TTS
The segment of text-to-speech models whose weights are downloadable and self-hostable: the speech analog of llm-providers-wiki’s open-weight-models. As of 2026, two facts define it:
1. It trails the closed frontier on Elo — but not by much
No open-weight model cracks the top tier of the tts-arena-leaderboard, which is entirely closed/API-only. The open leader is fish-audio-s2-pro (Elo ~1123–1128, 5B parameters): capable, but below Google/Cartesia/Inworld. The same proprietary-premium-vs-open-wedge dynamic that llm-providers-wiki tracks for text holds for voice.
2. License is a first-class axis, not a footnote
The open field splits on terms:
- Permissive (Apache-2.0 / MIT): kokoro, orpheus, sesame-csm, Chatterbox, Dia, Higgs Audio V2. These are the open-source-tts-models that are free to use commercially.
- Research-only / paid-commercial: fish-audio-s2-pro (the highest-Elo open model is not free for commercial use) and misotts’s “modified MIT.” So the practical “best open model” depends on whether you can use it commercially, not just on its score.
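The license-first selection described above can be sketched in a few lines of Python. This is only an illustration: the `OpenTTSModel` record, the `license_class` labels, and the `shortlist` helper are hypothetical names introduced here, not part of any library. The only Elo value filled in is the one this page gives (the lower end of fish-audio-s2-pro's ~1123–1128 range); every other score is left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class OpenTTSModel:
    name: str
    license_class: str           # "permissive" or "restricted", per the grouping above
    elo: Optional[float] = None  # None where this page gives no score


# Hypothetical catalog built only from the facts listed on this page.
MODELS = [
    OpenTTSModel("kokoro", "permissive"),
    OpenTTSModel("orpheus", "permissive"),
    OpenTTSModel("sesame-csm", "permissive"),
    OpenTTSModel("Chatterbox", "permissive"),
    OpenTTSModel("Dia", "permissive"),
    OpenTTSModel("Higgs Audio V2", "permissive"),
    # Lower end of the ~1123–1128 range quoted above.
    OpenTTSModel("fish-audio-s2-pro", "restricted", elo=1123.0),
    OpenTTSModel("misotts", "restricted"),
]


def shortlist(models: List[OpenTTSModel], need_commercial: bool) -> List[OpenTTSModel]:
    """Filter on license first, then rank by Elo where a score is known."""
    allowed = [
        m for m in models
        if not need_commercial or m.license_class == "permissive"
    ]
    # Scored models sort first (highest Elo on top); unscored ones keep catalog order.
    return sorted(allowed, key=lambda m: (m.elo is None, -(m.elo or 0.0)))


if __name__ == "__main__":
    print([m.name for m in shortlist(MODELS, need_commercial=False)])
    print([m.name for m in shortlist(MODELS, need_commercial=True)])
```

Without the commercial constraint, fish-audio-s2-pro leads the list; with it, that model drops out entirely and the choice falls to the permissive group, which is the point the license split makes.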
The shape of the field
Efficiency-first (kokoro, 82M parameters, no voice cloning) → all-rounder (Chatterbox, 0.5B) → expressive/streaming (orpheus) → conversational (sesame-csm) → emotive (misotts) → top-quality but restricted (fish-audio-s2-pro). Many are Llama-based, which makes them a bridge to the text-LLM market (gemini cross-wiki).
Related
text-to-speech · tts-benchmarks · tts-arena-leaderboard · kokoro · fish-audio-s2-pro · open-source-tts-models