
speech-audio-wiki

Defined Term · concept · updated 2026-06-05

Open-weight TTS

The segment of text-to-speech models whose weights are downloadable and self-hostable — the speech analog of llm-providers-wiki’s open-weight-models. Defined by two facts in 2026:

1. It trails the closed frontier on Elo — but not by much

No open-weight model reaches the top tier of the tts-arena-leaderboard, which is entirely closed/API-only. The open leader is fish-audio-s2-pro (Elo ~1123–1128, 5B parameters): capable, but ranked below the closed models from Google, Cartesia, and Inworld. The proprietary-premium-vs-open-wedge dynamic that llm-providers-wiki tracks for text also holds for voice.
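To make "trails, but not by much" concrete, an Elo gap can be turned into an expected head-to-head preference rate. The sketch below assumes the leaderboard uses the standard Elo expectation with a 400-point scale, which this page does not confirm. The function name `expected_win_rate` and the gap values are hypothetical and for illustration only; they are not leaderboard figures for any closed model.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Standard Elo expected score of A against B (assumed 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Midpoint of the ~1123-1128 range quoted for fish-audio-s2-pro.
OPEN_LEADER_ELO = 1125.0

# Hypothetical gaps to a closed model; illustrative values, not leaderboard data.
for gap in (25, 50, 100):
    p = expected_win_rate(OPEN_LEADER_ELO, OPEN_LEADER_ELO + gap)
    print(f"gap of {gap:>3} Elo: open model preferred in about {p:.1%} of pairings")
```

Under that assumption, equal ratings give an even split, and the preference rate for the lower-rated model falls smoothly as the gap grows. This is why a modest Elo deficit still means the open model wins a substantial share of blind comparisons.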

2. License is a first-class axis, not a footnote

The open field splits on license terms:

The shape of the field

The field spans a spectrum: efficiency-first (kokoro, 82M parameters, no voice cloning) → all-rounder (Chatterbox, 0.5B) → expressive/streaming (orpheus) → conversational (sesame-csm) → emotive (misotts) → top-quality but restricted (fish-audio-s2-pro). Many are Llama-based, which forms a bridge to the text-LLM market (gemini cross-wiki).

text-to-speech · tts-benchmarks · tts-arena-leaderboard · kokoro · fish-audio-s2-pro · open-source-tts-models