Kokoro
An 82M-parameter open-weight-tts model from Hexgrad (v1.0, Jan 2025), and the field's efficiency leader: the smallest and cheapest model that stays competitive on quality. Apache-2.0 licensed.
Why it matters
- Tiny and cheap: at 82M parameters it runs on minimal hardware for under $1 per 1M characters self-hosted ($0.65/1M on tts-arena-leaderboard), which makes it the default for budget and on-device use.
- Punches above its size: ~4.5 MOS for naturalness and a competitive Elo of ~1064, despite being ~60× smaller than fish-audio-s2-pro (tts-models-2026-benchmark).
- Architecture: StyleTTS2 with an ISTFTNet decoder and no diffusion step, which is part of why it's fast.
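The cost and size figures above are simple ratios. A minimal Python sketch of that arithmetic follows, assuming linear per-character pricing at the quoted $0.65/1M rate. The function names and the 500k-character workload are hypothetical, and the implied parameter count is only as precise as the approximate ~60× figure.

```python
# Back-of-envelope arithmetic for the Kokoro figures quoted above.
# Assumption: pricing scales linearly with character count.

KOKORO_PARAMS = 82_000_000   # 82M parameters
SIZE_RATIO = 60              # "~60x smaller" than fish-audio-s2-pro (approximate)
PRICE_PER_M_CHARS = 0.65     # USD per 1M characters (tts-arena-leaderboard figure)


def synthesis_cost(num_chars: int, price_per_million: float = PRICE_PER_M_CHARS) -> float:
    """Estimated USD cost to synthesize num_chars characters at a per-1M-character price."""
    return num_chars / 1_000_000 * price_per_million


def implied_param_count(params: int, ratio: int) -> int:
    """Parameter count implied for a model `ratio` times larger than `params`."""
    return params * ratio


if __name__ == "__main__":
    # Hypothetical workload: a ~500k-character audiobook manuscript.
    book_chars = 500_000
    print(f"cost: ${synthesis_cost(book_chars):.3f}")
    implied = implied_param_count(KOKORO_PARAMS, SIZE_RATIO)
    print(f"implied size: {implied / 1e9:.2f}B params")
```

Running it prints a cost of $0.325 for the example workload and an implied comparison-model size of about 4.92B parameters.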
The tradeoff
No voice-cloning: it ships ~54 preset voices across ~15 languages. That capability was dropped to hit the small footprint, which makes Kokoro the cleanest example of the efficiency-vs-controllability split in open-weight-tts. Its CER of ~17% (Trelis) is higher than that of the heavyweight models; quality per byte is its pitch, not absolute accuracy.
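For TTS, CER (character error rate) is typically measured by transcribing the synthesized audio with an ASR model and comparing the transcript against the input text. The sketch below covers only the metric itself, using the standard definition (character-level Levenshtein distance divided by reference length). The ASR model and any text normalization behind the Trelis figure are not specified here, so they are omitted; the function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference transcript must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # reference = input text, hypothesis = ASR transcript of the synthesized audio
    print(cer("kitten", "sitting"))
```

Here `cer("kitten", "sitting")` returns 0.5 (3 edits over a 6-character reference). On this definition, a CER of ~17% means roughly one character error for every six reference characters.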
Related
open-weight-tts · text-to-speech · tts-benchmarks · voice-cloning · fish-audio-s2-pro · open-source-tts-models