Software Application · updated 2026-06-05 (UTC)

Fish Audio S2 Pro

Fish Audio S2 Pro, from Fish Audio, is the highest-ranked open-weight text-to-speech model on the tts-arena-leaderboard (Elo ~1123–1128). It is a 5B-parameter model.
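
To give a feel for how narrow the quoted Elo range is, here is a minimal sketch that converts a rating gap into an expected head-to-head score. It assumes the conventional logistic Elo formula with a 400-point scale; the leaderboard's exact rating method is not stated here, so treat the function name and the `scale` parameter as illustrative.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the conventional Elo model.

    A value of 0.5 means the two are evenly matched. The 400-point scale
    is an assumption, not something the leaderboard is known to use.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# The two ends of the quoted range, treated as if they were two competitors.
low, high = 1123.0, 1128.0
edge = elo_expected_score(high, low)
print(f"{edge:.3f}")  # 0.507
```

Under that assumption, the two ends of the range differ by less than one percentage point of expected preference, so the range is best read as a single score with some noise.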

Profile

Architecture: dual-autoregressive, built on a residual vector quantization (RVQ) neural-audio-codec (the same codec-token family as misotts).
Scale: trained on 10M+ hours across 80+ languages.
Accuracy: ~3.5% WER / 1.2% CER (English, vendor-reported; see tts-models-2026-benchmark).
Supports voice-cloning.
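
For readers unfamiliar with the two error metrics in the profile, the sketch below shows how word error rate (WER) and character error rate (CER) are commonly defined: edit distance between a reference text and a transcript, divided by the reference length. For TTS, the transcript usually comes from running a speech recognizer over the synthesized audio. The vendor's actual evaluation pipeline (recognizer, text normalization, test set) is not described here, so the function names and the example strings are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Made-up example: one wrong word out of four, one wrong character out of 19.
reference = "the quick brown fox"
transcript = "the quick brown box"
print(wer(reference, transcript))            # 0.25
print(round(cer(reference, transcript), 4))  # 0.0526
```

A single misheard word costs a full word in WER but only a character or two in CER, which is why CER is usually the lower of the two figures.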

The licensing catch

Its weights are open, but released under a research license: commercial use requires a separate paid license. The best-scoring open model is therefore not freely usable commercially, which is why open-weight-tts treats license as a first-class axis. Teams that need permissive terms drop to Apache-2.0/MIT models (kokoro, orpheus, sesame-csm) and accept a lower Elo.
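
The "license first, quality second" selection described above can be sketched as a filter followed by a ranking. Everything in this example is hypothetical: the `Candidate` type, the `pick_model` helper, and the two permissive entries with their Elo values are placeholders, not leaderboard data. Only the S2 Pro figure (the midpoint of the quoted range) comes from this page.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    elo: float
    commercial_ok: bool  # True if the license allows commercial use at no extra cost


def pick_model(candidates: Sequence[Candidate], need_commercial: bool) -> Optional[Candidate]:
    """Drop models whose license does not fit, then take the highest Elo."""
    eligible = [c for c in candidates if c.commercial_ok or not need_commercial]
    return max(eligible, key=lambda c: c.elo, default=None)


CANDIDATES = [
    # Midpoint of the ~1123-1128 range quoted on this page; research license.
    Candidate("fish-audio-s2-pro", 1125.0, commercial_ok=False),
    # Placeholder entries standing in for Apache-2.0/MIT models; Elo values are invented.
    Candidate("permissive-candidate-a", 1050.0, commercial_ok=True),
    Candidate("permissive-candidate-b", 1020.0, commercial_ok=True),
]

print(pick_model(CANDIDATES, need_commercial=False).name)  # fish-audio-s2-pro
print(pick_model(CANDIDATES, need_commercial=True).name)   # permissive-candidate-a
```

Filtering before ranking makes the license a hard constraint rather than one score among many, which matches how the choice is framed here.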

Place in the field

Fish Audio S2 Pro marks the open-weight ceiling in 2026. It is close enough in quality to pressure the closed frontier, but it remains a notch below the API leaders (gemini Flash TTS, Cartesia Sonic) and is encumbered for commercial use.

open-weight-tts · text-to-speech · tts-arena-leaderboard · neural-audio-codec · voice-cloning · kokoro
