Sesame CSM
Sesame Labs’ 1B-parameter Conversational Speech Model (CSM, Feb 2025): a Llama-based, Apache-2.0 model in the open-weight-tts family, optimized for multi-speaker conversational scenarios (see also open-source-tts-models).
Profile
- Conversational focus: tuned for dialogue/turn-rich speech rather than single-shot narration; noted for strong acoustic tokenization of non-verbal cues and tone.
- Llama backbone: another instance of the TTS-on-an-LLM pattern, in which a language-model backbone predicts audio-codec tokens (see neural-audio-codec; cross-wiki: gemini/Llama lineage).
- Tradeoff noted across comparisons: excellent at conversational nuance, but not always the most natural for multi-speaker output. This is a quality/role tradeoff vs. dialogue-generation models like Dia.
Place in the field
Sits in the expressive/conversational corner of open-weight-tts, near orpheus. Both are permissively licensed, Llama-based models, in contrast to the efficiency pole (kokoro) and the restricted top-scorer (fish-audio-s2-pro).
Related
open-weight-tts · text-to-speech · orpheus · voice-cloning · neural-audio-codec · open-source-tts-models