Software Application updated Fri Jun 05 2026 00:00:00 GMT+0000 (Coordinated Universal Time)

Sesame CSM

Sesame Labs’ 1B-parameter Conversational Speech Model (Feb 2025): a Llama-based, Apache-2.0-licensed open-weight-tts model optimized for multi-speaker conversational scenarios (open-source-tts-models).

Profile

Conversational focus: tuned for dialogue and turn-taking speech rather than single-shot narration; noted for acoustic tokenization that captures non-verbal cues and tone well.
Llama backbone: another instance of the TTS-on-an-LLM pattern, in which a language-model backbone predicts discrete audio-codec tokens (see neural-audio-codec; cross-wiki: gemini/Llama lineage).
Tradeoff noted across comparisons: excellent at conversational nuance, but its multi-speaker output is not always judged the most natural-sounding despite that focus. This is a nuance-versus-naturalness tradeoff when set against models such as Dia.
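
To make the TTS-on-an-LLM pattern concrete, here is a toy sketch of the two data-side steps it relies on: turning audio frames into discrete codec tokens with residual vector quantization (RVQ, a technique many neural audio codecs use), and laying out speaker-tagged conversational turns as one token stream for a backbone to model. Everything here is an illustrative assumption: the 2-D codebooks, the `Turn` type, the `<spkN>` and `<audio ...>` token formats, and the whitespace tokenizer are hypothetical and are not Sesame CSM's actual codec, vocabulary, or prompt format.

```python
from dataclasses import dataclass

# Hypothetical toy frame type: real codecs use learned, high-dimensional frames.
Vector = tuple[float, ...]

# Two hypothetical RVQ stages over 2-D "frames": a coarse codebook, then a fine one.
CODEBOOKS: list[list[Vector]] = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],    # stage 0: coarse
    [(0.0, 0.0), (0.25, 0.0), (0.0, 0.25)],  # stage 1: refines the stage-0 residual
]


def nearest(codebook: list[Vector], x: Vector) -> int:
    """Return the index of the codebook entry closest to x (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], x)),
    )


def rvq_encode(frame: Vector, codebooks: list[list[Vector]]) -> list[int]:
    """Quantize one frame: each stage encodes what earlier stages left over."""
    residual = frame
    codes: list[int] = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry; the next stage quantizes the remainder.
        residual = tuple(r - c for r, c in zip(residual, codebook[idx]))
    return codes


def rvq_decode(codes: list[int], codebooks: list[list[Vector]]) -> Vector:
    """Reconstruct a frame by summing the chosen entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return tuple(out)


@dataclass
class Turn:
    speaker: int          # hypothetical integer speaker ID
    text: str
    frames: list[Vector]  # toy stand-in for the turn's audio


def flatten_conversation(turns: list[Turn], codebooks: list[list[Vector]]) -> list[str]:
    """Lay turns out as one stream: speaker tag, text tokens, then audio tokens."""
    tokens: list[str] = []
    for turn in turns:
        tokens.append(f"<spk{turn.speaker}>")
        tokens.extend(turn.text.split())  # whitespace split stands in for a real tokenizer
        for frame in turn.frames:
            codes = rvq_encode(frame, codebooks)
            tokens.append("<audio " + " ".join(f"c{k}:{i}" for k, i in enumerate(codes)) + ">")
    return tokens
```

With these codebooks, `rvq_encode((1.2, 0.3), CODEBOOKS)` gives `[1, 2]`: stage 0 picks the coarse entry `(1.0, 0.0)`, and stage 1 picks `(0.0, 0.25)` to refine the leftover residual, so `rvq_decode([1, 2], CODEBOOKS)` returns `(1.0, 0.25)`. Each extra codebook adds one more token per frame, so sequence length grows with codebook depth.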

Place in the field

Sits in the expressive/conversational corner of open-weight-tts, near orpheus; both are permissively licensed Llama-based models, in contrast to the efficiency pole (kokoro) and the restrictively licensed top scorer (fish-audio-s2-pro).

open-weight-tts · text-to-speech · orpheus · voice-cloning · neural-audio-codec · open-source-tts-models
