Defined Term domain updated Fri Jun 05 2026 00:00:00 GMT+0000 (Coordinated Universal Time)

Audio & music generation

Generating music and audio from text (or reference audio). The generation branch of speech-audio-ai, distinct from speech synthesis (text-to-speech) in producing songs, instrumentals, and sound design rather than spoken words.

The field (2026)

suno — the quality leader (v5, Elo ~1293); best vocal songs from one prompt; $2.45B valuation; the commercial flagship.
udio — closest competitor; strong stems; the licensing-clean option post-UMG settlement.
stable-audio — Stability AI; open-weight, instrumental, sound-design focus; the open wedge (stable-audio-3).
Others: ElevenLabs Music (multilingual, modular regeneration) and AIVA (cinematic/score, full IP ownership).

What makes this branch different

Copyright is the dominant axis, not just quality — see ai-music-copyright. Whether you can use the output legally varies more than how good it sounds (Suno litigation vs Udio settlement vs Stable Audio’s licensed data).
Vocals vs instrumental split — the song generators (suno, udio, ElevenLabs) do vocals; the open/instrumental tools (stable-audio, AIVA) don’t.
A growing but contested market: ~$0.57B (2024) → ~$1.98B (2026), yet AI tracks show 25–40% lower save rates and 15–25% higher skip rates than human recordings (see ai-music-generators-2026). For now these tools serve better for demos than for finished products.
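The market figures above imply a steep annual growth rate. As a rough sketch, assuming smooth compounding between the two endpoints (the source gives only the 2024 and 2026 values, so the helper name `implied_cagr` and the compounding assumption are illustrative, not from the source):

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values.

    Assumes smooth year-over-year compounding, which is a simplification.
    """
    return (end_value / start_value) ** (1 / years) - 1


# Endpoint figures quoted above, in billions of USD.
growth = implied_cagr(start_value=0.57, end_value=1.98, years=2026 - 2024)
print(f"Implied annual growth: {growth:.0%}")
```

Under that assumption the market would be growing by roughly 86% a year, which is why the lower save rates and higher skip rates matter: revenue is rising faster than listener retention suggests the output has earned.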

Shared with the rest of the wiki

Same open-vs-closed structure (stable-audio is the open wedge, as fish-audio-s2-pro is for TTS) and Elo-style ranking (tts-benchmarks) — but copyright, not license-terms alone, is the sharper constraint here.

speech-audio-ai · ai-music-copyright · suno · udio · stable-audio · ai-music-generators-2026
