MusicGen (Meta AudioCraft)
The open research reference that the audio-music-generation branch lacked: Meta’s text-and-melody-to-music model, first released in June 2023 and bundled in Aug 2023 into the AudioCraft family (MusicGen for music, AudioGen for sound effects, EnCodec for the codec). It’s the canonical open foil to the closed leaders suno/udio, and distinct from the also-open stable-audio.
Architecture & sizes
- A single-stage transformer LM generating over EnCodec RVQ audio tokens — i.e. “next-token over audio,” the exact LLM-stack convergence the synthesis tracks (the music-branch instance of TTS’s Llama+RVQ pattern).
- Melody conditioning: generation can be steered by an existing or hummed melody (supplied as reference audio) in addition to the text prompt.
- Sizes: 300M / 1.5B / 3.3B parameters. Trained on ~20,000 hours (400k recordings) owned or licensed by Meta — a deliberately clean-rights training set.
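To make “single-stage LM over RVQ tokens” concrete, here is a minimal sketch of how several parallel codebook streams can be fed to one transformer. It assumes the configuration described in the MusicGen paper (4 codebooks at a 50 Hz frame rate, interleaved with a per-codebook “delay” offset). Those figures are not stated in this note, and the function names are hypothetical illustrations, not AudioCraft’s API.

```python
from typing import List, Optional

# Placeholder for grid positions that hold no token (illustrative choice).
PAD: Optional[int] = None


def apply_delay_pattern(codes: List[List[int]]) -> List[List[Optional[int]]]:
    """Shift codebook k right by k steps.

    `codes` is a K x T grid: K codebooks, T codec frames. After shifting,
    each column is one LM step that carries at most one token per codebook,
    so a single model can predict all streams in one pass.
    """
    num_codebooks = len(codes)
    num_frames = len(codes[0])
    delayed = []
    for k, row in enumerate(codes):
        left = [PAD] * k
        right = [PAD] * (num_codebooks - 1 - k)
        delayed.append(left + list(row) + right)
    return delayed


def undo_delay_pattern(delayed: List[List[Optional[int]]]) -> List[List[Optional[int]]]:
    """Invert apply_delay_pattern, recovering the aligned K x T grid."""
    num_codebooks = len(delayed)
    num_frames = len(delayed[0]) - num_codebooks + 1
    return [row[k:k + num_frames] for k, row in enumerate(delayed)]


def lm_steps(seconds: float, frame_rate_hz: int = 50, num_codebooks: int = 4) -> int:
    """Autoregressive steps needed for a clip (assumed 50 Hz, 4 codebooks)."""
    frames = int(seconds * frame_rate_hz)
    return frames + num_codebooks - 1


if __name__ == "__main__":
    grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
    for row in apply_delay_pattern(grid):
        print(row)
    print(lm_steps(10.0))
```

Under these assumptions the sequence length grows with clip duration plus a small constant, not with duration times the number of codebooks, which is why one LM stage is enough.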
The license twist (sharpens the rights thesis)
MusicGen splits its licensing: code is MIT, but model weights are CC-BY-NC 4.0 (non-commercial). So it is open-weight but not commercially usable: the music-branch echo of TTS’s research-license ceiling (fish-audio-s2-pro), the open option you can study and self-host but not ship in a product. This is exactly the synthesis’s point that license, not just score, decides, and that “open” is graded. The weights license contrasts with stable-audio’s more permissive open release, while the clean-training-data posture keeps MusicGen clear of the ai-music-copyright litigation engulfing suno.
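The “license, not just score, decides” rule can be sketched as a simple gate over the code and weights licenses. This is an illustrative simplification, not legal guidance: the `Release` type, the `COMMERCIAL_OK` table, and the grade labels are hypothetical names introduced here, and only the two licenses named above are covered.

```python
from dataclasses import dataclass

# Assumption: a coarse yes/no view of commercial use per license.
# Licenses not listed are treated as "not cleared" by default.
COMMERCIAL_OK = {
    "MIT": True,
    "CC-BY-NC-4.0": False,
}


@dataclass(frozen=True)
class Release:
    name: str
    code_license: str
    weights_license: str


def can_ship_commercially(release: Release) -> bool:
    """A product needs both the code and the weights cleared for commercial use."""
    return (
        COMMERCIAL_OK.get(release.code_license, False)
        and COMMERCIAL_OK.get(release.weights_license, False)
    )


def openness_grade(release: Release) -> str:
    """Grade an open-weight release by what its licenses permit (hypothetical labels)."""
    if can_ship_commercially(release):
        return "open, commercially usable"
    return "open-weight, non-commercial"


musicgen = Release(name="musicgen", code_license="MIT", weights_license="CC-BY-NC-4.0")

if __name__ == "__main__":
    print(musicgen.name, "->", openness_grade(musicgen))
```

The design point is that the stricter of the two licenses sets the ceiling: an MIT codebase does not make non-commercial weights shippable.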
Where it sits
MusicGen is the architecture anchor of the music branch: the open, inspectable model that shows music generation is the same neural-codec-token LM as the rest of speech-audio-ai, and the reference point against which closed quality (suno Elo ~1293) and the legal/rights dimension are measured.
Related
audio-music-generation · neural-audio-codec · suno · udio · stable-audio · ai-music-copyright · speech-audio-ai