Meta has announced the launch of MusicGen, its most recent development in the areas of music and artificial intelligence (AI).
This open-source AI model generates music from text prompts, offering a new and creative way to produce music.
A recently published research paper describes MusicGen's approach to music creation. Much as a language model predicts the next characters in a sentence, MusicGen predicts the next section of a piece of music, which yields coherent, structured compositions.
As part of the training process, MusicGen uses Meta’s EnCodec audio tokenizer to break audio data down into smaller units, or tokens.
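The idea of tokenizing audio and then predicting the next token can be illustrated with a toy sketch. This is not MusicGen's implementation: the names `tokenize`, `fit_bigram`, and `generate` are hypothetical, a simple amplitude quantizer stands in for EnCodec, and a table of token-pair counts stands in for the neural network that does the prediction in the real model.

```python
import math
import random
from collections import Counter, defaultdict


def tokenize(samples, levels=8):
    """Map each sample in [-1, 1] to an integer token in [0, levels).

    Toy stand-in for a learned audio tokenizer (assumption, not EnCodec).
    """
    tokens = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clamp to the expected range
        tokens.append(min(levels - 1, int((s + 1.0) / 2.0 * levels)))
    return tokens


def fit_bigram(tokens):
    """Count how often each token is followed by each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, start, length, rng):
    """Sample a sequence one token at a time, each conditioned on the previous one."""
    out = [start]
    for _ in range(length - 1):
        options = counts.get(out[-1])
        if not options:  # no observed continuation for this token
            break
        choices, weights = zip(*sorted(options.items()))
        out.append(rng.choices(choices, weights=weights, k=1)[0])
    return out


# Toy "audio": a sine wave with a period of 16 samples.
audio = [math.sin(2 * math.pi * i / 16) for i in range(256)]
tokens = tokenize(audio)
model = fit_bigram(tokens)
generated = generate(model, tokens[0], 32, random.Random(0))
print(generated)
```

Each generated token depends only on the one before it here, whereas a real model conditions on a much longer history and on the text prompt. The loop structure, predicting one token and appending it before predicting the next, is the part the sketch is meant to convey.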
Similar to Google’s MusicLM, MusicGen was trained on a large music corpus: 20,000 hours of licensed music from Shutterstock, Pond5, and a large internal library of high-quality tracks. This gives it a wide variety of musical genres and compositions to draw on.
MusicGen can respond to both text and music instructions. It can produce new musical compositions that represent a certain style by fusing the melody from an audio file with a text prompt that describes that style.
MusicGen does not offer exact control over how closely the output follows the melody, so it cannot be used to hear the same melody reproduced faithfully in different genres. Instead, it gives a creative interpretation.
The researchers tested model sizes ranging from 300 million to 3.3 billion parameters. Larger models generally produced higher-quality audio, although the 1.5 billion parameter model received the best ratings from human raters. The 3.3 billion parameter model stood out for how accurately its audio matched the text input.
MusicGen surpasses competing music models such as Riffusion, Mousai, MusicLM, and Noise2Music on both objective and subjective measures.
It scores well on measures of how closely the music matches the text description and of how plausible the composition sounds.
Discover music samples here as well as comparisons between MusicGen and competitors like Google’s MusicLM.
Meta has released the code and models as open source on GitHub, with commercial use permitted. A demo is available on Hugging Face.