Stability AI announced the release of its AI audio generator and claims it delivers first-in-class performance.
Over the last few months, we’ve seen a number of AI audio generators with varying performance, but Stable Audio seems to have raised the bar.
With Stable Audio, you enter a text prompt and it generates music or audio based on that prompt. We’ve seen that kind of functionality from Google’s MusicLM and Meta’s AudioCraft.
MusicLM is still only available in Google’s Test Kitchen and generates music at 24 kHz. Meta’s AudioCraft is really impressive, but it only generates music at 32 kHz.
Stable Audio is the first deployed text-to-audio generator that outputs audio at 44.1 kHz, which is the sampling rate of “CD quality” music.
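To put those sampling rates in perspective, here is a rough sketch comparing the three figures quoted above. It relies on the Nyquist rule: a sampled signal can only represent frequencies up to half its sampling rate. The dictionary and function names are illustrative, not part of any of these tools.

```python
# Sampling rates as quoted in this article (Hz).
SAMPLE_RATES_HZ = {
    "MusicLM": 24_000,
    "AudioCraft": 32_000,
    "Stable Audio": 44_100,
}


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a signal sampled at this rate can represent."""
    return sample_rate_hz / 2


for name, rate in SAMPLE_RATES_HZ.items():
    print(f"{name}: {rate} Hz sampling, content up to {nyquist_hz(rate):.0f} Hz")
```

Only the 44.1 kHz rate comfortably covers the roughly 20 kHz upper limit usually cited for human hearing, which is why it is associated with CD quality.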
The other impressive feature of Stable Audio is the length of tracks it produces. Most AI audio generators produce shorter pieces of music that quickly repeat or lose their way. Stable Audio produces more nuanced music of around 90 seconds without losing coherence.
Today we’re thrilled to launch Stable Audio, our first AI product for music and sound generation!
Try it out here for free! #stabilityAI #stableaudio #newannouncement
https://t.co/pRK3Qs9Fak pic.twitter.com/cZfbK1mZYA
— Stability AI (@StabilityAI) September 13, 2023
You can check out some samples of the generated audio here.
Here’s an example of a track I was able to generate using the tool with the following prompt:
“Post-Rock, Guitars, Drum Kit, Bass, Strings, Euphoric, Up-Lifting, Moody, Flowing, Raw, Epic, Sentimental, 125 BPM”
That sounds pretty good actually.
The model relies on the latent diffusion technique that Stability uses for its other generative AI products. Stability says some clever codecs used during inference allow 95 seconds of 44.1 kHz music to be generated in 1 second on an Nvidia A100 GPU.
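As a back-of-the-envelope check on that claim, the sketch below works out what "95 seconds of audio in 1 second" means as a speed-up over real time and as a sample count. The function names are illustrative, and the count is per audio channel, since the figures quoted here do not state a channel count.

```python
def realtime_factor(audio_seconds: float, compute_seconds: float) -> float:
    """How many seconds of audio are produced per second of compute."""
    return audio_seconds / compute_seconds


def samples_per_channel(audio_seconds: float, sample_rate_hz: int) -> int:
    """Number of samples in one channel of a clip of the given length."""
    return int(audio_seconds * sample_rate_hz)


# Figures as reported by Stability: 95 s of 44.1 kHz audio in 1 s.
factor = realtime_factor(95, 1)
samples = samples_per_channel(95, 44_100)

print(f"{factor:.0f}x faster than real time")
print(f"{samples:,} samples per channel")
```

That works out to about 4.2 million samples per channel, which suggests why the model works on a compressed latent representation rather than on raw samples.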
Training and copyright questions
Stability AI produced Stable Audio in cooperation with Harmonai, a deep learning research lab focused on creating open-source generative audio models. Stability AI’s audio team created a new model based on their earlier Dance Diffusion model, which Harmonai trained.
The dataset used to train Stable Audio came from AudioSparx, which supplied around 800,000 songs from the independent music artists it represents.
The artists were given the option to have their work excluded from the dataset, which around 10% reportedly did.
Artists who opted to have their work included will share in the profits under an arrangement between Stability AI and AudioSparx.
There aren’t any big-label artists in the dataset, but Stable Audio doesn’t stop you from adding an artist or band name to your prompt.
While the AudioSparx library doesn’t contain work by a band like AC/DC for example, it does contain plenty of music described as being in the style of AC/DC.
You still can’t copyright the music you generate with an AI tool. And the terms of use say that you “are responsible for ensuring the lawfulness of all Content” made using Stable Audio.
The terms further state that “you represent and warrant that you own all necessary right, title, and interest in and to such prompts, including, without limitation, all necessary copyrights and rights of publicity contained therein.”
So maybe don’t add ‘Metallica’ to your prompt. Those guys make great music, but they love a good copyright lawsuit too.
How much does Stable Audio cost?
You can try Stable Audio for free, but you’ll be limited to generating 20 non-commercializable tracks per month, each up to 20 seconds long. Their servers are absolutely slammed at the moment, so it takes a while to generate a track.
Thrilled that demand for our Stable Audio launch today has been off the charts! But our servers are now at full capacity, so you may not be able to access the product. If you can’t, we kindly ask that you check back in 24 hours to try again.
In the meantime, we’re working hard…
— Stability AI (@StabilityAI) September 13, 2023
A Pro subscription costs $11.99 per month and lets you generate 500 commercializable tracks per month, each up to 90 seconds long.
Unused generation credits don’t roll over to the following month, so use them or lose them.
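For a quick sense of what each tier buys, here is a small sketch that turns the plan limits quoted in this article into monthly audio totals and a per-track cost. The `Plan` class and its field names are made up for illustration, and the numbers are the ones reported above, not values pulled from Stability’s pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    monthly_price_usd: float
    tracks_per_month: int
    max_track_seconds: int

    @property
    def max_seconds_per_month(self) -> int:
        """Upper bound on audio generated if every track is max length."""
        return self.tracks_per_month * self.max_track_seconds

    @property
    def cost_per_track_usd(self) -> float:
        """Price per track if the full monthly allowance is used."""
        return self.monthly_price_usd / self.tracks_per_month


# Limits as quoted in this article.
FREE = Plan("Free", 0.0, tracks_per_month=20, max_track_seconds=20)
PRO = Plan("Pro", 11.99, tracks_per_month=500, max_track_seconds=90)

for plan in (FREE, PRO):
    minutes = plan.max_seconds_per_month / 60
    print(f"{plan.name}: up to {minutes:.1f} min/month, "
          f"${plan.cost_per_track_usd:.3f} per track")
```

On these figures, a Pro subscriber who uses the whole allowance pays about two cents per track, but since credits don’t roll over, the effective per-track cost rises for anyone who generates less.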
If you have an app, website, or software with more than 100,000 users, you need to contact Stability for pricing on its enterprise plan.
Stability says it will be “open sourcing a music generation model soon, trained on different data.”
With its Stable Audio product, maybe Stability AI has finally found a way to make some money for its investors.