Meta unveils Voicebox, a cutting-edge audio-centric AI

June 17, 2023

Meta has unveiled Voicebox, a state-of-the-art generative AI model for speech. It works much like text generators such as ChatGPT, but it produces audio rather than text.

Voicebox can generate audio from scratch or modify existing audio. Given an audio clip of someone’s voice as short as 2 seconds, it can generate speech in a different language while retaining the speaker’s intonation.

This voice matching works together with text-to-speech generation, so you can give the model a sample of your voice and have it read out text in that voice. For example, if you’re on holiday and need to communicate in English, French, Spanish, German, Polish, or Portuguese, you could type your message into Voicebox and it would speak for you.

The model was trained on over 50,000 hours of recorded speech and transcripts in 6 languages: English, French, Spanish, German, Polish, and Portuguese. Meta reports that it is up to 20 times faster than VALL-E, a comparable audio-centric AI, and more accurate, with a word error rate of 1.9% against VALL-E’s 5.9%.

Here are Voicebox’s 4 main uses:

  1. In-context text-to-speech synthesis: Voicebox can generate realistic audio from text, matching the style of a short voice sample. This could be used to build multilingual virtual assistants or to help people with voice and hearing conditions converse more naturally.
  2. Cross-lingual style transfer: Given a sample of someone’s speech and a passage of text in any of the 6 supported languages, the AI can read the text aloud in that language in the speaker’s voice, enabling authentic and natural multilingual communication.
  3. Speech denoising and editing: Voicebox can regenerate segments within audio recordings. For example, it can resynthesize parts of speech corrupted by noise without the whole recording having to be redone.
  4. Diverse speech sampling: Voicebox can generate representative speech across 6 languages, which makes it well suited to producing synthetic training data for other speech and audio models. Speech recognition models trained on Voicebox-generated synthetic speech perform almost as well as models trained on real speech, with an error rate degradation of about 1%, compared with the 45 to 70% degradation observed with synthetic speech from previous text-to-speech models.
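To make the degradation figures in the last item concrete, here is a minimal Python sketch of how such a percentage could be computed. It assumes the figures describe a relative change in word error rate (WER) against a real-speech baseline, which the article does not specify. The function name and the WER values are hypothetical and chosen only for illustration; they are not taken from Meta’s paper.

```python
def relative_degradation(wer_real: float, wer_synthetic: float) -> float:
    """Return the relative increase in word error rate, in percent.

    wer_real:      WER of a recognizer trained on real speech (baseline).
    wer_synthetic: WER of the same recognizer trained on synthetic speech.
    """
    if wer_real <= 0:
        raise ValueError("baseline WER must be positive")
    return (wer_synthetic - wer_real) / wer_real * 100.0


# Hypothetical values: a baseline WER of 5.0% and two synthetic-data results.
baseline_wer = 5.0
print(round(relative_degradation(baseline_wer, 5.05), 1))  # about 1% worse
print(round(relative_degradation(baseline_wer, 7.25), 1))  # about 45% worse
```

Under this reading, a 1% degradation leaves the error rate almost unchanged, while a 45% degradation moves the same hypothetical baseline from 5.0% to 7.25%.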

Meta hasn’t released Voicebox yet, citing concerns about misuse. However, it has published an in-depth paper describing the model.

There’s no official estimate of when people will be able to use Voicebox. Meta says the tool will help creators edit audio tracks, improve communication with visually impaired people, and enable people to speak foreign languages in their own voice.

Sam Jeans

Sam is a science and technology writer who has worked in various AI startups. When he’s not writing, he can be found reading medical journals or digging through boxes of vinyl records.
