OpenAI has unveiled voice and image features for ChatGPT, set to roll out over the coming weeks for both the app and browser.
It’s fair to say that OpenAI has rested on its laurels with ChatGPT, which lacks some of the functionality offered by its competitors, Anthropic’s Claude and Google’s Bard.
OpenAI added a web browsing function to ChatGPT earlier in the year, granting the tool access to the internet, but it didn’t work particularly well and was removed for potentially violating copyright by ‘printing’ text from paywalled websites.
With that said, GPT-4 is widely regarded as the most capable large language model (LLM) available, which has kept OpenAI at the top of the generative AI pecking order.
OpenAI has now boosted the chatbot’s functionality while keeping ChatGPT firmly in the limelight as industry competition heats up.
What’s new?
OpenAI is adding the following to ChatGPT:
- Voice interaction: Users can now speak directly to ChatGPT, and the AI can respond audibly using one of five synthesized voices. The voice feature is underpinned by a new text-to-speech model that OpenAI trained using samples from voice actors. To transcribe users’ speech, ChatGPT relies on Whisper, OpenAI’s open-source speech recognition system.
- Image interaction: Beyond voice, users can now provide ChatGPT with images, adding a visual dimension to the conversation. For instance, if a user shares a photo of a broken appliance, ChatGPT could diagnose the issue and suggest solutions. On mobile platforms, a drawing tool has been integrated, allowing users to circle or pinpoint specific areas of an image for the AI to focus on. The image capabilities are driven by multimodal versions of the GPT-3.5 and GPT-4 models, which have been fine-tuned to interpret and reason about visual inputs.
With these new additions, users can have a back-and-forth conversation with the chatbot and ask it for specific information about image content, among other things.
There’s no doubt that the community will find interesting ways of testing the new ChatGPT’s limits.
OpenAI posted the following promotional demo on X:
Use your voice to engage in a back-and-forth conversation with ChatGPT. Speak with it on the go, request a bedtime story, or settle a dinner table debate.
Sound on 🔊 pic.twitter.com/3tuWzX0wtS
— OpenAI (@OpenAI) September 25, 2023
Risks and rollout plan
With new features come new risks. For example, the voice technology could be misused for impersonating public figures. As a precaution, OpenAI has restricted the voice feature to conversational chat only.
Regarding images, OpenAI has deliberately limited ChatGPT’s ability to analyze people in photos directly.
OpenAI is planning a phased rollout, with ChatGPT Plus and Enterprise users being the first to receive access.
The voice feature will be available on mobile apps, while the image functions will be accessible across all platforms.
OpenAI’s announcement joins a flurry of recent and imminent generative AI product releases, including tools from YouTube, Microsoft’s Copilot suite of AI tools and assistants, and a significant update to Google Bard.