Everything you need to know about OpenAI’s new flagship model, GPT-4o

May 13, 2024

  • OpenAI announced its new flagship multi-modal model called GPT-4o
  • The O stands for "omni," denoting this model's excellent audio-visual performance
  • GPT-4o can perform seriously impressive real-time speech translation

OpenAI just demoed its new flagship foundational model, GPT-4o, with incredible speech recognition and translation skills. 

CEO Sam Altman had already confirmed that OpenAI’s latest “Spring update” was unrelated to GPT-5 or AI search.

But at 10 a.m. PT today, hundreds of thousands joined the live-streamed presentation of the new model as Chief Technology Officer (CTO) Mira Murati demonstrated its benefits over its predecessor, GPT-4.

Key announcements from the demo session include:

  • GPT-4o (the o stands for omni) is intended to supersede GPT-4, with OpenAI calling it its new flagship foundational model.
  • While broadly similar to GPT-4, GPT-4o offers superior multilingual and audiovisual processing. It can process and translate audio in near real-time. Later tests showed that GPT-4o is worse than GPT-4 on some ‘hard tasks.’
  • OpenAI is making GPT-4o freely available, with limits. Paying ChatGPT Plus users still get priority and a higher message cap.
  • OpenAI is also releasing a desktop version of ChatGPT, initially for Mac only, which is rolling out immediately.
  • Custom GPTs will become accessible to free users, too.
  • GPT-4o and its voice features will roll out slowly over the coming weeks and months.

GPT-4o’s real-time audio translation

The headline that’s got everyone talking is GPT-4o’s impressive audio processing and translation, which operate in near real-time. 

Demonstrations showed the AI engaged in remarkably natural voice conversations, offering immediate translations, telling stories, and providing coding advice. 

For example, the model can analyze an image of a foreign language menu, translate it, and provide cultural insights and recommendations. 

It can also recognize emotion through breathing, facial expressions, and other audio and visual cues.

GPT-4o’s emotional recognition skills will probably attract controversy once the dust settles.

Emotionally cognizant AI might enable nefarious use cases that rely on human mimicry, such as deepfakes and social engineering.

Another impressive skill demoed by the team is real-time coding assistance provided via voice.

One demo even saw two instances of the model singing to each other.

The general gist of OpenAI’s demos is that the company aims to make AI multimodality genuinely useful in everyday scenarios, challenging tools like Google Translate in the process. 

Another key point is that these demos are true to life. OpenAI pointed out, “All videos on this page are at 1x real time,” possibly alluding to Google, which heavily edited its Gemini demo video to exaggerate its multi-modal skills.

With GPT-4o, multi-modal AI applications might move from a novelty buried deep inside AI interfaces to something average users can interact with daily.

While the demo was impressive, it’s still a demo, and results from average users “in the wild” will truly reveal how competent these features are.

Aside from real-time voice processing and translation, which is soaking up the limelight, the fact that OpenAI is making this new model available for free, albeit with usage limits, is massive.

While GPT-4o is *just* a slightly better GPT-4, it will equip anyone with a top-quality AI model, leveling the playing field for millions worldwide.

Everything we know about GPT-4o

Here’s a rundown of everything we know about GPT-4o thus far:

  • Multimodal integration: GPT-4o rapidly processes and generates text, audio, and image data, enabling dynamic interactions across different formats. 
  • Real-time responses: The model boasts impressive response times, comparable to human reaction speeds in conversation, with audio responses starting in as little as 232 milliseconds.
  • Language and coding capabilities: GPT-4o matches the performance of GPT-4 Turbo in English and coding tasks and surpasses it in non-English text processing.
  • Audio-visual improvements: Compared to previous models, GPT-4o shows a superior understanding of vision and audio tasks, enhancing its ability to interact with multimedia content.
  • Natural interactions: Demonstrations included two instances of GPT-4o singing together, helping with interview preparation, playing games like rock paper scissors, and even telling dad jokes.
  • Reduced costs for developers: OpenAI has cut the price of GPT-4o for developers by 50% and doubled its processing speed compared with GPT-4 Turbo.
  • Benchmark performance: GPT-4o excels on multilingual, audio, and visual benchmarks, though independent tests indicate it’s behind GPT-4 on some coding, math, and other ‘hard tasks.’
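
To put the developer pricing point in concrete terms, here is a minimal sketch of the arithmetic. Only the 50% price cut and the doubled speed come from the announcement. The baseline price and throughput are hypothetical placeholders chosen for illustration, not OpenAI's published figures, and the `estimate` helper is invented for this example.

```python
# Hypothetical baseline figures, for illustration only (not published numbers).
BASELINE_PRICE_PER_1M_TOKENS = 10.00  # USD, assumed
BASELINE_TOKENS_PER_SECOND = 20.0     # assumed

# Figures reported in the announcement.
PRICE_REDUCTION = 0.50   # cost cut by 50%
SPEED_MULTIPLIER = 2.0   # processing speed doubled


def estimate(tokens: int, price_per_1m: float, tokens_per_second: float) -> tuple[float, float]:
    """Return (cost in USD, processing time in seconds) for a token count."""
    cost = tokens / 1_000_000 * price_per_1m
    seconds = tokens / tokens_per_second
    return cost, seconds


# Apply the announced changes to the assumed baseline.
new_price = BASELINE_PRICE_PER_1M_TOKENS * (1 - PRICE_REDUCTION)
new_speed = BASELINE_TOKENS_PER_SECOND * SPEED_MULTIPLIER

workload = 500_000  # tokens in a hypothetical batch job
old_cost, old_time = estimate(workload, BASELINE_PRICE_PER_1M_TOKENS, BASELINE_TOKENS_PER_SECOND)
new_cost, new_time = estimate(workload, new_price, new_speed)

print(f"Before: ${old_cost:.2f}, {old_time:.0f} s")
print(f"After:  ${new_cost:.2f}, {new_time:.0f} s")
```

Whatever the real baseline is, the same workload would cost half as much and finish in half the time, provided both claims hold across the whole job.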

GPT-4o is a meaningful announcement for OpenAI, particularly as it’s the most powerful free closed model available by a sizeable margin.

It might signal an era of practical, useful AI multimodality that people begin to engage with en masse.

That would be a massive milestone both for the company and the generative AI industry as a whole.

Sam Jeans

Sam is a science and technology writer who has worked in various AI startups. When he’s not writing, he can be found reading medical journals or digging through boxes of vinyl records.

