OpenAI unveils Realtime API and other features for developers

October 2, 2024

  • OpenAI announced several new features for developers who use its API
  • The Realtime API will power low-latency apps and voice interactions similar to Advanced Voice Mode
  • Other features enable cheaper API calls, vision fine-tuning, and tuning smaller models with larger ones

OpenAI didn’t release any new models at its Dev Day event but new API features will excite developers who want to use their models to build powerful apps.

OpenAI has had a tough few weeks with its CTO, Mira Murati, and other head researchers joining the ever-growing list of former employees. The company is under increasing pressure from other flagship models, including open-source models which offer developers cheaper and highly capable options.

The new features OpenAI unveiled were the Realtime API (in beta), vision fine-tuning, and efficiency-boosting tools like prompt caching and model distillation.

Realtime API

The Realtime API is the most exciting new feature, albeit in beta. It enables developers to build low-latency, speech-to-speech experiences in their apps without using separate models for speech recognition and text-to-speech conversion.

With this API, developers can now create apps that allow for real-time conversations with AI, such as voice assistants or language learning tools, all through a single API call. It’s not quite the seamless experience that GPT-4o’s Advanced Voice Mode offers, but it’s close.

It’s not cheap though, at approximately $0.06 per minute of audio input and $0.24 per minute of audio output.

Vision fine-tuning

Vision fine-tuning within the API allows developers to enhance their models’ ability to understand and interact with images. By fine-tuning GPT-4o using images, developers can create applications that excel in tasks like visual search or object detection.

This feature is already being leveraged by companies like Grab, which improved the accuracy of its mapping service by fine-tuning the model to recognize traffic signs from street-level images​.

OpenAI also gave an example of how GPT-4o could generate additional content for a website after being fine-tuned to stylistically match the site’s existing content.

Prompt caching

To improve cost efficiency, OpenAI introduced prompt caching, a tool that reduces the cost and latency of frequently used API calls. By reusing recently processed inputs, developers can cut costs by 50% and reduce response times. This feature is especially useful for applications requiring long conversations or repeated context, like chatbots and customer service tools.

Using cached inputs could save up to 50% on input token costs.

Price comparison of cached and uncached input tokens for OpenAI’s API. Source: OpenAI

Model distillation

Model distillation allows developers to fine-tune smaller, more cost-efficient models, using the outputs of larger, more capable models. This is a game-changer because, previously, distillation required multiple disconnected steps and tools, making it a time-consuming and error-prone process.

Before OpenAI’s integrated Model Distillation feature, developers had to manually orchestrate different parts of the process, like generating data from larger models, preparing fine-tuning datasets, and measuring performance with various tools.

Developers can now automatically store output pairs from larger models like GPT-4o and use those pairs to fine-tune smaller models like GPT-4o-mini. The whole process of dataset creation, fine-tuning, and evaluation can be done in a more structured, automated, and efficient way.

The streamlined developer process, lower latency, and reduced costs will make OpenAI’s GPT-4o model an attractive prospect for developers looking to deploy powerful apps quickly. It will be interesting to see which applications the multi-modal features make possible.

Join The Future


SUBSCRIBE TODAY

Clear, concise, comprehensive. Get a grip on AI developments with DailyAI

Eugene van der Watt

Eugene comes from an electronic engineering background and loves all things tech. When he takes a break from consuming AI news you'll find him at the snooker table.

×

FREE PDF EXCLUSIVE
Stay Ahead with DailyAI

Sign up for our weekly newsletter and receive exclusive access to DailyAI's Latest eBook: 'Mastering AI Tools: Your 2024 Guide to Enhanced Productivity'.

*By subscribing to our newsletter you accept our Privacy Policy and our Terms and Conditions