OpenAI has gone fairly quiet once again, with GPT-4o’s much-hyped voice chat features rolling out far more slowly than anyone had anticipated.
But there have been murmurings about new projects in the works, including SearchGPT, which combines generative AI and web browsing, and the more mysterious “Project Strawberry.”
Strawberry’s origins extend back to November 2023, when a model (or, more accurately, a training technique) named Q* surfaced in leaks reported by Reuters.
It was even speculated that Q* was potentially dangerous and played some role in CEO Sam Altman’s firing and rehiring last year.
Q* was thought to combine an advanced reasoning model with an AI agent capable of exploring the internet.
Despite dramatic headlines along the lines of ‘OpenAI is sitting on an apocalyptically powerful model,’ the legitimacy of Q* was very much contested at the time.
More details of the Q* project emerged in May and June this year, by which point it had been renamed Project Strawberry, or just Strawberry. According to Reuters, Strawberry involves a specialized method of training AI models to explore the internet autonomously and conduct ‘deep research.’
The Q likely refers to Q-learning, a long-established reinforcement learning (RL) technique. As for the star (*), there’s more uncertainty. Reuters said it’s similar to a method developed at Stanford called “Self-Taught Reasoner” or “STaR.” Others say it relates to a search algorithm named A*.
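For readers unfamiliar with Q-learning, here is a minimal sketch of the textbook tabular version on a toy problem. Nothing in it comes from OpenAI or the Reuters reporting: the five-state corridor, the function name `q_learning_chain`, and the hyperparameters are all assumptions made purely for illustration.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Textbook tabular Q-learning on a hypothetical corridor of states.

    The agent starts at state 0 and earns a reward of 1 when it reaches
    the last state. Actions: index 0 = step left, index 1 = step right.
    """
    rng = random.Random(seed)
    moves = (-1, +1)
    goal = n_states - 1
    # One row per state, one column per action, all values start at zero.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Q-learning is off-policy, so a purely random behaviour
            # policy is enough to learn the greedy action values.
            action = rng.randrange(2)
            next_state = min(max(state + moves[action], 0), goal)
            reward = 1.0 if next_state == goal else 0.0
            # Core update: nudge Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    table = q_learning_chain()
    for s, (left, right) in enumerate(table):
        print(f"state {s}: left={left:.3f} right={right:.3f}")
```

The "Q" is the table of action values being learned. In this toy, the value of stepping right grows the closer the agent gets to the goal, so always picking the higher-valued action walks it straight there. Whatever Q* or Strawberry actually does, if it is related at all, would operate at a vastly larger scale.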
Sources mentioned that OpenAI wants the model to conduct research by autonomously browsing the web, assisted by a “computer-using agent” (CUA) – which is also a key component of SearchGPT.
According to those sources, OpenAI wants Strawberry to perform “long-horizon tasks” (LHT), which involve complex planning and execution over extended periods.
Stanford professor Noah Goodman, one of STaR’s creators, told Reuters about the tech, “I think that is both exciting and terrifying…if things keep going in that direction we have some serious things to think about as humans.”
When asked about Strawberry, an OpenAI spokesperson provided a general statement about the company’s AI development goals:
“We want our AI models to see and understand the world more like we do. Continuous research into new AI capabilities is a common practice in the industry, with a shared belief that these systems will improve in reasoning over time.”
Social media stirs the pot
Not long after the Reuters report, in early August, Altman posted a photo of strawberries accompanied by the caption “i love summer in the garden,” reigniting speculation about the Strawberry project.
i love summer in the garden pic.twitter.com/Ter5Z5nFMc
— Sam Altman (@sama) August 7, 2024
Then the user iruletheworldmo, an AI-focused meme/satire account, began posting strawberry-related content, hinting at a potential ‘level two’ breakthrough in AI. The account’s profile photo shows Theodore Twombly, Joaquin Phoenix’s character in the AI-themed film Her, which has itself become associated with Altman.
The user posted: “welcome to level two. how do you feel? did I make you feel?” Altman, CEO of OpenAI, responded with “amazing tbh”.
This exchange set off a chain reaction of strawberry-themed posts and mass speculation across X and Reddit.
welcome to level two.
how do you feel?
did I make you feel?
— 🍓🍓🍓 (@iruletheworldmo) August 7, 2024
Strawberry takes another turn
Just recently, The Information revealed that OpenAI is gearing up to launch a version of Strawberry as part of a chatbot and possibly integrate it into ChatGPT as soon as this fall.
OpenAI also allegedly demonstrated Strawberry’s capabilities to US national security officials.
Interestingly, according to The Information, OpenAI is developing two distinct versions of Strawberry:
- A smaller, simplified version intended for integration into chat-based applications like ChatGPT. It aims to enhance reasoning capabilities in scenarios where users require more thoughtful, detailed answers rather than quick responses.
- A larger, more powerful version used to generate high-quality “synthetic” training data for OpenAI’s next flagship language model, codenamed “Orion.”
Synthetic data generated by Strawberry could reduce reliance on internet-scraped text and images for training.
That could potentially lead to more accurate and reliable AI models, addressing persistent issues like AI “hallucinations” or model collapse.
Strangely, though, these characterizations of Strawberry don’t align that well with the earlier descriptions of Q*.
Perhaps we could speculate that Strawberry, the autonomous agent, surfs the web and uses its ‘deep research’ to ultimately synthesize data.
Maybe that’s more computationally efficient and useful for model training than simply scraping the raw data itself?
AI doesn’t know how many R’s are in strawberry
Now, here’s where the story takes a bizarre and ironic twist.
Strawberry might be named after a word whose letters current AI models, including some of the most advanced ones, often struggle to count.
Ask an AI how many ‘r’s are in “strawberry,” and there’s a chance it’ll confidently answer “two” instead of the correct “three.”
Sounds ridiculous, right? I didn’t believe it myself until I tried it with Claude.
When this first came to light, some alleged that this was some sort of ‘easter egg’ or joke within OpenAI’s systems.
But since Claude reacts the same way as ChatGPT, that seems unlikely, unless AI companies are colluding on niche strawberry jokes behind the scenes.
The explanation behind this is elegant in its simplicity.
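Language models, despite the name, are math-based systems. They don’t ‘truly’ understand words. Text is broken into chunks called tokens and converted into numbers before the model processes it, so the individual letters inside a word are not something the model sees directly, and context and meaning at the word level can be lost.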
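To make that concrete, here is a rough sketch of the gap between what a person sees and what a model receives. The three-piece vocabulary and the `toy_tokenize` helper are hypothetical and invented for this example. Real tokenizers learn far larger vocabularies from data and may well split "strawberry" differently.

```python
# Hypothetical toy vocabulary, made up for illustration only.
TOY_VOCAB = {"str": 0, "aw": 1, "berry": 2}


def toy_tokenize(word, vocab):
    """Split `word` into the longest matching vocabulary pieces and
    return their integer IDs (a simplified stand-in for a real tokenizer)."""
    ids = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {i}")
    return ids


if __name__ == "__main__":
    word = "strawberry"
    # Ordinary code sees every character, so counting is trivial.
    print("letter count of 'r':", word.count("r"))
    # A model instead receives something like this list of numbers.
    print("token IDs:", toy_tokenize(word, TOY_VOCAB))
```

Counting characters in a string is trivial for ordinary code. A model handed a short list of IDs, however, has no letters to count; it has to have picked up the spelling of each token indirectly during training.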
Why strawberry reliably triggers this shortcoming is the more mystifying question.
In any case, whether OpenAI chose the name “Strawberry” as a playful nod to this common AI stumbling block or by pure coincidence remains unclear. It seems like something Altman might do, whether Strawberry is real or not.
What’s next in this bizarre but berry interesting (…) strawberry story is anyone’s guess. To be honest, I get the sense, at this stage, that none of the speculative ‘evidence’ we have from major news outlets is wholly representative of what’s going on at OpenAI.
We’ll have to wait for SearchGPT and/or GPT-5 to see just how evolved OpenAI’s products become off the back of Strawberry and their other projects.