Berkeley researchers build AI forecasting system that bests humans

April 3, 2024

  • Researchers from the University of California, Berkeley, built an AI forecasting system with GPT-4
  • The system searches and parses information from articles to create predictions for the future
  • It equals or bests humans in certain scenarios, e.g., when event data is limited

Researchers at the University of California, Berkeley, developed an AI forecasting system to predict future events with similar accuracy to human crowd wisdom.

As LLMs are not purpose-built for event forecasting, the team built a forecasting system on top of GPT-4 using a novel approach called retrieval-augmented reasoning.

This multi-step process involved directing GPT-4 to search for pertinent information, assess its relevance, and integrate it into its reasoning process before making a prediction.

Here’s how it works:

  1. Retrieval: The AI system uses GPT-4 to generate search queries based on the forecasting question and sub-questions, retrieving a broad set of potentially relevant news articles.
  2. Relevance evaluation: GPT-4 evaluates the relevance of each retrieved article, discarding low-scoring articles to narrow down the information pool.
  3. Summarization: GPT-4 distills each article down to its key points, focusing on details related to the forecasting question.
  4. Reasoning: Using “scratchpad prompts,” GPT-4 analyzes the summarized articles and produces a detailed forecast with an explanatory rationale. These prompts guide the model’s thought process, encouraging a systematic approach to reasoning.
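The four steps above can be sketched as a simple pipeline. This is a minimal illustration, not the researchers' code: the `Article` type, the `forecast` function, the relevance threshold, and every `demo_*` stand-in are hypothetical names invented here, and in a real system each stage would call a language model or a news search service.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Article:
    title: str
    text: str


def forecast(
    question: str,
    generate_queries: Callable[[str], List[str]],
    search: Callable[[str], List[Article]],
    score_relevance: Callable[[str, Article], float],
    summarize: Callable[[str, Article], str],
    reason: Callable[[str, List[str]], Tuple[float, str]],
    min_relevance: float = 0.5,  # assumed cutoff, not taken from the study
) -> Tuple[float, str]:
    # 1. Retrieval: turn the question into search queries and collect articles.
    articles = [a for q in generate_queries(question) for a in search(q)]
    # 2. Relevance evaluation: discard low-scoring articles.
    kept = [a for a in articles if score_relevance(question, a) >= min_relevance]
    # 3. Summarization: reduce each remaining article to its key points.
    summaries = [summarize(question, a) for a in kept]
    # 4. Reasoning: produce a probability and an explanatory rationale.
    return reason(question, summaries)


# Hypothetical stand-ins so the sketch runs without any external service.
def demo_generate_queries(question: str) -> List[str]:
    return [question, question + " latest news"]


def demo_search(query: str) -> List[Article]:
    return [
        Article("Relevant report", "rates expected to fall"),
        Article("Sports recap", "the home team won"),
    ]


def demo_score_relevance(question: str, article: Article) -> float:
    return 0.9 if "rates" in article.text else 0.1


def demo_summarize(question: str, article: Article) -> str:
    return article.text[:80]


def demo_reason(question: str, summaries: List[str]) -> Tuple[float, str]:
    probability = 0.6 if summaries else 0.5
    return probability, f"Based on {len(summaries)} summaries"


probability, rationale = forecast(
    "Will rates fall?",
    demo_generate_queries,
    demo_search,
    demo_score_relevance,
    demo_summarize,
    demo_reason,
)
```

Passing each stage in as a function keeps the sketch independent of any particular model or search API.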

The Berkeley team then took the system a step further with self-supervised fine-tuning. 

They generated a large number of AI forecasts on past questions with known answers and selected examples where the AI had outperformed the “wisdom of the crowd” – defined as the aggregated predictions of human forecasters.

By fine-tuning GPT-4 on these examples, the researchers taught the model to emulate reasoning patterns that created the best forecasts.
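The selection step described above can be pictured as a filter over past forecasts. The sketch below is only an assumption about how such a filter might look: the record fields (`model_prob`, `crowd_prob`, `outcome`) and the function names are hypothetical, and the squared-error comparison is used because the article reports results as Brier scores.

```python
from typing import Dict, List


def squared_error(probability: float, outcome: int) -> float:
    # Per-question Brier score: lower means a better forecast.
    return (probability - outcome) ** 2


def select_finetuning_examples(records: List[Dict]) -> List[Dict]:
    # Keep only past questions where the model's forecast beat the crowd's.
    return [
        r
        for r in records
        if squared_error(r["model_prob"], r["outcome"])
        < squared_error(r["crowd_prob"], r["outcome"])
    ]


# Made-up records for illustration only; outcome is 1 if the event happened.
past_forecasts = [
    {"id": "q1", "model_prob": 0.8, "crowd_prob": 0.6, "outcome": 1},
    {"id": "q2", "model_prob": 0.4, "crowd_prob": 0.1, "outcome": 0},
    {"id": "q3", "model_prob": 0.3, "crowd_prob": 0.5, "outcome": 0},
]
selected = select_finetuning_examples(past_forecasts)
selected_ids = [r["id"] for r in selected]
```

Here the model beat the crowd on `q1` and `q3`, so only those two records would be kept as fine-tuning examples.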


When tested on forecasting questions from June 2023 onward, the AI achieved a Brier score of 0.179, compared to the human crowd's score of 0.149. Lower Brier scores are better, so the crowd remained slightly ahead overall.
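For readers unfamiliar with the metric, the Brier score for yes/no questions is the mean squared difference between the predicted probability and the actual outcome (1 if the event happened, 0 if not). The short sketch below shows the calculation; the function name and the sample numbers are illustrative and are not data from the study.

```python
from typing import Sequence


def mean_brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    # Average of (forecast - outcome)^2 over all questions.
    # 0.0 is a perfect score; always answering 0.5 scores 0.25.
    if len(forecasts) != len(outcomes) or len(forecasts) == 0:
        raise ValueError("forecasts and outcomes must be non-empty and equal in length")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Illustrative values only.
always_half = mean_brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
perfect = mean_brier_score([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0])
```

A forecaster who always says 50% scores 0.25, which is a useful baseline when reading the 0.179 and 0.149 figures.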

The AI performed particularly well on questions with high human uncertainty early in the forecasting process and when it had access to sufficient relevant articles on a particular topic. 

AI forecasting
(a) The system performs better than a group of people when it has 0 to 10 relevant articles.
(b) When the crowd is unsure about its predictions (between 0.3 and 0.7), the system does better, with a Brier score of 0.199 compared to the crowd's 0.246. However, when people are very sure (predictions under 0.05), they do better than the system.
(c) The system’s accuracy is higher at the beginning of information gathering. Source: arXiv (open access).

The authors write in the study, “To our knowledge, this is the first automated system with forecasting capability that nears the human crowd level, which is generally stronger than individual human forecasters.”

There was one slight quirk: the system fell behind the crowd on questions where human forecasters were highly certain. This might be because the model ‘hedges’ its predictions.

Researchers describe it as follows: “We hypothesize that this stems from our model’s tendency to hedge predictions due to its safety training.”
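A small worked example shows why hedging is costly on near-certain questions. The probabilities below are made up for illustration and do not come from the study; the point is only that, under squared-error scoring, a cautious forecast loses to a confident one whenever the confident forecast turns out to be right.

```python
def squared_error(probability: float, outcome: int) -> float:
    # Per-question Brier score: lower is better.
    return (probability - outcome) ** 2


# Suppose the event does not happen (outcome = 0).
outcome = 0
confident_crowd = squared_error(0.02, outcome)  # crowd says 2%: about 0.0004
hedged_model = squared_error(0.15, outcome)     # model hedges at 15%: about 0.0225

# Ratio of the two penalties: the hedged forecast is penalized far more.
penalty_ratio = hedged_model / confident_crowd
```

In this made-up case the hedged forecast is penalized more than fifty times as heavily, even though both forecasts lean the same way.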


According to researchers, policymakers, businesses, and public health officials could all benefit from this form of language-driven AI forecasting.

“In the future, political decision-makers may consult the AIs on what actions would most likely bring about desired outcomes,” states Dan Hendrycks from the Center for AI Safety in California.

He proposes that prediction-making models could tackle forthcoming hazards posed by AI. “Forecasting bots would aid us in anticipating and avoiding these risks,” Hendrycks told New Scientist.

Other attempts have been made to predict complex life events with AI, including a model trained by Danish researchers to predict the risks of premature death.

Harnessing AI for predictive applications that affect people’s lives poses ethical questions, such as ensuring these systems are transparent, unbiased, and ethically grounded.

This new Berkeley study outlines how AI can make effective forecasts, but we can’t gauge precisely how it arrives at its decisions.

The use of AI to predict major societal and individual events may seem like a dystopian concept, but it’s already a widespread practice in many parts of the world.

In several democratic countries, including the US, UK, Brazil, Australia, and the Netherlands, AI is used for policing, surveillance, and welfare decision-making.

Might an AI be predicting aspects of your future right now? It’s certainly possible.


Sam Jeans

Sam is a science and technology writer who has worked in various AI startups. When he’s not writing, he can be found reading medical journals or digging through boxes of vinyl records.

