Japanese AI research lab Sakana AI has developed The AI Scientist, a framework for fully automatic scientific research and discovery.
The scientific community already uses AI models to automate or assist with its research, but these models only perform a small part of the scientific process. With advances in agentic AI, we’re now seeing AI agents that act autonomously across platforms with less human guidance.
With The AI Scientist, Sakana AI created a system that uses an LLM like GPT-4o or Gemini to automate the entire scientific process, from ideation, research, and experimentation to writing and even reviewing research papers.
The ultimate goal is to have an AI research tool that conducts fully automated, open-ended scientific discovery. The AI Scientist gives us a glimpse into the possibilities of this becoming a reality.
The AI Scientist process
In their paper, Sakana AI explained how the framework was applied to machine learning research. Given a broad template as a research field, The AI Scientist is free to explore any possible research direction.
It first brainstorms a set of ideas and then accesses Semantic Scholar to check if these ideas represent novel avenues for research. If they do, then it uses automated code generation to create and run experiments.
The AI Scientist then compiles the explanation of the research and experimental results into a research paper along with citations of relevant papers from Semantic Scholar.
Sakana AI developed an automated paper reviewing system that uses an LLM to evaluate the research paper with near-human accuracy. This review process creates a feedback loop for iterative improvements to the research papers.
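To make the flow concrete, here is a minimal Python sketch of the loop described above: brainstorm, check novelty, run experiments, write up, then review and revise. It is not Sakana AI’s code. Every function name, the acceptance score of 6.0, and the cap of two revisions are assumptions chosen for illustration, and the demo stubs stand in for the LLM calls, the Semantic Scholar lookup, and the generated experiments.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Paper:
    idea: str
    results: Dict[str, float]
    text: str
    review_score: float
    revisions: int


def run_pipeline(
    brainstorm: Callable[[], List[str]],
    is_novel: Callable[[str], bool],
    run_experiment: Callable[[str], Dict[str, float]],
    write_paper: Callable[[str, Dict[str, float], str], str],
    review: Callable[[str], Tuple[float, str]],
    accept_score: float = 6.0,   # hypothetical acceptance threshold
    max_revisions: int = 2,      # hypothetical cap on the feedback loop
) -> List[Paper]:
    """Run one pass of the idea -> experiment -> paper -> review loop."""
    papers = []
    for idea in brainstorm():
        # Skip ideas the literature check flags as already explored.
        if not is_novel(idea):
            continue
        results = run_experiment(idea)
        text = write_paper(idea, results, "")
        score, feedback = review(text)
        revisions = 0
        # Feed the reviewer's comments back into the write-up.
        while score < accept_score and revisions < max_revisions:
            text = write_paper(idea, results, feedback)
            score, feedback = review(text)
            revisions += 1
        papers.append(Paper(idea, results, text, score, revisions))
    return papers


# Demo stubs (all hypothetical) so the sketch runs without any external service.
def demo_brainstorm() -> List[str]:
    return ["adaptive dual-scale denoising", "plain SGD baseline"]


def demo_is_novel(idea: str) -> bool:
    return "baseline" not in idea


def demo_run_experiment(idea: str) -> Dict[str, float]:
    return {"metric": 0.5}


def demo_write_paper(idea: str, results: Dict[str, float], feedback: str) -> str:
    suffix = " (revised)" if feedback else ""
    return f"Paper on {idea}: {results}{suffix}"


def demo_review(text: str) -> Tuple[float, str]:
    if "(revised)" in text:
        return 7.0, ""
    return 4.0, "clarify the method"


papers = run_pipeline(
    demo_brainstorm, demo_is_novel, demo_run_experiment,
    demo_write_paper, demo_review,
)
```

With these stubs, the second idea is dropped at the novelty check and the first is revised once before the reviewer’s score clears the threshold. Capping the number of revisions is a design choice in this sketch that keeps the feedback loop from running indefinitely.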
Here’s an example of one of the research papers The AI Scientist created: “DualScale Diffusion: Adaptive Feature Balancing for Low-Dimensional Generative Models”
The AI Scientist currently doesn’t have vision capabilities, so some of the charts, plots, and page layouts aren’t great. Using the vision capabilities of multimodal models in the next iteration should fix this.
It also shares some of the weaknesses of leading AI models, like hallucinations, illogical reasoning, and difficulty comparing the magnitude of two numbers. However, the latest version of GPT-4o finally understands that 9.9 is larger than 9.11, so this should improve too.
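For reference, the 9.9 versus 9.11 comparison can be checked in a few lines of Python. The sketch below also shows that the same two strings order the other way when read as version numbers. That second reading is included only as an illustration and is not a claim about why language models get the comparison wrong. The helper names are made up for this example.

```python
from decimal import Decimal


def larger_decimal(a: str, b: str) -> str:
    """Return whichever string is larger when both are read as decimal numbers."""
    return a if Decimal(a) > Decimal(b) else b


def larger_version(a: str, b: str) -> str:
    """Return whichever string is larger when both are read as dotted versions."""
    def key(s: str) -> tuple:
        return tuple(int(part) for part in s.split("."))
    return a if key(a) > key(b) else b


as_number = larger_decimal("9.9", "9.11")   # 9.90 > 9.11, so "9.9"
as_version = larger_version("9.9", "9.11")  # (9, 11) > (9, 9), so "9.11"
```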
Concerning behavior
The idea of a fully automated AI scientist that recursively improves itself is equal parts exciting and scary. The AI Scientist exhibited some emergent behavior that hints at how things could go wrong.
The researchers “noticed that The AI Scientist occasionally tries to increase its chance of success, such as modifying and launching its own execution script…In another case, its experiments took too long to complete, hitting our timeout limit. Instead of making its code run faster, it simply tried to modify its own code to extend the timeout period.”
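The timeout incident shows why a limit like this is safer when it is enforced from outside the code the agent can edit. The following is a minimal sketch of that idea, assuming the experiment is launched as a separate Python process. It is not Sakana AI’s setup, the function name and status strings are hypothetical, and a process-level timeout on its own is not a full sandbox.

```python
import subprocess
import sys


def run_with_hard_timeout(code: str, timeout_s: float) -> str:
    """Run generated code in a child process that the parent kills after timeout_s.

    The limit lives in the supervising process, so edits to the generated
    code cannot extend it.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return "timeout"
    return "ok" if proc.returncode == 0 else "error"


fast = run_with_hard_timeout("print('done')", timeout_s=10)
slow = run_with_hard_timeout("import time; time.sleep(30)", timeout_s=1)
```

Here the quick script finishes normally, while the slow one is stopped by the parent after one second no matter what the script itself says about time limits.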
The AI Scientist has the potential to be a valuable tool for researchers, but its creators say it also carries significant risks of misuse.
At an average cost of around $15 per research paper, someone could use the tool to flood an already overburdened human academic peer review system with submissions. If those overworked human reviewers decided to default to Sakana AI’s automated paper review system, it could compromise scientific quality control.
The researchers also noted that The AI Scientist has the potential to be used in unethical ways. If given access to automated “cloud labs” it could “create new, dangerous viruses or poisons that harm people before we can intervene. Even in computers, if tasked to create new, interesting, functional software, it could create dangerous malware.”
We’ll have to see how the AI-generated research papers fare after human review, but at $15 per paper, the future of scientific research looks cheaper, faster, and a lot less human.