A study conducted by the University of Stavanger in Norway reveals that AI large language models (LLMs) outperformed humans in tests designed to measure creative thinking.
Researchers pitted 256 human volunteers against three AI chatbots – ChatGPT3, ChatGPT4, and Copy.Ai (based on GPT-3) – in generating alternative uses for everyday objects such as ropes, boxes, pencils, and candles.
The study measures divergent thinking via the Alternate Uses Task (AUT), developed by psychologist J.P. Guilford in 1967.
The idea is for participants to devise as many uses for simple objects as possible in a set time period. For example, a paper clip could be used as a lock pick or engraving tool.
AIs generally outperformed humans in the task. “Indeed, this is a remarkable type of ability that AI chatbots display,” said Simone Grassini, the study’s co-author. “The findings show that AI is better than most humans in creative thinking.”
Both human and AI participants followed the same instructions, emphasizing the importance of creative quality over the quantity of ideas.
Chatbots were tested 11 times using four different object prompts. Some adjustments were made to equate the number of ideas generated by chatbots with those from human participants.
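For a sense of what one such trial might look like in practice, here is a minimal sketch that scripts an AUT prompt against a chat model via the OpenAI API. The prompt wording, the model name, and the use of an API at all are illustrative assumptions; the study’s exact prompts and access method aren’t reproduced here.

```python
# Hypothetical sketch of one AUT session against a chat model.
# Assumptions: the OpenAI Python SDK (v1), the model name, and the
# prompt wording are all illustrative, not the researchers' setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

OBJECTS = ["rope", "box", "pencil", "candle"]  # the four object prompts

def run_aut_trial(obj: str) -> str:
    """Ask the model for creative alternative uses of one object."""
    response = client.chat.completions.create(
        model="gpt-4",  # assumption; the study tested ChatGPT3, ChatGPT4, and Copy.Ai
        messages=[{
            "role": "user",
            "content": (
                f"What are surprising and creative uses for a {obj}? "
                "The goal is quality over quantity: list uses that are "
                "clever, uncommon, or original."
            ),
        }],
    )
    return response.choices[0].message.content

for obj in OBJECTS:
    print(f"--- {obj} ---")
    print(run_aut_trial(obj))
```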
Though chatbots scored higher on average in divergent thinking tasks, the research also noted that the most innovative human-generated ideas matched or exceeded those of the AI chatbots.
“Our results show that, at least for now, the best humans still outperform the AI,” Grassini added.
This is revealing: humans can still produce real quality, just not as quickly or as consistently as AI, which is roughly what you’d expect.
The study, published in the Nature Portfolio journal Scientific Reports, highlights that AI-generated responses scored higher than human responses in categories such as semantic distance and creativity.
However, when comparing the best individual responses, humans still outshone the chatbots in seven of the eight scoring categories.
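Semantic distance, one of those scoring categories, is usually operationalized as the distance between the prompt object and a proposed use in a text-embedding space: the farther apart they sit, the more original the use is taken to be. The sketch below shows the general idea using the sentence-transformers library; the model choice and exact formula are assumptions for illustration, not the paper’s actual pipeline.

```python
# Illustrative semantic-distance scoring for AUT responses.
# Assumption: the embedding model and the cosine-distance formula are
# common choices for this metric, not necessarily what the study used.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # model choice is an assumption

def semantic_distance(obj: str, use: str) -> float:
    """1 - cosine similarity between object and proposed use; higher = more distant."""
    obj_vec, use_vec = model.encode([obj, use])
    cos = np.dot(obj_vec, use_vec) / (np.linalg.norm(obj_vec) * np.linalg.norm(use_vec))
    return 1.0 - float(cos)

# A mundane use should sit closer to the object than an unusual one.
print(semantic_distance("rope", "tie things together"))             # smaller distance
print(semantic_distance("rope", "hang it as a swing for squirrels"))  # larger distance
```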
“I knew that the chatbot would have performed well, but I think it performed even better than what I expected,” Grassini remarked.
Despite the promising results, researchers emphasized that the unique complexity of human creativity may be challenging for AI to fully replicate or surpass.
Grassini concluded, “It is still to be established whether these capabilities of AI will translate directly on AI systems, replacing human jobs that require creative thinking. I prefer to think that AI will be helping humans to improve their capacity.”
More about the study
The study revealed that AI could take novel and innovative approaches to creative question answering.
While the very best responses were still of human origin, humans were far more prone to fluctuations in concentration and other factors that prevented them from achieving the breadth of AI-generated responses.
Here’s how it worked:
- Methodology: The study utilized the Alternate Uses Task (AUT), a long-established measure of divergent thinking and creativity. Participants included 256 humans and three advanced AI chatbots. They were asked to think of uncommon and creative uses for everyday objects, and both the human and AI responses were evaluated based on their originality and usefulness.
- Performance comparison: On average, AI chatbots outperformed human participants in both mean and maximum creativity scores. However, the highest-performing humans could still match or exceed the creativity levels of AI chatbots.
- Differences between AI and humans: The study revealed that human participants often underperformed due to lapses in attention midway through the task. Conversely, AI executed the task without tiring, contributing to its higher overall performance.
- Strengths and weaknesses of AI and human creativity: The study posits that AI’s access to greater computational resources explains its high average performance. A chatbot can essentially generate more plausible alternative uses for an object, giving it a greater chance of singling out the most unusual ones (see the simulation sketch after this list).
- Limitations and future directions: The study acknowledges limitations, such as a lack of demographic diversity among human participants. Moreover, it relies on a single, decades-old creativity test, which is naturally limited in scope.
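The computational-resources point above is, at bottom, an argument about order statistics: a generator that produces more candidate ideas ends up with a better best idea on average, even if each individual idea is no better. The toy simulation below makes that concrete; the standard-normal score distribution is an invented assumption for illustration and has nothing to do with the study’s actual data.

```python
# Toy order-statistics simulation: if every idea's quality is an i.i.d.
# draw from the same distribution, whoever generates more candidate ideas
# gets a better best idea on average, with no gain in per-idea quality.
# The standard-normal score distribution is an invented assumption.
import numpy as np

rng = np.random.default_rng(0)
TRIALS = 10_000

for n_ideas in (5, 20, 100):
    # Each trial: draw n_ideas quality scores and keep the best one.
    best = rng.normal(size=(TRIALS, n_ideas)).max(axis=1)
    print(f"{n_ideas:>3} ideas per trial -> mean best score {best.mean():.2f}")
```

With these numbers, the mean best score climbs from roughly 1.2 with 5 ideas per trial to about 2.5 with 100, purely from drawing more samples.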
Researchers are creating new methods for comparing AI and human approaches to tests, questions, and other tasks.
One recent study found that ChatGPT outperformed students in nine out of 32 subjects: a solid showing, but probably good news for humanity on balance.
On this evidence, we’re still at least one generation of AI away from the technology comprehensively beating us.