ChatGPT’s pediatric exam skills examined by medical experts

September 11, 2023


ChatGPT has demonstrated its examination skills, scoring similarly to students on several degree courses and other tests, such as the Bar Exam for lawyers. But can it deliver satisfactory results on medical exams?

A group of pediatric doctors put ChatGPT, specifically the GPT-3.5 model, to the test.

They tested ChatGPT on the neonatal-perinatal board exam, which is critical for pediatric students. The study, published in JAMA, found that ChatGPT version 3.5 answered only 46 percent of the questions correctly.

ChatGPT performed best on basic recall and clinical reasoning questions, but questions requiring multi-logic reasoning exposed its limitations.

Specifically, the model scored its lowest, 37.5 percent, in the gastroenterology section and its highest, 78.5 percent, in ethics – perhaps ironically. 

The study’s senior author, Andrew Beam, is an assistant professor of biomedical informatics at Harvard Medical School. 

He pointed out that the rapid advancements in AI have been nothing short of remarkable. “There was this moment last year when, all of a sudden, five or six different models were all getting scores of 80 percent or higher,” he said, emphasizing the quick pace at which the field is evolving.

Beam’s wife, Kristyn, an instructor in pediatrics at Harvard Medical School, also participated in the study. “I wanted it not to do well, so from that perspective I was happy,” she confessed. 

However, she acknowledged the inevitability of AI embedding itself in healthcare, as already seen with AI-powered MRI scanning, eye disease diagnostics, and drug development, to name just a few of its growing list of applications.

“It is really important to figure out how to bring that into the clinical world and to bring it in safely,” she said.

The team plans to run the same tests with the more capable GPT-4 on the neonatal-perinatal and anesthesiology board exams.

Andrew Beam also pointed out the importance of knowing which version of a large language model you’re using, noting that the newer GPT-4 is available on a subscription basis, while the older ChatGPT 3.5 is still freely available.

“Most users will likely be attracted to the free tool and should keep in mind its limitations,” he said. Globally, a $20-per-month subscription is far from negligible.

ChatGPT has been tested on various exams, including a recent study that pitted it against students in 32 degree-level subjects and found that it matched or exceeded them on only 9 of the 32 exams.

The AI has also been tested on the bar exam for law, Graduate Record Examinations (GRE), SAT Reading and Writing, Advanced Placement exams, and many others, often scoring very highly. 


Sam Jeans

Sam is a science and technology writer who has worked in various AI startups. When he’s not writing, he can be found reading medical journals or digging through boxes of vinyl records.

