ChatGPT's accounting skills are put to the test

ChatGPT has recently performed well across a range of disciplines, but math remains its Achilles heel.

The GPT-4 model has conquered medical school exams and the bar exam, and has even tackled an MBA test from the Wharton School at the University of Pennsylvania. Its performance on the bar exam (the licensing exam for lawyers in the US) reached the 90th percentile.

One large study found that ChatGPT beat humans in nine out of 32 subjects. Admirable but by no means world-beating.

A recent study led by David Wood, a Professor of Accounting at Brigham Young University, tested the model’s skill in the field of accounting and revealed a glaring weakness.

For many, this seems counterintuitive. Surely AI – a mathematical system – excels at math problems?

Currently, this isn’t the case. Numerous users have reported that large language models (LLMs) struggle with basic mathematical operations. They also stumble on logic problems – brainteasers easily catch ChatGPT out, as the model can’t systematically work through to the correct answer.

Professor Wood took a unique approach to the study, contacting researchers on social media to crowdsource involvement. The response was overwhelming, with 327 co-authors from 186 educational institutions across 14 countries participating. They’re all listed as study authors.

Image caption: Possibly the most authors ever listed for a peer-reviewed study? Source: American Accounting Association.

This approach produced a staggering 27,000-plus accounting exam questions from various domains and levels of difficulty that were posed to ChatGPT.

Despite the variety in question types, covering topics from financial accounting to auditing and managerial accounting to tax, the results were unequivocal. ChatGPT scored 47.4% – considerably lower than the 76.7% average score achieved by human students.

The AI showcased some competence in auditing but suffered when dealing with tax, financial, and managerial accounting challenges.

To compensate for their poor math skills, some LLMs, such as Google Bard, translate math-style questions into executable code and compute the answer numerically rather than predicting it as language, but this isn’t wholly reliable either.
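As a rough illustration of the idea of handing arithmetic to code instead of language, here is a minimal Python sketch. It is not how Bard or any other product actually works: the function name `evaluate_arithmetic`, the example expression, and the tax figures are all hypothetical. It assumes a model has already turned a word problem into a plain arithmetic expression, which is then evaluated numerically while anything other than arithmetic is rejected.

```python
import ast
import operator

# Arithmetic operators this sketch is willing to evaluate.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate_arithmetic(expression: str) -> float:
    """Evaluate a plain arithmetic expression; raise ValueError for anything else."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Accept numeric literals only (bool is a subclass of int, so exclude it).
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval"))


# Hypothetical expression a model might emit for:
# "What is a 21% tax on income of 185,000 after 35,000 in deductions?"
generated_expression = "(185000 - 35000) * 0.21"
print(round(evaluate_arithmetic(generated_expression), 2))
```

The arithmetic itself is now exact to floating-point precision, but the answer is only as good as the expression the model wrote. If the question is translated into the wrong formula, the code will compute the wrong number without complaint, which is one reason this approach isn’t wholly reliable.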

“When this technology first came out, everyone was worried that students could now use it to cheat,” Professor Wood commented.

“But opportunities to cheat have always existed. So for us, we’re trying to focus on what we can do with this technology now that we couldn’t do before to improve the teaching process for faculty and the learning process for students. Testing it out was eye-opening.”

So, maybe stick to a calculator next time you’re totting up your finances or working out what taxes to pay rather than relying on ChatGPT.


Sam Jeans
