LLMs are really bad at solving simple river crossing puzzles

June 25, 2024

  • Users on X tested leading LLMs to see if they could solve simple logic problems
  • Claude and GPT-4o struggle with simple river crossing puzzles and offer ridiculous solutions
  • The results may highlight LLMs’ lack of common-sense reasoning, or their over-reliance on training data

Large language models like GPT-4o can perform incredibly complex tasks, but even the top models struggle with some basic reasoning challenges that children can solve.

In an interview with CBS, the ‘godfather of AI’, Geoffrey Hinton, said that AI systems might be more intelligent than we know and there’s a chance the machines could take over.

When asked about the level of current AI technology, Hinton said, “I think we’re moving into a period when for the first time ever we may have things more intelligent than us.”

Meta’s chief AI scientist, Yann LeCun, would have us believe that we’re a long way off from seeing AI achieve even “dog-level” intelligence.

So which is it?

This week, users on X posted examples of the incredible coding ability Anthropic’s new Claude model exhibits. Others ran experiments to highlight how AI models still struggle with very basic reasoning.

River crossing puzzle

The classic river crossing puzzle has multiple variations but Wikipedia’s version sums it up like this:

A farmer with a wolf, a goat, and a cabbage must cross a river by boat. The boat can carry only the farmer and a single item. If left unattended together, the wolf would eat the goat, or the goat would eat the cabbage. How can they cross the river without anything being eaten?

Finding the solution requires some basic planning and reasoning over different scenarios, but it’s not a particularly difficult problem to solve. If you’re human.
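For contrast, a machine doesn’t need any world knowledge to crack it: an exhaustive breadth-first search over the handful of possible bank states finds the answer almost instantly. Here’s a minimal Python sketch (the state encoding and names are my own illustration, not taken from any of the experiments discussed here):

```python
from collections import deque

# Everything starts on the near bank; the goal is an empty near bank.
ALL = frozenset({"farmer", "wolf", "goat", "cabbage"})
UNSAFE = [{"wolf", "goat"}, {"goat", "cabbage"}]  # pairs that can't be left alone

def safe(bank):
    """A bank left without the farmer must not contain an unsafe pair."""
    return not any(pair <= bank for pair in UNSAFE)

def solve():
    queue = deque([(ALL, [])])  # (contents of near bank, crossings so far)
    seen = {ALL}
    while queue:
        near, path = queue.popleft()
        if not near:  # everyone has crossed
            return path
        side = near if "farmer" in near else ALL - near
        # The farmer rows across alone or with one passenger from his bank.
        for cargo in [None, *sorted(side - {"farmer"})]:
            moving = {"farmer"} | ({cargo} if cargo else set())
            nxt = near - moving if "farmer" in near else near | moving
            unattended = ALL - nxt if "farmer" in nxt else nxt
            if safe(unattended) and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [cargo or "cross alone"]))

print(solve())
# ['goat', 'cross alone', 'cabbage', 'goat', 'wolf', 'cross alone', 'goat']
```

The point isn’t that an LLM should run a search like this internally, but that the puzzle’s state space is tiny; any system that can genuinely plan should handle it.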

Can GPT-4o solve it? If you copy and paste the puzzle into ChatGPT, it gives you the right answer, but that Wikipedia page was almost certainly in its training data.

What if we made the puzzle a lot simpler and changed it slightly so the LLM couldn’t rely on its training data?

British mathematics professor Sir William Timothy Gowers showed how easily the inability of LLMs to apply logic can be exposed.

ChatGPT’s failed attempt at solving a simplified river crossing puzzle. Source: X @wtgowers

The correct answer to the puzzle is that only one trip is required, but ChatGPT seems to be recalling a memorized answer rather than simply reasoning through the puzzle.

Is Claude 3.5 Sonnet any better?

Meta data scientist Colin Fraser’s experiment showed that even the leading AI model currently available can’t solve this simple puzzle.

It may have been a little disingenuous for a data scientist from Meta not to show his results using Llama 3.

I asked Meta AI the same question and it also got it completely wrong.

Meta AI powered by Llama 3 also gets the river puzzle answer wrong. Source: Meta

Yann LeCun explained the reason behind these results, saying, “The issue is that LLMs have no common sense, no understanding of the world, and no ability to plan (and reason).”

Is that true, or is something else at play?

What these interactions might reveal is not a lack of reasoning ability, but rather how much the output of an LLM is influenced by its training data. Meta AI’s response calling this a “classic puzzle” hints that this might be what’s happening.

Variations of the river crossing puzzle often reference the number of “trips” required. When you pose the puzzle without using that word, the LLM solves it.

These experiments were interesting, but they don’t definitively settle the debate over whether AI models are truly intelligent or simply next-token prediction machines.

However, the results do highlight how susceptible LLMs are to their training data. When GPT-4o aces the LSAT, is it “thinking” its way to the answers or remembering them?

Until the engineers understand what goes on inside the AI black boxes they created, the arguments on X will continue unresolved.


Eugene van der Watt

Eugene comes from an electronic engineering background and loves all things tech. When he takes a break from consuming AI news you'll find him at the snooker table.
