Humanity’s Last Exam wants your tough questions to stump AI

September 17, 2024

  • The Humanity’s Last Exam project is calling for submissions of tough questions to challenge AI models
  • The capabilities of advanced AI models are close to exceeding standard benchmarks used to test them
  • A pool of a total of $500,000 will be awarded to top selected questions

Benchmarks are struggling to keep up with advancing AI model capabilities and the Humanity’s Last Exam project wants your help to fix this.

The project is a collaboration between the Center for AI Safety (CAIS) and AI data company Scale AI. The project aims to measure how close we are to achieving expert-level AI systems, something existing benchmarks aren’t capable of.

OpenAI and CAIS developed the popular MMLU (Massive Multitask Language Understanding) benchmark in 2021. Back then, CAIS says, “AI systems performed no better than random.”

The impressive performance of OpenAI’s o1 model has “destroyed the most popular reasoning benchmarks,” according to Dan Hendrycks, executive director of CAIS.

OpenAI’s o1 MMLU performance compared with earlier models. Source: OpenAI

Once AI models hit 100% on the MMLU, how will we measure them? CAIS says “Existing tests now have become too easy and we can no longer track AI developments well, or how far they are from becoming expert-level.”

When you see the jump in benchmark scores that o1 added to the already impressive GPT-4o figures, it won’t be long before an AI model aces the MMLU.

Humanity’s Last Exam is asking people to submit questions that would genuinely surprise you if an AI model delivered the correct answer. They want PhD level exam questions, not the ‘how many Rs in Strawberry’ type that trip up some models.

Scale explained that “As existing tests become too easy, we lose the ability to distinguish between AI systems which can ace undergrad exams, and those which can genuinely contribute to frontier research and problem solving.”

If you have an original question that could stump an advanced AI model then you could have your name added as a co-author of the project’s paper and share in a pool of $500,000 that will be awarded to the best questions.

To give you an idea of the level the project is aiming at Scale explained that “if a randomly selected undergraduate can understand what is being asked, it is likely too easy for the frontier LLMs of today and tomorrow.”

There are a few interesting restrictions on the kinds of questions that can be submitted. They don’t want anything related to chemical, biological, radiological, nuclear weapons, or cyberweapons used for attacking critical infrastructure.

If you think you’ve got a question that meets the requirements then you can submit it here.

Join The Future


SUBSCRIBE TODAY

Clear, concise, comprehensive. Get a grip on AI developments with DailyAI

Eugene van der Watt

Eugene comes from an electronic engineering background and loves all things tech. When he takes a break from consuming AI news you'll find him at the snooker table.

×

FREE PDF EXCLUSIVE
Stay Ahead with DailyAI

Sign up for our weekly newsletter and receive exclusive access to DailyAI's Latest eBook: 'Mastering AI Tools: Your 2024 Guide to Enhanced Productivity'.

*By subscribing to our newsletter you accept our Privacy Policy and our Terms and Conditions