Humanity's Last Exam: Global Effort to Challenge AI Beyond Traditional Benchmarks

March 13, 2026
  • The exam is designed to identify AI system weaknesses in depth, context, and specialized human expertise, not merely reward high scores on traditional benchmarks.

  • Notable contributors include Dr. Tung Nguyen from Texas A&M University, who helped author many questions, especially in mathematics and computer science.

  • The project represents a global, interdisciplinary effort, with historians, linguists, physicists, medical researchers, and others contributing to reveal AI gaps that narrower domains miss.

  • Nguyen emphasizes that the benchmark helps policymakers, developers, and users understand AI capabilities and risks, guiding safer and more reliable technology development.

  • The Humanity's Last Exam (HLE) comprises 2,500 questions across mathematics, humanities, natural sciences, ancient languages, and other specialized fields, designed to require deep knowledge and expert context beyond pattern recognition.

  • Early tests show varying performance across models, with some top systems reaching roughly 40–50% accuracy, underscoring the test's difficulty.

  • While some questions are public to promote transparency, most remain hidden to prevent memorization and ensure the test measures genuine understanding.

  • A global consortium of nearly 1,000 researchers created HLE in response to older benchmarks becoming too easy for leading models, aiming to provide a tougher, more durable standard.

  • Questions were vetted by experts worldwide, removing any item that a leading AI could already answer to keep the test just beyond current capabilities.

  • The aim is to provide a durable, transparent benchmark that highlights gaps between AI and human intelligence, not to render humans obsolete.

  • Each question was validated to have a single verifiable answer and crafted to resist simple internet searches.

  • The overarching message is that high scores on human-origin benchmarks do not guarantee true general intelligence, and the effort maps where AI strengths align with or diverge from deep human expertise.

Summary based on 2 sources
