Epoch Caught 42% of FrontierMath Problems Broken, Sending GPT-5.5 Scores Soaring

Epoch AI

4H AGO

2 min read

FrontierMath: Tiers 1–4 (v2) is live, and the story behind it is as revealing as the scores themselves. Epoch AI has just published a sweeping correction to one of the field's most respected math benchmarks after an AI-assisted audit uncovered errors in 42% of its problems. The rankings held up, but scores rose significantly across the board, and the episode raises hard questions about how we measure AI progress at the frontier.

A benchmark built for a different era

When FrontierMath launched in late 2024, the state-of-the-art AI models of the time solved under 2% of its problems, revealing a vast gap between AI capabilities and the prowess of the mathematical community. The benchmark was designed to stay ahead of rapidly improving models: it comprises hundreds of original, exceptionally challenging problems crafted and vetted by expert mathematicians, covering most major branches of modern mathematics, from computationally intensive problems in number theory and real analysis to abstract questions in algebraic geometry and category theory. Solving a typical problem requires multiple hours of effort from a researcher in the relevant branch, and the hardest questions can take multiple days.

Following the v2 update, the full FrontierMath dataset consists of 338 problems: a base set of 295 problems called Tiers 1–3, and an expansion set of 43 exceptionally difficult problems called Tier 4. Tier 4 is the hardest: Tiers 1–3 range from undergraduate-level problems to exploratory problems suitable for an advanced graduate student, while Tier 4 is research-level mathematics.

The benchmark was developed in collaboration with more than 60 mathematicians from universities in over a dozen countries, ranging from graduate students to faculty members. Collectively, they hold 14 IMO gold medals, and one contributor also holds a Fields Medal.
