There is a growing assumption baked into the AI hype cycle: that models trained on the entirety of human scientific knowledge should, in theory, be able to see where science is headed next. A new paper from researchers at Oxford, Stanford, the Allen Institute for AI, and Sakana AI puts that assumption to a rigorous test, and finds it wanting. The paper introduces CUSP, a benchmark designed to measure whether frontier AI can genuinely forecast scientific progress, not just recognize it in hindsight.

CUSP stands for Cutoff-conditioned Unseen Scientific Progress. It is a multi-disciplinary, event-level benchmark that evaluates scientific forecasting in AI systems through feasibility assessment, mechanistic reasoning, generative solution design, and temporal prediction. The key term is "cutoff-conditioned": the benchmark is explicitly designed to test what a model can predict about discoveries that happened after its training data ends.

Why this question matters now

AI is increasingly embedded in scientific discovery, yet whether it can anticipate scientific progress remains unclear. That gap between capability and understanding is becoming consequential. Labs are deploying AI systems to accelerate drug discovery, materials science, and physics research. If those systems are confidently wrong about which research directions will pan out, the cost is not just wasted compute; it is misdirected scientific effort.

AI systems have already surpassed graduate-level performance on some scientific benchmarks, such as GPQA Diamond, where leading models now exceed PhD-level experts. But acing a multiple-choice exam about known science is very different from predicting what science will discover next. CUSP is built to probe exactly that gap.

What the benchmark actually tests

CUSP is a research dataset for evaluating how well AI systems handle temporally grounded questions about scientific papers under a historical cutoff. Given a model's training cutoff date, CUSP asks: can the model anticipate real scientific discoveries that happened after that date?
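
To make the cutoff idea concrete, here is a minimal Python sketch of the filtering step it implies: only events dated after a model's training cutoff count as fair forecasting targets. This is an illustration, not the CUSP authors' code. The `ScientificEvent` class, the `unseen_events` function, and the sample events and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ScientificEvent:
    """Hypothetical record for one discovery; not the benchmark's real schema."""
    title: str
    published: date


def unseen_events(events, training_cutoff):
    """Keep only events published strictly after the model's training cutoff."""
    return [e for e in events if e.published > training_cutoff]


# Made-up events and dates, purely for illustration.
events = [
    ScientificEvent("Hypothetical result A", date(2023, 3, 1)),
    ScientificEvent("Hypothetical result B", date(2024, 9, 15)),
    ScientificEvent("Hypothetical result C", date(2025, 1, 20)),
]

cutoff = date(2024, 6, 1)  # assumed training cutoff for some model
eligible = unseen_events(events, cutoff)
print([e.title for e in eligible])
```

The strict comparison is a deliberate choice in this sketch: an event published on the cutoff date itself could have leaked into training data, so it is excluded.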
