Most AI benchmarks in biology test whether a model knows facts or can run a standard pipeline. GeneBench-Pro tests something harder: can an agent look at a messy, real-world genomics dataset, figure out what is wrong with it, choose the right analysis strategy, and arrive at a conclusion that a researcher could actually act on? That is a very different bar, and today's best models clear it less than a third of the time.

The benchmark gap nobody was measuring

Existing biology benchmarks mostly measure knowledge retrieval, execution of routine pipelines, or a single analysis step. They do not capture what actually occupies most of a computational scientist's time: cleaning and normalizing data, exploratory analysis, statistical model selection, diagnostic iteration, and producing a conclusion that informs a downstream scientific or translational decision.

GeneBench-Pro is a research-level benchmark that tests whether models can handle the judgment-heavy analysis that real-world computational biology requires. It is the successor to the original GeneBench, and it deliberately raises the difficulty floor. OpenAI calls the target skill "research taste," meaning the chains of judgment calls that shape an analysis: which questions the data can actually support, how early warning signs should change your model, and when your initial plan needs to be thrown out.

129 problems, no shortcuts allowed

GeneBench-Pro covers 129 questions across genomics, quantitative biology, and translational medicine, capturing the complexity, iterative nature, and ambiguity of scientific research in computational biology. The 10 domains include:

  • Statistical and population genetics
  • Regulatory omics and functional genomics
  • Proteomics and biomarkers