OpenAI's o3 Cracked 18 Unsolvable Rare Disease Cases in Children

OpenAI

1D AGO

2 min read

REASONING

math_reasoning test_time_compute

LLMS

hallucinations

For families with children suffering from rare genetic diseases, the wait for a diagnosis can stretch across years -- sometimes decades. A new peer-reviewed study published in NEJM AI shows that AI-assisted reanalysis can meaningfully chip away at that backlog. Researchers from Boston Children's Hospital's Manton Center for Orphan Disease Research, Harvard University, and OpenAI used o3 Deep Research to reanalyze 376 previously unsolved pediatric cases. The leads it surfaced resulted in 18 confirmed diagnoses, or about 4.8% of the cohort.

The Diagnostic Odyssey Problem

Rare disease diagnosis is one of medicine's hardest problems. Roughly half of patients with rare diseases remain undiagnosed after extensive testing and specialist review. The challenge is not just biological -- it is operational. A patient's phenotype descriptions, test results, and family history can be split across databases that use different identifiers, formats, and vocabularies. Experts may also sequence a child's genome before a relevant gene or its variants have been linked to disease.

This creates a compounding maintenance problem. The patient's genome may stay the same, but the evidence around it keeps changing: researchers link new genes and variants to disease, labs reclassify old variants, and case databases and papers accumulate new observations. Each update means old inconclusive cases are potentially worth revisiting -- but doing so at scale is nearly impossible with human effort alone.
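
The revisit trigger described above can be sketched in a few lines. This is an illustration of the idea only, not the study's pipeline: the function name, the case IDs, and the placeholder gene symbols are all hypothetical, and a real system would also track variant-level reclassifications rather than genes alone.

```python
def cases_to_revisit(case_genes: dict[str, set[str]], updated_genes: set[str]) -> list[str]:
    """Return IDs of stored cases that carry a candidate variant in any gene
    whose evidence changed (new disease link, reclassification, new reports)."""
    return sorted(
        case_id
        for case_id, genes in case_genes.items()
        if genes & updated_genes  # non-empty overlap means the case is affected
    )


# Hypothetical archive of unsolved cases: case ID -> genes with candidate variants.
archive = {
    "CASE-0001": {"GENE_A", "GENE_B"},
    "CASE-0002": {"GENE_C"},
    "CASE-0003": {"GENE_B", "GENE_D"},
}

# Hypothetical evidence update: GENE_B was newly linked to a disease.
flagged = cases_to_revisit(archive, {"GENE_B"})
print(flagged)
```

The lookup itself is cheap. The expensive part is re-reading the literature for every flagged case, and that is the step the study hands to the model.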

How the Workflow Actually Ran

The team did not simply feed raw genomic data into a chatbot. For each case, they assembled a de-identified packet containing standardized Human Phenotype Ontology (HPO) terms, clinician notes, patient metadata, and a filtered variant table capturing each variant's rarity, predicted protein effect, ClinVar classification, and signal quality across available family members. HPO is a controlled vocabulary for symptoms: a shared language that lets computers and clinicians describe the same phenotype consistently across databases.
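
To make the shape of such a packet concrete, here is a minimal sketch. The article lists what the packet contained but not its schema, so every class name, field name, and value below is an assumption; `GENE_A` is a placeholder, and the prompt wording only paraphrases the request described in this article.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class VariantRow:
    """One row of the filtered variant table (hypothetical field names)."""
    gene: str                     # placeholder symbol, not a real finding
    allele_frequency: float       # rarity in a population database
    predicted_effect: str         # e.g. "missense", "frameshift"
    clinvar: str                  # ClinVar classification, if any
    call_quality: dict[str, str]  # signal quality per available family member


@dataclass
class CasePacket:
    """De-identified bundle handed to the model for one case."""
    case_id: str                  # study ID only, no patient identifiers
    hpo_terms: dict[str, str]     # HPO ID -> label
    clinician_notes: str
    metadata: dict[str, str]
    variants: list[VariantRow] = field(default_factory=list)

    def to_prompt(self) -> str:
        # Serialize deterministically so the same packet yields the same text.
        payload = json.dumps(asdict(self), indent=2, sort_keys=True)
        return (
            "Propose the most plausible molecular explanation for this case "
            "and show your reasoning.\n\n" + payload
        )


packet = CasePacket(
    case_id="CASE-0001",
    hpo_terms={"HP:0001250": "Seizure", "HP:0001263": "Global developmental delay"},
    clinician_notes="Onset in infancy; prior exome analysis inconclusive.",
    metadata={"age_band": "0-5", "sex": "F"},
    variants=[
        VariantRow(
            gene="GENE_A",
            allele_frequency=0.00001,
            predicted_effect="missense",
            clinvar="Uncertain significance",
            call_quality={"proband": "pass", "mother": "pass", "father": "pass"},
        )
    ],
)

prompt = packet.to_prompt()
```

Keeping symptoms as HPO IDs rather than free text lets the same packet be matched against any database that uses the same ontology.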

The model was asked to propose the most plausible molecular explanation and to show its work. Researchers then reviewed outputs using the ACMG/AMP framework -- the standard clinical classification system for genetic variants -- with at least two team members reviewing each candidate. A finding only counted as a diagnosis after a CLIA-certified lab confirmed it and the clinical team returned the result to the family. The AI's job was hypothesis generation, not diagnosis.
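
The gating described above amounts to a small state check. The sketch below encodes only the three conditions named in this article (two reviewers, lab confirmation, result returned to the family). The class, the status labels, and the reviewer names are hypothetical, and the ACMG/AMP classification itself is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A model-proposed explanation moving through human review (hypothetical)."""
    case_id: str
    reviewers: set[str] = field(default_factory=set)  # team members who reviewed it
    clia_confirmed: bool = False                      # CLIA-certified lab confirmation
    returned_to_family: bool = False                  # clinical team returned the result


def status(c: Candidate) -> str:
    """Return how far a candidate has progressed; only the last stage is a diagnosis."""
    if len(c.reviewers) < 2:
        return "hypothesis"   # model output, not yet reviewed by two people
    if not c.clia_confirmed:
        return "reviewed"     # reviewers agree it is worth confirming
    if not c.returned_to_family:
        return "confirmed"    # lab-confirmed but not yet returned
    return "diagnosis"


candidate = Candidate("CASE-0001")
print(status(candidate))

candidate.reviewers.update({"reviewer_1", "reviewer_2"})
candidate.clia_confirmed = True
candidate.returned_to_family = True
print(status(candidate))
```

The checks run in order, so a lab confirmation cannot promote a candidate that skipped two-person review.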
