OpenAI's latest health update is less about a new model and more about what happens when you systematically wire physician judgment directly into the training loop. GPT-5.5 Instant, the default model for all free ChatGPT users, has just hit a milestone: on OpenAI's most challenging health evaluations, it now performs on par with the company's frontier Thinking models. That's a meaningful gap closed, and it lands for everyone, not just paid subscribers.

230 million people a week

Every week, more than 230 million people turn to ChatGPT for help with health and wellness questions: making sense of health information, understanding lab results, preparing for appointments, navigating insurance, building healthier habits, and figuring out what to ask next. That scale makes health one of the highest-stakes domains for any general-purpose model. A marginal improvement in how a model handles uncertainty or escalates urgency isn't an academic win -- it has real consequences.

What actually got better

The improvements are specific and measurable. According to OpenAI, the gains center on four behaviors that matter most in health conversations:

  • Recognizing urgency: Better at flagging when a situation may require immediate medical attention
  • Asking for context: More likely to request relevant information before giving advice
  • Calibrated uncertainty: Explains what it doesn't know without being either dismissive or overconfident
  • Clearer communication: Makes complex medical information easier to understand for non-experts

The production numbers back this up. Based on a comparison of recent production traffic in health -- billions of messages a week -- the rate of responses with at least one flagged factuality issue has fallen by 71% in the last two months. That's not a benchmark number; that's live traffic.
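A 71% drop here is a relative reduction in a rate, not a drop of 71 percentage points. The sketch below shows how such a figure would be computed from per-response flag counts. OpenAI's baseline rate and flagging pipeline are not given in this article, so the sample data and the helper names `flagged_rate` and `relative_reduction` are hypothetical, chosen only to make the arithmetic concrete.

```python
def flagged_rate(flags_per_response):
    """Share of responses with at least one flagged factuality issue."""
    if not flags_per_response:
        raise ValueError("need at least one response")
    flagged = sum(1 for n in flags_per_response if n >= 1)
    return flagged / len(flags_per_response)


def relative_reduction(before, after):
    """Fractional drop from `before` to `after` (0.71 means a 71% drop)."""
    if before <= 0:
        raise ValueError("baseline rate must be positive")
    return (before - after) / before


# Hypothetical samples, NOT OpenAI data: each entry is the number of
# flagged issues found in one response.
earlier = [0] * 900 + [1] * 70 + [2] * 30  # 100 of 1000 responses flagged
recent = [0] * 971 + [1] * 20 + [3] * 9    # 29 of 1000 responses flagged

before = flagged_rate(earlier)
after = flagged_rate(recent)
print(f"{before:.3f} -> {after:.3f}, reduction {relative_reduction(before, after):.0%}")
```

Note that a response with three flagged issues counts once, the same as a response with one, because the metric is "at least one flagged issue" per response. The same 71% figure would result from any pair of rates in the same ratio, which is why the absolute baseline matters when judging how many responses are still affected.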

On the benchmark side, GPT-5.5 Instant improves over GPT-5.3 Instant on HealthBench (+1.8), HealthBench Hard (+2.7), and HealthBench Professional (+5.5). The HealthBench Professional jump is the largest of the three -- that's the clinician-facing evaluation covering care consults, documentation, and medical research, where the bar is highest.
