Getting a model to reliably output valid JSON or YAML sounds like a solved problem. It is not. Structured output remains a common failure mode for language models, especially when schemas become complex and strings require careful escaping. Liquid AI just released IFStruct, an open benchmark designed to measure exactly this, and the results expose a gap that most existing evals quietly paper over.
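Here is a minimal Python sketch of why escaping trips models up. The recipe string and the helper `is_valid_json` are illustrative and not part of IFStruct. The string contains a double quote and a newline. Pasting it directly into a JSON template yields invalid JSON, while `json.dumps` escapes both characters correctly.

```python
import json

# A string value a model might need to emit. It contains a double quote and
# a newline, and JSON requires both to be escaped inside a string literal.
step = 'Whisk until "just combined"\nthen rest 5 min'

# Naive templating drops the raw characters straight into the JSON text.
naive = '{"step": "' + step + '"}'

# json.dumps produces a correctly escaped JSON document.
escaped = json.dumps({"step": step})


def is_valid_json(text: str) -> bool:
    """Return True if text parses as a JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(naive))    # False: the inner quote ends the string early
print(is_valid_json(escaped))  # True
```

A model emitting JSON token by token has to do what `json.dumps` does here on its own, for every string value.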

The problem with how we test structured output today

IFStruct targets something existing benchmarks miss. In organic user requests, schema requirements arrive in many different forms, often alongside extra formatting constraints. Most benchmarks hand the model a clean, finalized JSON Schema and ask it to comply. Real users don't do that. They paste an annotated example, describe the schema in plain English, switch formats halfway through, and add constraints like "no code fence" or "no commentary."
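The two formatting constraints quoted above can be checked mechanically. The sketch below is a hypothetical checker, not IFStruct's actual scorer. It makes two assumptions:

- "no code fence" means no line begins with three backticks.
- "no commentary" means the whole reply, after trimming whitespace, parses as a single JSON document.

```python
import json

# Built at runtime so this snippet does not itself contain a fence marker.
FENCE_MARK = "`" * 3


def check_format_constraints(response: str) -> dict[str, bool]:
    """Hypothetical checks for two user-stated formatting constraints."""
    # "no code fence": no line may open or close a fenced block.
    has_fence = any(
        line.lstrip().startswith(FENCE_MARK) for line in response.splitlines()
    )
    # "no commentary": the trimmed reply must be one JSON document, nothing else.
    try:
        json.loads(response.strip())
        pure_json = True
    except json.JSONDecodeError:
        pure_json = False
    return {"no_code_fence": not has_fence, "no_commentary": pure_json}


payload = '[{"name": "Classic"}, {"name": "Lemon"}]'
fenced = FENCE_MARK + "json\n" + payload + "\n" + FENCE_MARK
chatty = "Here are two recipes:\n" + payload

print(check_format_constraints(payload))  # both True
print(check_format_constraints(fenced))   # both False
print(check_format_constraints(chatty))   # no_code_fence True, no_commentary False
```

A fenced reply fails both checks in this sketch, because the fence lines are themselves non-JSON text. A real scorer might strip a fence before parsing so the two constraints are reported independently.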

Constrained generation can enforce syntactic validity, but it cannot by itself make the model choose the right fields, values, or escaped content. A constraint only removes invalid options; it does not choose among the valid ones. Even under a schema constraint, the model's logits still need to meaningfully reflect the user's requested structure. IFStruct is scored without constrained decoding, so it measures whether the model actually understood the request, not whether the decoding machinery forced it into shape.
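Here is a toy illustration of why constrained decoding alone is not enough, under loudly simplified assumptions:

- The vocabulary is a hypothetical four-token one, with whole words as single tokens.
- The logits are hand-picked.
- The schema allows only `"metric"` or `"imperial"` for a `units` field, and the user asked for metric.

Masking removes the tokens that would break the schema, but the pick among the survivors still comes from the model's own scores.

```python
import math


def greedy(logits: dict[str, float]) -> str:
    """Unconstrained greedy decoding: take the highest-scoring token."""
    return max(logits, key=logits.get)


def constrained_greedy(logits: dict[str, float], allowed: set[str]) -> str:
    """Mask tokens the schema forbids, then take the best remaining token."""
    masked = {tok: (s if tok in allowed else -math.inf) for tok, s in logits.items()}
    return max(masked, key=masked.get)


# Hypothetical next-token scores after '{"units": "' for a user who asked for metric.
allowed = {"metric", "imperial"}
misread = {"cups": 3.0, "imperial": 2.1, "metric": 1.3, "grams": 0.5}
understood = {"metric": 2.8, "grams": 1.5, "cups": 1.0, "imperial": 0.4}

print(greedy(misread))                          # cups: violates the schema
print(constrained_greedy(misread, allowed))     # imperial: valid, but not what was asked
print(constrained_greedy(understood, allowed))  # metric
```

Without the mask, the misreading model's error shows up as a schema violation. With the mask, the output looks valid and the misreading hides inside a plausible value.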

What the benchmark actually tests

Each IFStruct prompt asks the model to generate a handful of instances of a randomized item (e.g. "Generate two recipes for blueberry pancakes"), with the schema requirements presented in varied ways. Only the structure is scored; the content itself is not assessed. This isolation is intentional: it keeps the signal clean by not conflating schema-following with reasoning or data-extraction ability.
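Structure-only scoring can be sketched as a count check plus a key-and-type check on every instance, ignoring the values themselves. This is a hypothetical scorer, not IFStruct's published code. The `RECIPE_SCHEMA` fields and the exact-key-match rule are assumptions chosen for the blueberry-pancake example.

```python
import json

# Hypothetical schema for "Generate two recipes for blueberry pancakes":
# required key -> expected Python type after json.loads.
RECIPE_SCHEMA = {"name": str, "ingredients": list, "servings": int}


def score_structure(response: str, expected_count: int, schema: dict[str, type]) -> bool:
    """Pass if the reply is a JSON list of the right length whose items match the schema.

    Values are never inspected for correctness, only for shape.
    """
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, list) or len(data) != expected_count:
        return False
    for item in data:
        # Assumption: exactly the schema's keys, no extras and none missing.
        if not isinstance(item, dict) or set(item) != set(schema):
            return False
        for key, expected_type in schema.items():
            value = item[key]
            # bool is a subclass of int in Python, so reject it explicitly.
            if expected_type is int and isinstance(value, bool):
                return False
            if not isinstance(value, expected_type):
                return False
    return True


# Nonsensical content, correct structure: passes.
odd = '[{"name": "x", "ingredients": [], "servings": 0}, {"name": "y", "ingredients": ["zzz"], "servings": 99}]'
# Sensible content, wrong count: fails.
one = '[{"name": "Classic", "ingredients": ["flour", "blueberries"], "servings": 4}]'

print(score_structure(odd, 2, RECIPE_SCHEMA))  # True
print(score_structure(one, 2, RECIPE_SCHEMA))  # False
```

The `odd` reply passes despite meaningless values. That is the point of isolating structure from content.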
