Liquid AI's IFStruct Trains a 350M Model to Beat 4B Rivals at Structured Output

Liquid AI

3H AGO

2 min read

BENCHMARKS

LLMS

small_models structured_output

Getting a model to reliably output valid JSON or YAML sounds like a solved problem. It is not. Structured output remains a common failure mode for language models, especially when schemas become complex and strings require careful escaping. Liquid AI just released IFStruct, an open benchmark designed to measure exactly this, and the results expose a gap that most existing evals quietly paper over.

The problem with how we test structured output today

IFStruct targets something existing benchmarks miss: organic user requests present schema requirements in many different ways, often with extra formatting constraints attached. Most benchmarks hand the model a clean, finalized JSON Schema and ask it to comply. Real users don't do that. They paste an annotated example, describe the schema in plain English, switch formats halfway through, and add constraints like "no code fence" or "no commentary."
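To make constraints like "no code fence" and "no commentary" concrete, here is a minimal Python sketch of how a raw reply could be checked against them. It is an illustration only, not IFStruct's scorer: the function name `check_format_constraints` and the two result keys are hypothetical, and the commentary check is deliberately coarse (it only asks whether the whole reply parses as JSON).

```python
import json

# Built this way so the sketch never has to contain a literal fence marker.
FENCE = "`" * 3


def check_format_constraints(output: str) -> dict:
    """Report whether a raw reply respects two common user constraints.

    Hypothetical helper for illustration; not part of IFStruct.
    """
    stripped = output.strip()

    # "No code fence": no line of the reply may open or close a fenced block.
    has_fence = any(
        line.lstrip().startswith(FENCE) for line in stripped.splitlines()
    )

    # "No commentary" (coarse proxy): the entire reply must parse as JSON.
    try:
        json.loads(stripped)
        only_json = True
    except json.JSONDecodeError:
        only_json = False

    return {"no_code_fence": not has_fence, "no_commentary": only_json}


clean = '{"title": "Blueberry pancakes"}'
chatty = (
    "Sure, here you go:\n"
    + FENCE + "json\n"
    + '{"title": "Blueberry pancakes"}\n'
    + FENCE
)

print(check_format_constraints(clean))
print(check_format_constraints(chatty))
```

Both replies carry the same JSON object, but only the first would satisfy a user who asked for bare output.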

Constrained generation can enforce syntactic validity, but it cannot by itself make the model choose the right fields, values, or escaped content. Even under a schema constraint, the model's logits still need to meaningfully reflect the user's requested structure. IFStruct is scored without constrained decoding, so it measures whether the model actually understood the request, not whether the decoding machinery forced it into shape.
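The gap between "parses" and "follows the request" is easy to see in a few lines of Python. The sketch below assumes a hypothetical request for a recipe object with exactly three keys; `REQUESTED_KEYS` and both function names are made up for illustration and say nothing about how IFStruct itself scores outputs.

```python
import json

# Hypothetical request: a recipe object with exactly these top-level keys.
REQUESTED_KEYS = {"title", "ingredients", "steps"}


def syntactically_valid(text: str) -> bool:
    """True if the text parses as JSON at all."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def follows_request(text: str) -> bool:
    """True only if the text is a JSON object with exactly the requested keys."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and set(data) == REQUESTED_KEYS


# Well-formed JSON, but the model picked its own field names.
valid_but_wrong = '{"name": "Pancakes", "items": ["flour"]}'
# Well-formed JSON that also matches the requested structure.
valid_and_right = '{"title": "Pancakes", "ingredients": ["flour"], "steps": ["mix"]}'

for reply in (valid_but_wrong, valid_and_right):
    print(syntactically_valid(reply), follows_request(reply))
```

A grammar that only guarantees well-formed JSON would accept both replies; only the second gives the user what they asked for.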

What the benchmark actually tests

Each IFStruct prompt asks the model to generate a handful of instances of a randomized item (e.g. "Generate two recipes for blueberry pancakes"), with the schema requirements presented in a variety of different ways. The content itself is not assessed, only the structure. This isolation is intentional: the signal stays clean, without conflating schema-following with reasoning or data extraction ability.
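One way to picture structure-only scoring is to reduce every value to its type and compare the resulting skeletons. The Python sketch below does that under stated assumptions: the reply is a JSON array of items, the expected structure is given as a reference object, and the helper names `shape` and `matches_structure` are hypothetical. The benchmark's real scoring rules may differ.

```python
import json


def shape(value):
    """Reduce a parsed JSON value to a content-free skeleton of keys and types."""
    if isinstance(value, dict):
        return {key: shape(item) for key, item in value.items()}
    if isinstance(value, list):
        # Collapse to the distinct element shapes so list length is ignored.
        distinct = {json.dumps(shape(item), sort_keys=True) for item in value}
        return sorted(distinct)
    if isinstance(value, bool):  # checked before numbers: bool is an int subclass
        return "boolean"
    if value is None:
        return "null"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


def matches_structure(output: str, reference: dict, expected_count: int) -> bool:
    """Check item count and per-item skeleton; the actual content is never read."""
    try:
        items = json.loads(output)
    except json.JSONDecodeError:
        return False
    if not isinstance(items, list) or len(items) != expected_count:
        return False
    return all(shape(item) == shape(reference) for item in items)


# Hypothetical reference structure for "Generate two recipes ...".
reference = {"title": "", "servings": 0, "ingredients": [""]}
reply = json.dumps([
    {"title": "Classic", "servings": 2, "ingredients": ["flour", "blueberries"]},
    {"title": "Vegan", "servings": 4, "ingredients": ["oat milk"]},
])

print(matches_structure(reply, reference, expected_count=2))
```

The two recipes differ in every value, yet they score the same because only keys, types, and item count are compared.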
