OpenAI Folds omni-moderation-latest Into Generation Calls, Cutting App Complexity

OpenAI Developers

22H AGO

2 min read

OpenAI is collapsing two API calls into one. Moderation scores can now ride along inside the same request you use for generation through the Responses API and Completions API, so applications can decide whether to log, route, review, or block content without making a second round trip to the /v1/moderations endpoint.

If you have ever built a production chatbot, you know the dance: send the user message to moderation, wait, then send it to the model, then maybe send the model output back to moderation again. That extra hop adds latency and code complexity. Folding the signal into the generation response removes a step from a flow that nearly every customer-facing LLM app already had to build.
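
To make that dance concrete, here is a minimal sketch of the older three-hop flow. The function name and the two injected callables are hypothetical, chosen so the example runs without network access; in a real app they would most likely wrap the moderation and generation calls of whatever SDK you use.

```python
from typing import Callable, Optional


def legacy_chat_turn(
    user_message: str,
    moderate: Callable[[str], bool],   # hypothetical: returns True if text is flagged
    generate: Callable[[str], str],    # hypothetical: returns the model's reply
) -> Optional[str]:
    """Run one chat turn with separate moderation hops.

    Returns the model reply, or None if either moderation pass flags content.
    """
    # Hop 1: check the user's input before it reaches the model.
    if moderate(user_message):
        return None

    # Hop 2: generate the reply.
    reply = generate(user_message)

    # Hop 3: check the model's output before showing it to the user.
    if moderate(reply):
        return None

    return reply
```

Each call to `moderate` here stands in for a separate network round trip, which is the overhead the article describes being removed.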

What actually changed

The moderation classifier itself is not new. What is new is where the output shows up. The same category flags and confidence scores produced by omni-moderation-latest are now returned alongside generated content, instead of requiring a dedicated call.

For reference, the moderation response shape includes three useful fields you can branch on in code:

flagged: a boolean telling you whether the model classified the content as potentially harmful
categories: per-category booleans across harassment, hate, illicit, self-harm, sexual, and violence (plus subcategories like
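
As a rough illustration of branching on those fields, the sketch below maps a moderation result to an action. It assumes the caller has already pulled the result out of the API response as a plain dictionary with `flagged` and `categories` keys; where that result sits inside the combined generation response is not covered here. The `decide_action` name, the action labels, and the choice of which categories go to human review are all hypothetical policy decisions, not part of the API.

```python
from typing import Any, Dict, FrozenSet


def decide_action(
    result: Dict[str, Any],
    review_categories: FrozenSet[str] = frozenset({"self-harm"}),  # assumed policy
) -> str:
    """Map a moderation result to "allow", "review", or "block"."""
    # Nothing flagged: let the content through.
    if not result.get("flagged", False):
        return "allow"

    # Collect the names of categories whose boolean is set.
    hit = {name for name, on in result.get("categories", {}).items() if on}

    # Route sensitive categories to a human instead of blocking outright.
    if hit & review_categories:
        return "review"

    return "block"
```

For example, a result of `{"flagged": True, "categories": {"violence": True}}` would be blocked under this policy, while one with `"self-harm": True` would be sent to review.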
