OpenAI's GPT-5.5 Hits 82.7% on Terminal-Bench and Rewires Its Agent Platform

OpenAI Developers

23H AGO

3 min read

OpenAI just dropped a recap of everything it shipped to the API over the past six months, and the list is longer than most companies' annual roadmaps. Thirty-plus models, features, and tools landed quietly in the changelog while the headlines were elsewhere. Here's what actually matters for anyone building on the platform.

A model family for every budget

The flagship addition is GPT-5.5, OpenAI's most capable model to date. It excels at writing and debugging code, researching online, analyzing data, creating documents and spreadsheets, operating software, and moving across tools until a task is finished. Crucially, GPT-5.5 delivers this step up in intelligence without compromising on speed: larger, more capable models are often slower to serve, but GPT-5.5 matches GPT-5.4's per-token latency in real-world serving.

On benchmarks, GPT-5.5 posts strong numbers. On Terminal-Bench 2.0, which tests complex command-line workflows requiring planning, iteration, and tool coordination, it achieves a state-of-the-art accuracy of 82.7%. On SWE-Bench Pro, which evaluates real-world GitHub issue resolution, it reaches 58.6%, solving more tasks end-to-end in a single pass than previous models.

Below GPT-5.5, two new smaller models round out the lineup. GPT-5.4 mini and GPT-5.4 nano are now available in the Chat Completions and Responses APIs. GPT-5.4 mini brings GPT-5.4-class capabilities to a faster, more efficient model for high-volume workloads, while GPT-5.4 nano is optimized for simple high-volume tasks where speed and cost matter most. Feature support differs: GPT-5.4 mini supports tool search, built-in computer use, and compaction, while GPT-5.4 nano supports compaction but not tool search or computer use.
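To make the feature split concrete, here is a minimal Python sketch that picks the smallest of the two small models covering a set of required features. The feature labels (`tool_search`, `computer_use`, `compaction`) and the helper name `pick_small_model` are illustrative assumptions, not API parameters or model IDs; only the support matrix itself comes from the paragraph above.

```python
# Feature support as described in the article. The string labels are
# hypothetical names chosen for this sketch, not official identifiers.
FEATURES = {
    "GPT-5.4 nano": {"compaction"},
    "GPT-5.4 mini": {"tool_search", "computer_use", "compaction"},
}

# Preference order: try the smaller model first, then fall back to mini.
PREFERENCE = ["GPT-5.4 nano", "GPT-5.4 mini"]


def pick_small_model(required):
    """Return the first preferred model supporting every required feature.

    Returns None when neither small model covers the request.
    """
    needed = set(required)
    for model in PREFERENCE:
        if needed <= FEATURES[model]:
            return model
    return None


if __name__ == "__main__":
    print(pick_small_model({"compaction"}))                 # GPT-5.4 nano
    print(pick_small_model({"compaction", "tool_search"}))  # GPT-5.4 mini
```

Trying nano first reflects the article's framing that it is the option for workloads where speed and cost matter most. A request that needs tool search or computer use falls through to mini.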

Model	Input (per 1M tokens)	Output (per 1M tokens)	Best for
GPT-5.5	$5.00	$30.00	Complex reasoning, coding, agentic work
GPT-5.4 mini	$0.75	$4.50	High-volume production apps
GPT-5.4 nano	$0.20	$1.25	Classification, extraction, routing

Cached input tokens are discounted 90% on GPT-5.5 and GPT-5.4, which matters a lot for repeated system prompts and long-context agents. GPT-5.5 supports a 1,050,000-token context window with a 128K maximum output, though prompts with more than 272K input tokens are priced at 2x input and 1.5x output for the full session.
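As a rough illustration of how these rules combine, the Python sketch below estimates the cost of a single GPT-5.5 request from the list prices in the table. It rests on several assumptions the article does not spell out: cached tokens are counted as a subset of input tokens, the long-prompt surcharge is triggered by total input tokens, and the 2x multiplier also applies to the cached rate. The function name `estimate_gpt55_cost` is hypothetical.

```python
# GPT-5.5 list prices from the table, in USD per 1M tokens.
INPUT_PER_M = 5.00
OUTPUT_PER_M = 30.00

CACHED_INPUT_MULT = 0.10         # a 90% discount leaves 10% of the input rate
LONG_PROMPT_THRESHOLD = 272_000  # input tokens above this trigger the surcharge
LONG_INPUT_MULT = 2.0
LONG_OUTPUT_MULT = 1.5


def estimate_gpt55_cost(input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of one request under the stated assumptions."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")

    long_prompt = input_tokens > LONG_PROMPT_THRESHOLD
    in_rate = INPUT_PER_M * (LONG_INPUT_MULT if long_prompt else 1.0)
    out_rate = OUTPUT_PER_M * (LONG_OUTPUT_MULT if long_prompt else 1.0)
    # Assumption: the cache discount stacks on top of the surcharged rate.
    cached_rate = in_rate * CACHED_INPUT_MULT

    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * in_rate
        + cached_tokens * cached_rate
        + output_tokens * out_rate
    )
    return total / 1_000_000


if __name__ == "__main__":
    # 100K input (half cached) and 10K output, under the threshold.
    print(f"{estimate_gpt55_cost(100_000, 10_000, cached_tokens=50_000):.3f}")  # 0.575
    # 300K input and 10K output, over the threshold.
    print(f"{estimate_gpt55_cost(300_000, 10_000):.2f}")  # 3.45
```

In the second case the same 10K output tokens cost $0.45 instead of $0.30, so for long-context agents it can be worth trimming a prompt that sits just over the 272K mark.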

Voice gets a serious upgrade

The realtime audio stack was completely rebuilt around three new models. GPT-Realtime-2 is the most capable realtime voice model: it supports speech-to-speech interactions with configurable reasoning effort, stronger instruction following, and more reliable tool use for complex voice-agent workflows. Alongside it, GPT-Realtime-Translate handles streaming speech translation and GPT-Realtime-Whisper handles streaming speech-to-text. The old Realtime API Beta was removed, so if you're still on it, migration is not optional.
