Anthropic's Claude Opus 4.8 Cuts Coding Bugs 4x in 41 Days

Cursor

May 28, 2026

2 min read

Anthropic's Claude Opus 4.8 is now available inside Cursor, arriving just 41 days after Opus 4.7. The update keeps the same price while delivering measurably better agentic coding performance, a dramatically cheaper fast mode, and a model that is far less likely to let bugs slip past you silently. Anthropic calls it "a modest but tangible improvement," but the numbers and early partner reports tell a more interesting story.

The Opus 4.7 problems it quietly fixes

Opus 4.7 had a rough reception. The Devin team specifically flagged comment verbosity and tool-calling issues in Opus 4.7. These were not minor complaints: the verbosity inflated developer bills, and the tool-calling failures caused agentic coding pipelines to break mid-execution. Opus 4.8 directly addresses both.

4x fewer silent bugs: Anthropic reports the new model is "around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked."
Fewer tool-calling steps: On CursorBench, Opus 4.8 reaches the same level of performance in fewer steps, so the per-task token cost drops without sacrificing pass rates.
Less verbose output: The model uses 35% fewer output tokens per task, directly addressing the verbosity problem, and completes tasks in 15% fewer turns.

The practical upshot: pricing stayed flat at $5/$25 per million input/output tokens for standard mode. Because the model uses fewer tokens to accomplish the same work, the effective cost per task actually fell.
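To make the cost claim concrete, here is a minimal Python sketch of per-task cost under flat pricing. It assumes the $5/$25 figures are USD per million input and output tokens. The 200k-input / 20k-output task is a hypothetical example, not measured data. The sketch applies only the reported 35% output-token reduction and holds input tokens constant. If the 15% fewer turns also shrink input, this assumption understates the savings.

```python
# Assumed unit: USD per million tokens (input / output).
PRICE_IN_PER_M = 5.00
PRICE_OUT_PER_M = 25.00


def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one task at flat per-million-token prices."""
    return (input_tokens * PRICE_IN_PER_M + output_tokens * PRICE_OUT_PER_M) / 1_000_000


# Hypothetical task profile for illustration only (not measured data).
old = task_cost(200_000, 20_000)
# Same task with 35% fewer output tokens; input held constant (an assumption).
new = task_cost(200_000, round(20_000 * (1 - 0.35)))

print(f"old ${old:.3f}, new ${new:.3f}, saving {1 - new / old:.1%}")
```

Under these assumptions the saving per task is smaller than 35%, because input tokens still cost the same. The output-heavier a workload is, the closer its saving gets to the full 35%.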

Benchmark numbers worth knowing

On SWE-bench Pro, a coding benchmark built on real-world repositories with multi-file diffs, Opus 4.8 lands at 69.2%. That is almost 5 points clear of Opus 4.7 (64.3%), 10.6 points ahead of GPT-5.5 (58.6%), and 15 points ahead of Gemini 3.1 Pro (54.2%). The harder the benchmark variant, the wider the gap gets.
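As a quick sanity check, the margins can be recomputed from the reported SWE-bench Pro scores. This minimal Python sketch uses only the four scores given in the paragraph. It rounds to one decimal place to strip floating-point noise.

```python
# SWE-bench Pro scores (%) as reported in this article.
scores = {
    "Opus 4.8": 69.2,
    "Opus 4.7": 64.3,
    "GPT-5.5": 58.6,
    "Gemini 3.1 Pro": 54.2,
}

leader = scores["Opus 4.8"]
# Gap in percentage points between Opus 4.8 and every other model.
gaps = {name: round(leader - s, 1) for name, s in scores.items() if name != "Opus 4.8"}

for name, gap in gaps.items():
    print(f"Opus 4.8 leads {name} by {gap} points")
```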

The math reasoning jump is even more striking. Opus 4.8 scored 96.7% on the USAMO 2026 math benchmark, up from 69.3% on Opus 4.7. That is a 27.4-percentage-point gain in a single 41-day release cycle and the biggest single-cycle math improvement in Opus history. USAMO problems require multi-step proof construction, so the jump suggests a structural improvement in how the model reasons through hard, multi-step problems. That is the same kind of reasoning that matters in complex agentic workflows.
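A 27.4-point gain is an absolute change in percentage points. It is easy to confuse with a relative percentage change, so the minimal Python sketch below computes both from the two reported USAMO scores. The function names are illustrative, not from any library.

```python
def pp_change(old_pct: float, new_pct: float) -> float:
    """Absolute change between two scores, in percentage points."""
    return new_pct - old_pct


def relative_change(old_pct: float, new_pct: float) -> float:
    """Relative change, as a fraction of the old score."""
    return (new_pct - old_pct) / old_pct


# USAMO 2026 scores (%) as reported in this article.
opus_47, opus_48 = 69.3, 96.7

print(f"{pp_change(opus_47, opus_48):.1f} percentage points")
print(f"{relative_change(opus_47, opus_48):.1%} relative improvement")
```

The same 27.4-point gain is roughly a 39.5% relative improvement over Opus 4.7's score.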
