Anthropic's Claude Opus 4.8 is now available inside Cursor, arriving just 41 days after Opus 4.7. The update keeps the same price but delivers measurably better agentic coding performance, a much cheaper fast mode, and a model that is far less likely to let bugs slip past you without flagging them. Anthropic calls it "a modest but tangible improvement," but the benchmark numbers and early partner reports tell a more interesting story.

The Opus 4.7 problems it quietly fixes

Opus 4.7 had a rough reception. The Devin team specifically called out comment-verbosity and tool-calling issues in Opus 4.7, and these were not minor complaints: the verbosity inflated developer bills, and the tool-calling failures caused agentic coding pipelines to break mid-execution. Opus 4.8 addresses both directly.

  • 4x fewer silent bugs: Anthropic reports the new model is "around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked."
  • Fewer tool-calling steps: On CursorBench, Opus 4.8 reaches the same level of performance in fewer steps, so the token cost per task drops without sacrificing pass rates.
  • Less verbose output: The model uses 35% fewer output tokens per task, which directly addresses the verbosity problem, and completes tasks in 15% fewer turns.

The practical upshot: pricing stays flat at $5/$25 per million input/output tokens in standard mode, so the effective cost per task actually falls, because the model needs fewer tokens to do the same work.
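To see why flat pricing plus fewer output tokens lowers the per-task bill, here is a minimal back-of-the-envelope sketch. The per-task token counts (200k input, 20k output) are hypothetical, chosen only for illustration; the only figures taken from the article are the $5/$25 standard-mode prices and the 35% output-token reduction. The sketch assumes input tokens stay the same, which the article does not state.

```python
# Standard-mode list prices quoted in the article (USD per million tokens).
INPUT_PRICE_PER_MTOK = 5.0
OUTPUT_PRICE_PER_MTOK = 25.0


def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one task at the standard-mode list price."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


# Hypothetical baseline task: 200k input tokens, 20k output tokens (not measured data).
BASE_INPUT = 200_000
BASE_OUTPUT = 20_000

old_cost = task_cost(BASE_INPUT, BASE_OUTPUT)

# Assumption: same input, 35% fewer output tokens (the reported reduction).
new_output = round(BASE_OUTPUT * (1 - 0.35))
new_cost = task_cost(BASE_INPUT, new_output)

savings = 1 - new_cost / old_cost
print(f"old: ${old_cost:.3f}  new: ${new_cost:.3f}  saved: {savings:.1%}")
```

Under these assumptions, the saving per task (about 12%) is smaller than 35% because unchanged input tokens make up most of the bill. Completing tasks in 15% fewer turns could cut input tokens too, since agentic loops typically resend context each turn, but the sketch deliberately leaves that out.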

Benchmark numbers worth knowing

On SWE-bench Pro, a coding benchmark built on real-world repositories with multi-file diffs, Opus 4.8 scores 69.2%: almost 5 points clear of Opus 4.7 (64.3%) and more than 10 points ahead of both GPT-5.5 (58.6%) and Gemini 3.1 Pro (54.2%). The harder the benchmark variant, the wider the gap.
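The gaps above are percentage-point differences, not relative percentages. A small sketch that recomputes them from the quoted scores makes that explicit; the `gaps_vs` helper is hypothetical and introduced only for illustration.

```python
# SWE-bench Pro scores as quoted in the article (percent).
scores: dict[str, float] = {
    "Opus 4.8": 69.2,
    "Opus 4.7": 64.3,
    "GPT-5.5": 58.6,
    "Gemini 3.1 Pro": 54.2,
}


def gaps_vs(leader: str, table: dict[str, float]) -> dict[str, float]:
    """Percentage-point lead of `leader` over every other entry, rounded to 0.1."""
    base = table[leader]
    return {name: round(base - s, 1) for name, s in table.items() if name != leader}


gaps = gaps_vs("Opus 4.8", scores)
for name, gap in gaps.items():
    print(f"Opus 4.8 leads {name} by {gap} points")
```

Rounding to one decimal place hides floating-point noise from the subtraction (for example, 69.2 - 64.3 is not exactly 4.9 in binary floating point).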

The math reasoning jump is even more striking. Opus 4.8 scored 96.7% on the USAMO 2026 math benchmark, up from 69.3% for Opus 4.7: a 27.4-percentage-point gain in a single 41-day release cycle, the biggest single-cycle math improvement in Opus history. Because USAMO problems require multi-step proof construction, the jump suggests a structural improvement in how the model reasons through hard, multi-step problems, the same kind of reasoning that matters in complex agentic workflows.