Alibaba's Qwen3.7-Max Slashes Input Costs 90% With Automatic Prompt Caching

Qwen

May 25, 2026

2 min read

May 25, 2026

2 min read

Every time you call a large language model with a long system prompt or a big reference document, the model re-reads every single token from scratch. That's wasteful and expensive, especially in agentic workflows where the same codebase context or product manual gets sent on every turn. Alibaba's Qwen team just fixed that for Qwen3.7-Max: implicit prompt caching is now live, and it requires zero setup on your end.

What just landed

Alibaba's Qwen team announced that Qwen3.7-Max now supports implicit prompt caching, automatically enabled with no configuration required. For developers already using Qwen3.7-Max for coding tasks, the cost savings activate immediately without a single code change.

Two caching modes are now available on the model:

Implicit caching -- fully automatic. The system detects repeated content prefixes across requests and reuses the cached computation. No API changes needed.
Explicit caching -- you manually mark what to cache by adding "cache_control": {"type": "ephemeral"} to your messages array. You get deterministic, guaranteed cache hits in exchange for a small amount of setup work.

The cost math

Qwen3.7-Max supports prompt caching at $0.25 per million tokens -- a 90% discount on the standard $2.50 input rate. For workloads that reuse long system prompts or reference documents across turns, that drops the effective input cost dramatically. On the explicit side, creating the cache incurs only a 25% surcharge over the standard input price, while each subsequent hit saves 90% -- meaning a single hit is enough to break even.

For repeated context, effective price after prompt caching is applied can be 60-80% cheaper than the list rate. The standard list price for Qwen3.7-Max is $2.50 per 1M input tokens and $7.50 per 1M output tokens. There is currently a promotional rate cutting that in half.

Don't miss what's next in AI

Join 300,000+ engineers and researchers who get the signal, not the noise.

Full access to in-depth AI research breakdowns
Be the first to know what's trending before it hits mainstream
Daily curated papers, repos, and industry moves

Alibaba's Qwen3.7-Max Slashes Input Costs 90% With Automatic Prompt Caching

Takeaways

What just landed

The cost math

Don't miss what's next in AI