Alibaba's Qwen3.7-Max Ran Autonomously for 35 Hours Beating Claude at One-Sixth the Price

Tongyi Lab

May 21, 2026

2 min read

AGENTS

agent_frameworks code_agents tool_use

LLMS

May 21, 2026

AGENTS

agent_frameworks code_agents tool_use

LLMS

2 min read

Alibaba's Qwen team quietly did something remarkable: they shipped a proprietary model that ran autonomously for 35 hours, fired 1,158 tool calls without a human in the loop, and delivered a 10× speedup on a GPU kernel the model had never encountered during training. Then they priced it at $2.50 per million input tokens , roughly one-sixth the cost of Claude Opus 4.7. That model is Qwen3.7-Max, and it is the most credible challenge yet to the idea that frontier agentic AI belongs exclusively to OpenAI and Anthropic.

Built for the Agent Era, Not the Chat Era

Announced on May 20, 2026, at the Alibaba Cloud Summit, Qwen3.7-Max is a proprietary, text-focused reasoning model engineered specifically for the agentic era. The Qwen team describes it as a "versatile agent foundation" rather than a general-purpose chat model , a distinction that shows up clearly in the benchmark profile. It wins where it matters for production agents: long-horizon execution, MCP tool orchestration, and real-world software engineering. It does not lead on every raw intelligence metric, and Alibaba is not pretending otherwise.

Three things define the release:

1-million-token context window , roughly 2,000 pages of text or a full mid-sized codebase in a single request, up from 262K on the previous generation
Cross-harness generalization , the model was trained to perform consistently across different agent scaffolds, not just the one it was evaluated on
Long-horizon autonomy , the 35-hour autonomous session is the longest publicly documented agent run from any major lab

The 35-Hour Kernel Optimization Run

The flagship demonstration is worth understanding in detail, because the details are what make it significant. The Qwen team handed the AI a tough coding problem , optimizing GPU code , giving it just the instructions and a way to test its work, and then stepped back. From there, Qwen3.7-Max worked autonomously, writing code, running tests, finding bottlenecks, and redesigning the code. It looped through this process over a thousand times.

It executed 1,158 tool calls, ran 432 kernel evaluations, diagnosed failures, and achieved a 10.0x geometric mean speedup , all without human intervention. The hardware was Alibaba's own ZW-M890 PPU, a chip the model had never seen during training. No profiling data, no hardware documentation, no example kernels. The model had to navigate an unfamiliar architecture and maintain a coherent optimization strategy across the entire context window for over a day.

Don't miss what's next in AI

Join 300,000+ engineers and researchers who get the signal, not the noise.

Full access to in-depth AI research breakdowns
Be the first to know what's trending before it hits mainstream
Daily curated papers, repos, and industry moves

Alibaba's Qwen3.7-Max Ran Autonomously for 35 Hours Beating Claude at One-Sixth the Price

Takeaways

Built for the Agent Era, Not the Chat Era

The 35-Hour Kernel Optimization Run

Don't miss what's next in AI