Alibaba's Qwen3.7-Max Runs Autonomously for 35 Hours, Beating Every Rival Agent

Qwen

May 21, 2026

2 min read

AGENTS

agent_frameworks code_agents tool_use

LLMS

May 21, 2026

AGENTS

agent_frameworks code_agents tool_use

LLMS

2 min read

Alibaba's Qwen team has released Qwen3.7-Max, a proprietary flagship model positioned squarely at the top of the agentic AI market. The headline number is hard to ignore: in a live demonstration, the model ran autonomously for 35 hours straight, made 1,158 tool calls, and achieved a 10x geometric mean speedup on a GPU kernel optimization task , on hardware it had never seen during training. That's not a benchmark. That's a real engineering task, completed without human intervention.

The release frames Qwen3.7-Max as a "versatile agent foundation" , a deliberate pivot away from chat-first models toward systems designed to sustain coherent reasoning across hours-long workflows. It's available now via Alibaba Cloud Model Studio and Qwen Studio.

The 35-Hour Kernel Run: What Actually Happened

The most striking demonstration in the release is the autonomous kernel optimization task. The Qwen team handed the model a single objective , optimize an attention kernel on a T-Head ZW-M890 PPU , gave it access to a terminal and a test harness, and stepped back.

Over 35 hours, Qwen3.7-Max:

Made 1,158 tool calls
Ran 432 kernel evaluations
Diagnosed compilation failures autonomously
Iteratively rewrote and profiled the code
Achieved a 10.0x geometric mean speedup over the Triton reference implementation

For comparison, Chinese competitor models capped out significantly lower: GLM-5.1 Thinking at 7.3x, Kimi K2.6 at 5.0x, and DeepSeek V4 Pro at 3.3x. Most agent models stop making meaningful progress after a few hours. Qwen3.7-Max was still finding improvements past the 30-hour mark.

This isn't a cherry-picked demo. The task was run on hardware the model had never encountered, and the result is a measurable engineering outcome , not a leaderboard position.

How It Was Built: Environment Scaling and Cross-Harness RL

The architecture story behind Qwen3.7-Max is as interesting as the benchmark numbers. The core training innovation is what the team calls environment scaling

Don't miss what's next in AI

Join 300,000+ engineers and researchers who get the signal, not the noise.

Full access to in-depth AI research breakdowns
Be the first to know what's trending before it hits mainstream
Daily curated papers, repos, and industry moves

Alibaba's Qwen3.7-Max Runs Autonomously for 35 Hours, Beating Every Rival Agent

Takeaways

The 35-Hour Kernel Run: What Actually Happened

How It Was Built: Environment Scaling and Cross-Harness RL

Don't miss what's next in AI