Z.ai's GLM-5.2 Hits 1M Context at 10x Cheaper Than Claude

Z.ai

23H AGO

2 min read

GLM-5.2 is Z.ai's new flagship model, and it landed at a moment that felt almost scripted: 48 hours after US export rules forced Anthropic to disable its top Fable 5 and Mythos 5 models for foreign nationals. Z.ai founder Jie Tang commented: "We deeply regret the sudden restrictions on certain frontier models. The path to AGI should not be surrounded by high walls; it requires cooperation from all of humanity." Whether you read that as principled or opportunistic, the timing put a spotlight on a model that deserves attention on its own merits.

What Just Shipped

GLM-5.2 is an open-weight Mixture-of-Experts language model from Zhipu AI, with ~744B total parameters (~40B active), a usable 1M-token context window, two reasoning modes, and an MIT license. It follows GLM-5 (February), GLM-5-Turbo (March), and GLM-5.1 (April), meaning Z.ai has now shipped four flagship-tier coding releases in roughly four months. The pace is relentless, and each iteration has been meaningfully different from the last.

GLM-5.2 is a step-function jump from GLM-5.1 in context capacity. The usable context window expands from 200,000 tokens to 1,000,000 tokens, a fivefold increase. The output limit also increased to 131,072 tokens per response, which matters for long refactors, multi-file diffs, and migration scripts that need to output complete files.
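To make those numbers concrete, here is a rough sketch of the token arithmetic. The three figures come from the paragraph above; the idea that a reserved output allowance is subtracted from the same 1M window is an assumption (it is how many APIs account for context, but Z.ai's accounting is not stated here), and `input_budget` is a hypothetical helper name.

```python
# Figures quoted in the article.
GLM_51_CONTEXT = 200_000
GLM_52_CONTEXT = 1_000_000
GLM_52_MAX_OUTPUT = 131_072


def input_budget(context_window: int, reserved_output: int) -> int:
    """Tokens left for the prompt once room for the response is reserved.

    Assumption: input and output share one context window.
    """
    if reserved_output > context_window:
        raise ValueError("reserved output exceeds the context window")
    return context_window - reserved_output


growth = GLM_52_CONTEXT / GLM_51_CONTEXT
budget = input_budget(GLM_52_CONTEXT, GLM_52_MAX_OUTPUT)

print(growth)  # 5.0
print(budget)  # 868928
```

Under that assumption, a request that reserves the full output limit still leaves roughly 869K tokens for code and instructions, which is more than four times the entire GLM-5.1 window.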

Bar chart comparing GLM-5.2 against GLM-5.1, Claude Opus 4.8, GPT-5.5, and Gemini 3.1 Pro across 8 benchmarks including SWE-bench Pro, Terminal-Bench, and NL2Repo

The Architecture Under the Hood

GLM-5.2 inherits its foundation from GLM-5: a 744-billion-parameter Mixture-of-Experts model with 40 billion active parameters per token, trained on 28.5 trillion tokens and built on DeepSeek Sparse Attention to keep long-context inference affordable. MoE (Mixture-of-Experts) means the model has many specialized sub-networks but only activates a small fraction of them per token, so per-token compute is closer to that of a ~40B dense model while the full 744B parameters remain available as capacity.
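The sparse-activation idea can be illustrated with a toy top-k router. This is a minimal sketch of generic MoE routing, not GLM-5.2's actual implementation: the expert count, the value of k, the gate scores, and the "experts" themselves (simple scaling functions) are all hypothetical. Only the 744B and 40B figures come from the article.

```python
import math
import random


def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the selected experts only, so their weights sum to 1.
    peak = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, gate_logits, k):
    """Weighted sum of the k selected experts; the rest are never called."""
    routes = top_k_route(gate_logits, k)
    out = [0.0] * len(x)
    for idx, weight in routes:
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, [idx for idx, _ in routes]


def make_expert(scale):
    """Hypothetical stand-in for an expert network: scales its input."""
    return lambda x: [scale * v for v in x]


NUM_EXPERTS = 16  # hypothetical, chosen for illustration
TOP_K = 2         # hypothetical, chosen for illustration

experts = [make_expert(s) for s in range(1, NUM_EXPERTS + 1)]
rng = random.Random(0)
gate_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

output, used = moe_forward([1.0, 2.0], experts, gate_logits, TOP_K)
print(f"experts used: {used} of {NUM_EXPERTS}")

# Fraction of parameters active per token, from the article's figures.
active_fraction = 40 / 744
print(f"{active_fraction:.1%}")  # 5.4%
```

In a real model the gate scores come from a learned router and each expert is a feed-forward network, but the cost structure is the same: only the selected experts run for a given token.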

GLM-5.1 was described by Z.ai as an incremental post-training upgrade: the same architecture, with reinforcement learning retargeted specifically at coding task distributions. The result was a model capable of sustaining roughly 1,700 autonomous agent steps in a single session, up from an industry-wide baseline of around 20 steps a year earlier, and able to run "plan, execute, test, fix, optimize" loops for up to eight hours without human intervention. GLM-5.2 builds on that foundation with the 1M context window and the new dual-effort reasoning system.
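For readers unfamiliar with what an "agent step" loop looks like, the sketch below shows the general shape of a plan, execute, test, fix cycle under a step budget. It is purely illustrative and says nothing about how Z.ai's agent harness works: `run_agent_loop`, `LoopResult`, and the callables passed in are hypothetical, the "optimize" phase is omitted for brevity, and counting a test run as free rather than as a step is an arbitrary choice.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    steps_used: int
    passed: bool
    log: List[str] = field(default_factory=list)


def run_agent_loop(plan: Callable[[], List[str]],
                   execute: Callable[[str], int],
                   test: Callable[[int], bool],
                   fix: Callable[[int], int],
                   max_steps: int) -> LoopResult:
    """Plan once, then execute each task and fix until its test passes.

    Stops early with passed=False if the step budget runs out.
    """
    log = ["plan"]
    tasks = plan()
    steps = 1
    for task in tasks:
        if steps >= max_steps:
            return LoopResult(steps, False, log)
        state = execute(task)
        steps += 1
        log.append(f"execute:{task}")
        while not test(state):
            if steps >= max_steps:
                return LoopResult(steps, False, log)
            state = fix(state)
            steps += 1
            log.append(f"fix:{task}")
    return LoopResult(steps, True, log)


# Toy stand-ins: each task starts with 2 failing checks; each fix clears one.
result = run_agent_loop(
    plan=lambda: ["task_a", "task_b"],
    execute=lambda task: 2,
    test=lambda failing: failing == 0,
    fix=lambda failing: failing - 1,
    max_steps=1700,  # the per-session figure quoted for GLM-5.1
)
print(result.steps_used, result.passed)  # 7 True
```

The step budget is the interesting part: with a ceiling of around 20 steps, a loop like this gives up after a handful of fix attempts, while a ceiling in the thousands lets it grind through many tasks in one session.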
