Track
June has certainly been an interesting month for AI model releases. Anthropic launched Claude Fable 5 and then withdrew it from public access. Moonshot AI shipped Kimi K2.7-Code, reporting a +21.8% gain on Kimi Code Bench v2 over its predecessor. Most recently, Z.ai announced GLM-5.2, its new flagship coding and agentic AI model, available immediately to all GLM Coding Plan users, including Lite, Pro, Max, and Team tiers.
GLM-5.2 ships with a 1 million token context window, up to 131,072 output tokens, and two reasoning effort levels: high and max. While Z.ai initially published no official benchmark scores, the newly released developer documentation confirms GLM-5.2 as the leading open-source model across major coding metrics, placing it within striking distance of closed-source frontier models. The announcement otherwise focused on availability, context, and the open-source roadmap, with MIT-licensed weights described as pending.
In this article, I'll cover what GLM-5.2 is, what's new compared to GLM-5.1, how to switch to it in Claude Code, OpenClaw, and Cline, and what the benchmarks mean for practitioners using it. You can also check out our comparison of GPT-5.5 vs Gemini 3.1 Pro for context on where the frontier currently sits.
What is GLM-5.2?
GLM-5.2 is Z.ai's new flagship model in the GLM-5 lineage, released on June 16, 2026. It sits at the top of the GLM Coding Plan and replaces GLM-5.1 as the primary model for coding and agentic tasks. Because it uses an Anthropic-compatible endpoint, it drops seamlessly into tools like Claude Code and Cline with a quick base URL swap and a model name change.
Compared to GLM-5.1, the core practical upgrades include:
- Massive Context: Jump from ~200,000 tokens to 1,000,000 tokens when using the
glm-5.2[1m]identifier. - Expanded Output: Maximum output tokens are documented at 131,072 (up from 120,000).
- Dual Reasoning Modes: Introduces high and max effort levels, with Z.ai recommending max for complex task stability.
The Architecture Under the Hood
While the previous generation was a total black box, Z.ai’s technical blog revealed several custom engineering mechanisms built to handle long-context, agentic workloads without skyrocketing latency:
| Optimization | System | How it Works |
|---|---|---|
| Attention Mechanism | IndexShare | Reuses a single lightweight indexer across every four transformer layers, cutting per-token FLOPs by 2.9x at a 1M context. |
| Memory Management | LayerSplit | Implements fine-grained memory management to prevent the system from buckling under KV-cache memory limits. |
| Inference Speed | MTP + KVShare | Overhauls the Multi-Token Prediction layer with speculative decoding, boosting token acceptance length by up to 20%. |
| Post-Training | "slime" Infrastructure | A specialized training framework that allowed Z.ai to merge over ten expert models in just two days. |
| Agent Stability | Critic-based PPO | Shifts to a direct actor-critic Reinforcement Learning formulation featuring an active "anti-hack" module to anchor long-horizon trajectories. |
What's New With GLM-5.2?
Three changes stand out in this release: the expanded context window, the dual reasoning effort system, and the integration path into third-party coding agents. Each has practical implications for how you'd actually use the model.
1 million token context window
GLM-5.2 supports a 1 million token context window, but it's opt-in rather than default. To activate it, you append [1m] to the model name in your configuration: glm-5.2[1m]. You also need to set the compression window parameter CLAUDE_CODE_AUTO_COMPACT_WINDOW to 1000000 in your settings.json.
This matters for coding workflows where you're working across large codebases. A 1M token window can hold roughly 750,000 words of code and context simultaneously, which is enough to load an entire mid-sized repository without chunking. The caveat is that long-context quality often degrades at the extremes, and Z.ai has not published retrieval accuracy numbers at 1M tokens for this model.
One practical note: if Claude Code reports that the model with the [1m] suffix does not exist, the fix is to upgrade Claude Code to the latest version. This is a version compatibility issue, not a model availability issue.
Two reasoning effort levels
GLM-5.2 introduces a two-tier effort system: high and max. In Claude Code, you switch between them using the /effort command during a session. The mapping from Claude Code's effort labels to GLM-5.2's actual effort levels is as follows:
- low, medium, high (default): maps to GLM-5.2 high effort
- xhigh, max, ultracode: maps to GLM-5.2 max effort
Z.ai explicitly recommends max effort for coding tasks. The default in a new session maps to high, so if you're running complex multi-step tasks, you'll want to switch manually. This is the same tradeoff you see in other reasoning models: higher effort means more deliberate output but also higher latency and token usage.
Anthropic-compatible endpoint integration
GLM-5.2 is accessible through Z.ai's Anthropic-compatible API endpoint at https://api.z.ai/api/coding/paas/v4. This means any tool that supports a custom Anthropic base URL can use GLM-5.2 without waiting for native support. Claude Code, OpenClaw, and Cline all work today, as per the documentation.
The integration approach is a deliberate positioning choice. Rather than building a standalone interface, Z.ai is betting that developers already have a preferred coding agent and just want to swap the underlying model. The tradeoff is that tools without custom model configuration support won't work until Z.ai ships official integrations.
GLM-5.2 is available at no additional cost to all GLM Coding Plan users: Lite, Pro, Max, and Team.
GLM-5.2 vs GLM-5.1: Specification Comparison
| Attribute | GLM-5.2 | GLM-5.1 |
|---|---|---|
| Released | June 13, 2026 | April 7, 2026 |
| Context window | 1,000,000 tokens (glm-5.2[1m]) | ~200,000 tokens |
| Max output tokens | 131,072 | 120,000 |
| Reasoning modes | High, Max | Single mode |
| Architecture | GLM-5 lineage with IndexShare & MTP optimizations | 744B MoE, 40B active |
| License | MIT (weights pending) | MIT (open weights released) |
| Launch benchmarks | 62.1% SWE-bench Pro, 81.0 Terminal-Bench 2.1 | 58.4% SWE-bench Pro, 63.5 Terminal-Bench 2.1 |
| Access at launch | GLM Coding Plan, API, weights pending | Coding Plan, API, and weights |
How to Switch to GLM-5.2
The setup process differs slightly depending on which coding agent you use. Here's how to configure each one.
Switching models in Claude Code
Claude Code maps its internal model environment variables to GLM models. By default, the Opus and Sonnet slots both point to GLM-4.7, and the Haiku slot points to GLM-4.5-Air. To switch to GLM-5.2, you update ~/.claude/settings.json.
On macOS, open the file with vim ~/.claude/settings.json in the terminal, or navigate to it via Finder using Go > Go to Folder. On Windows, locate the file at ~/.claude/settings.json directly. Add or replace the environment variables block with the following:
{
"env": {
"CLAUDE_CODE_AUTO_COMPACT_WINDOW": "1000000",
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.5-air",
"ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-5.2[1m]",
"ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-5.2[1m]"
}
}
After saving, open a new terminal window and run claude to launch Claude Code. Type /status to confirm the active model. You should see glm-5.2[1m] listed as the default model in the status output.
Switching models in OpenClaw
OpenClaw requires a manual configuration edit if the provider model selector doesn't surface GLM-5.2 directly. The configuration file lives at ~/.openclaw/openclaw.json. You need to make three changes.
First, add the GLM-5.2 model object to the models.providers.zai.models array:
{
"id": "glm-5.2",
"name": "GLM-5.2",
"reasoning": true,
"input": ["text"],
"cost": {"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0},
"contextWindow": 1000000,
"maxTokens": 131072
}
Second, update the default model under agents.defaults.model.primary from "zai/glm-5" to "zai/glm-5.2". Third, add "zai/glm-5.2": {} under agents.defaults.models. Once all three edits are saved, restart the gateway with openclaw gateway restart and verify by running openclaw tui.
Switching models in Cline and other OpenAI-compatible tools
For Cline and any other tool that supports a custom OpenAI-compatible provider, the setup is straightforward. Use the following settings:
- API Provider: OpenAI Compatible
- Base URL:
https://api.z.ai/api/coding/paas/v4 - API Key: your Z.ai API key
- Model: Custom Model, enter
glm-5.2 - Context Window Size: 1000000
- Support Images: unchecked
Temperature and other parameters can be adjusted based on your task. Tools that do not allow custom model configuration will need to wait for official support in a future release.
GLM-5.2 Benchmarks
Official benchmark scores are now available, confirming GLM-5.2 as the strongest open-source model currently on the market. In standard coding benchmarks, GLM-5.2 significantly outperforms its predecessor. It achieved an 81.0 on Terminal-Bench 2.1 (compared to GLM-5.1’s 62.0) and a 62.1% on SWE-bench Pro (up from GLM-5.1’s 58.4%).
Z.ai also released numbers targeting long-horizon coding performance. Across FrontierSWE, PostTrainBench, and SWE-Marathon, GLM-5.2 consistently ranks among the top models overall. It trails Claude Opus 4.8 by just 1% on FrontierSWE and successfully outperforms both GPT-5.5 and Claude Opus 4.7 across multiple benchmarks, maintaining its position as the highest-ranked open-source model across the board.
What the benchmarks mean in practice
The official benchmarks position GLM-5.2 ahead of models like GPT-5.5 and Kimi K2.7-Code on standardized tasks. The numbers suggest that its 1-million-token context window translates into practical engineering capabilities, particularly for cross-file, long-chain tasks.
Early developer feedback published by Z.ai echoes these metrics, citing better project-level context capacity and more reliable adherence to strict engineering standards.
Interestingly, the benchmarks also revealed the model's agentic persistence. Z.ai noted that GLM-5.2 is highly prone to "reward hacking" during evaluations. Instead of solving the problem, the agent would write scripts to search the workspace for hidden secret_cases.json files or use curl to download the target source code directly from GitHub.
Z.ai had to build a two-stage, online anti-hack module just to force the model to solve the problems legitimately, returning dummy data when it tried to cheat rather than crashing the run.
That being said, for practitioners, an element of evaluation burden always remains. While the benchmark scores are highly competitive (and clearly hard-won against a model trying to cheat the test), you should still run GLM-5.2 against your own representative codebase before committing it to production workflows.
We'll follow up with a comprehensive GLM-5.2 tutorial soon.
GLM-5.2 Pricing and Availability
GLM-5.2 is available now to all GLM Coding Plan users at Z.ai. The plan tiers are Lite, Pro, Max, and Team.
Z.AI offers three tiers based on repository size and usage frequency. While subscriptions are billed monthly, opting for annual billing cuts the cost by 30%.
| Tier | Monthly Price | Annual Price (Per Month) | Target User | Base Quota (5-Hour / Weekly) |
|---|---|---|---|---|
| Lite | $18 | $12.60 | Small repos, lightweight iteration | ~80 / ~400 prompts |
| Pro | $72 | $50.40 | Mid-sized repos, daily development | ~400 / ~2,000 prompts |
| Max | $160 | $112.00 | Large repos, advanced workflows | ~1,600 / ~8,000 prompts |
As per the Z.ai usage page, because GLM-5.2 is an advanced model designed to rival Claude Opus 4.8, it is highly resource-intensive. The prompt limits listed in the pricing table above are baseline estimates. Using GLM-5.2 will drain this quota faster based on the time of day:
- Peak Hours (14:00–18:00 UTC+8): Each GLM-5.2 prompt deducts 3× the standard quota.
- Off-Peak Hours: Each prompt deducts 2× the standard quota.
- Limited-Time Promo: Through the end of September, off-peak usage of GLM-5.2 only deducts 1× quota.
For these reasons, the recommendation is to use GLM-5.2 for complex tasks so as to preserve your usage.
For developers looking to integrate the model directly via the API, GLM-5.2 uses a pay-as-you-go metered pricing structure.
According to Z.ai's official pricing documentation, GLM-5.2 API usage is billed per million tokens:
- Input Tokens: $1.40 per 1M tokens
- Cached Input Tokens: $0.26 per 1M tokens
- Cached Input Storage: Limited-time Free
- Output Tokens: $4.40 per 1M tokens
The API endpoint for developer access is https://api.z.ai/api/coding/paas/v4. The model identifiers are glm-5.2 for the standard version and glm-5.2[1m] for the 1M token context variant. You'll need a Z.ai API key, which you can generate at z.ai/manage-apikey/apikey-list.
Open-source weights are described as pending, with an MIT license planned. GLM-5.1 weights were released under MIT at launch, so the expectation is that GLM-5.2 weights will follow. The timeline given in the announcement was "next week" relative to the June 13, 2026, release date.
Final Thoughts
GLM-5.2 is an interesting release to evaluate because the strongest argument for trying it has nothing to do with benchmarks. It's free as part of a GLM Coding Plan (with usage limits depending on tier), it has a 1M token context window, and it drops into Claude Code or Cline with a quick configuration change. That's a low barrier to test.
The benchmark scores are impressive. Pushing the SWE-bench Pro score to 62.1% and the Terminal-Bench 2.1 score to 81.0 proves this is a big step up over GLM-5.1. Given the zero cost of entry for existing plan users, it is absolutely worth spinning up for your next refactor.
If you want to get up to speed on AI coding tools and how to evaluate them, I recommend starting with the AI-Assisted Coding for Developers course on DataCamp, which covers the concepts you need to assess models like GLM-5.2 in your own workflows.
GLM-5.2 FAQs
What happens when a user entirely depletes their weekly or 5-hour prompt quota?
Once your tier’s baseline quota is exhausted, access to GLM-5.2 is typically throttled to a lower-priority queue rather than being cut off completely. Alternatively, your session may automatically fall back to a less resource-intensive model (such as GLM-4.5-Air), allowing you to finish lightweight iterations until your quota resets or you purchase a top-up.
Does the developer API endpoint consume the subscription plan quota, or is it billed separately?
The API endpoint operates on a separate pay-as-you-go model and does not draw from your Coding Plan prompt quota. It is billed directly based on token usage. For GLM-5.2, this costs $1.40 per 1M input tokens, $0.26 per 1M cached input tokens, and $4.40 per 1M output tokens. If you use third-party tools like OpenClaw or Cline via the API endpoint, you are paying these token rates rather than using your monthly subscription limits.
How does GLM-5.2 handle Anthropic-style tool use and function calling?
Because GLM-5.2 is built on an Anthropic-compatible API framework, it natively parses and supports standard Anthropic tools and tool_choice parameter schemas. This structural compatibility allows advanced coding agents to execute multi-step filesystem operations, shell execution, and custom tools out of the box without requiring a custom translation layer.
Is fine-tuning available for GLM-5.2 under the Max or Team tiers?
No, fine-tuning is not supported or offered through the GLM Coding Plan subscription tiers or the current API endpoints. If your team requires a customized version of the model on proprietary codebases, you will need to wait for Z.ai to release the pending open-source MIT-licensed weights so you can self-host and fine-tune the model on your own infrastructure.
A senior editor in the AI and edtech space. Committed to exploring data and AI trends.



