Tracks
June has certainly been an interesting month for AI model releases. Anthropic launched Claude Fable 5 and then withdrew it from public access. Moonshot AI shipped Kimi K2.7-Code, reporting a +21.8% gain on Kimi Code Bench v2 over its predecessor. Most recently, Z.ai announced GLM-5.2, its new flagship coding and agentic AI model, available immediately to all GLM Coding Plan users, including Lite, Pro, Max, and Team tiers.
GLM-5.2 ships with a 1 million token context window, up to 131,072 output tokens, and two reasoning effort levels: high and max. Z.ai published no official benchmark scores at launch, which is worth noting upfront. The announcement focused on availability, context, and the open-source roadmap, with MIT-licensed weights described as pending.
In this article, I'll cover what GLM-5.2 is, what's new compared to GLM-5.1, how to switch to it in Claude Code, OpenClaw, and Cline, and what the absence of launch benchmarks means for practitioners evaluating it. You can also check out our comparison of GPT-5.5 vs Gemini 3.1 Pro for context on where the frontier currently sits.
What is GLM-5.2?
GLM-5.2 is Z.ai's new flagship model in the GLM-5 lineage, released on June 13, 2026. It sits at the top of the GLM Coding Plan and replaces GLM-5.1 as the recommended model for coding and agentic tasks. The model is accessed through an Anthropic-compatible endpoint, which means it drops into tools like Claude Code and Cline with a base URL swap and a model name change.
Compared to GLM-5.1, the headline upgrade is context length. GLM-5.1 offered roughly 200,000 tokens of context; GLM-5.2 extends that to 1,000,000 tokens when you use the glm-5.2[1m] model identifier. The maximum output tokens are now explicitly documented at 131,072, up from GLM-5.1's 128,000.
The other structural change is reasoning modes. GLM-5.1 had a single reasoning mode. GLM-5.2 introduces two: high and max. Z.ai's documentation recommends max effort for coding tasks, describing it as better for deep reasoning and complex task stability. The architecture is listed as "not specified at launch (GLM-5 lineage)," so there's no public parameter count yet, unlike GLM-5.1, which was documented as a 744B MoE model with 40B active parameters per token.
What's New With GLM-5.2?
Three changes stand out in this release: the expanded context window, the dual reasoning effort system, and the integration path into third-party coding agents. Each has practical implications for how you'd actually use the model.
1 million token context window
GLM-5.2 supports a 1 million token context window, but it's opt-in rather than default. To activate it, you append [1m] to the model name in your configuration: glm-5.2[1m]. You also need to set the compression window parameter CLAUDE_CODE_AUTO_COMPACT_WINDOW to 1000000 in your settings.json.
This matters for coding workflows where you're working across large codebases. A 1M token window can hold roughly 750,000 words of code and context simultaneously, which is enough to load an entire mid-sized repository without chunking. The caveat is that long-context quality often degrades at the extremes, and Z.ai has not published retrieval accuracy numbers at 1M tokens for this model.
One practical note: if Claude Code reports that the model with the [1m] suffix does not exist, the fix is to upgrade Claude Code to the latest version. This is a version compatibility issue, not a model availability issue.
Two reasoning effort levels
GLM-5.2 introduces a two-tier effort system: high and max. In Claude Code, you switch between them using the /effort command during a session. The mapping from Claude Code's effort labels to GLM-5.2's actual effort levels is as follows:
- low, medium, high (default): maps to GLM-5.2 high effort
- xhigh, max, ultracode: maps to GLM-5.2 max effort
Z.ai explicitly recommends max effort for coding tasks. The default in a new session maps to high, so if you're running complex multi-step tasks, you'll want to switch manually. This is the same tradeoff you see in other reasoning models: higher effort means more deliberate output but also higher latency and token usage.
Anthropic-compatible endpoint integration
GLM-5.2 is accessible through Z.ai's Anthropic-compatible API endpoint at https://api.z.ai/api/coding/paas/v4. This means any tool that supports a custom Anthropic base URL can use GLM-5.2 without waiting for native support. Claude Code, OpenClaw, and Cline all work today, as per the documentation.
The integration approach is a deliberate positioning choice. Rather than building a standalone interface, Z.ai is betting that developers already have a preferred coding agent and just want to swap the underlying model. The tradeoff is that tools without custom model configuration support won't work until Z.ai ships official integrations.
GLM-5.2 is available at no additional cost to all GLM Coding Plan users: Lite, Pro, Max, and Team.
GLM-5.2 vs GLM-5.1: Specification Comparison
| Attribute | GLM-5.2 | GLM-5.1 |
|---|---|---|
| Released | June 13, 2026 | April 7, 2026 |
| Context window | 1,000,000 tokens (glm-5.2[1m]) | ~200,000 tokens |
| Max output tokens | 131,072 | 120,000 |
| Reasoning modes | High, Max | Single mode |
| Architecture | Not specified at launch (GLM-5 lineage) | 744B MoE, 40B active |
| License | MIT (weights pending) | MIT (open weights released) |
| Launch benchmarks | None published | 58.4% SWE-bench Pro |
| Access at launch | GLM Coding Plan (all tiers) | Coding Plan, API, and weights |
How to Switch to GLM-5.2
The setup process differs slightly depending on which coding agent you use. Here's how to configure each one.
Switching models in Claude Code
Claude Code maps its internal model environment variables to GLM models. By default, the Opus and Sonnet slots both point to GLM-4.7, and the Haiku slot points to GLM-4.5-Air. To switch to GLM-5.2, you update ~/.claude/settings.json.
On macOS, open the file with vim ~/.claude/settings.json in the terminal, or navigate to it via Finder using Go > Go to Folder. On Windows, locate the file at ~/.claude/settings.json directly. Add or replace the environment variables block with the following:
{
"env": {
"CLAUDE_CODE_AUTO_COMPACT_WINDOW": "1000000",
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.5-air",
"ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-5.2[1m]",
"ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-5.2[1m]"
}
}
After saving, open a new terminal window and run claude to launch Claude Code. Type /status to confirm the active model. You should see glm-5.2[1m] listed as the default model in the status output.
Switching models in OpenClaw
OpenClaw requires a manual configuration edit if the provider model selector doesn't surface GLM-5.2 directly. The configuration file lives at ~/.openclaw/openclaw.json. You need to make three changes.
First, add the GLM-5.2 model object to the models.providers.zai.models array:
{
"id": "glm-5.2",
"name": "GLM-5.2",
"reasoning": true,
"input": ["text"],
"cost": {"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0},
"contextWindow": 1000000,
"maxTokens": 131072
}
Second, update the default model under agents.defaults.model.primary from "zai/glm-5" to "zai/glm-5.2". Third, add "zai/glm-5.2": {} under agents.defaults.models. Once all three edits are saved, restart the gateway with openclaw gateway restart and verify by running openclaw tui.
Switching models in Cline and other OpenAI-compatible tools
For Cline and any other tool that supports a custom OpenAI-compatible provider, the setup is straightforward. Use the following settings:
- API Provider: OpenAI Compatible
- Base URL:
https://api.z.ai/api/coding/paas/v4 - API Key: your Z.ai API key
- Model: Custom Model, enter
glm-5.2 - Context Window Size: 1000000
- Support Images: unchecked
Temperature and other parameters can be adjusted based on your task. Tools that do not allow custom model configuration will need to wait for official support in a future release.
GLM-5.2 Benchmarks
Z.ai published no official benchmark scores for GLM-5.2 at launch, although if we look closely at the subscription page, we can see that they claim GLM-5.2 ranks #1 open-source and #3 overall on FrontierSWE, nearing Claude Opus 4.8.
There is no SWE-bench, Terminal-Bench, or Code Arena number available that I could find. This is a notable contrast to GLM-5.1, which launched with a 58.4% SWE-bench Pro score. The announcement focused on availability, context length, and the open-source roadmap rather than evaluation results.
That being said, if we look at the LLM Benchmark Dashboard for Code v3, we can see that GLM-5.2(max) appears in the top 3, behind only GPT-5.5 and Claude Opus 4.8. So, still an impressive set of results at this early stage for an open-source model.
GLM-5.1 SWE-bench Pro baseline
The only official benchmark data in the GLM-5 lineage comes from GLM-5.1, which scored 58.4% on SWE-bench Pro at its April 7, 2026 launch. SWE-bench Pro measures a model's ability to resolve real GitHub issues in open-source Python repositories, which makes it one of the more practically relevant coding benchmarks available.
For comparison, GPT-5.5 scored 58.6% on SWE-bench Pro, and Gemini 3.1 Pro scored 54.2%, based on our coverage of those models. GLM-5.1 was therefore competitive with frontier models on this specific benchmark. Whether GLM-5.2 improves on that number is unknown until independent testing is published. I will update this article once I have more info.
No launch benchmarks: what it means in practice
The absence of official benchmark scores at the announcement means there's not yet an independent way to position GLM-5.2 against the likes of Claude Fable 5, GPT-5.5, or Kimi K2.7-Code on standardized tasks. Z.ai's stated reason for the launch was availability and the open-source roadmap, not performance claims.
For practitioners, this means the evaluation burden falls on you. The hands-on tests above are designed to give you a starting point, but you should run GLM-5.2 against your own representative tasks before committing to it for production workflows. The 1M context window makes it worth testing; the lack of benchmarks means you shouldn't assume it outperforms other models on coding tasks just yet. We'll update with more info and a GLM-5.2 tutorial soon.
GLM-5.2 Pricing and Availability
GLM-5.2 is available now to all GLM Coding Plan users at Z.ai. The plan tiers are Lite, Pro, Max, and Team.
Z.AI offers three tiers based on repository size and usage frequency. While subscriptions are billed monthly, opting for annual billing cuts the cost by 30%.
| Tier | Monthly Price | Annual Price (Per Month) | Target User | Base Quota (5-Hour / Weekly) |
|---|---|---|---|---|
| Lite | $18 | $12.60 | Small repos, lightweight iteration | ~80 / ~400 prompts |
| Pro | $72 | $50.40 | Mid-sized repos, daily development | ~400 / ~2,000 prompts |
| Max | $160 | $112.00 | Large repos, advanced workflows | ~1,600 / ~8,000 prompts |
As per the Z.ai usage page, because GLM-5.2 is an advanced model designed to rival Claude Opus 4.8, it is highly resource-intensive. The prompt limits listed in the pricing table above are baseline estimates. Using GLM-5.2 will drain this quota faster based on the time of day:
- Peak Hours (14:00–18:00 UTC+8): Each GLM-5.2 prompt deducts 3× the standard quota.
- Off-Peak Hours: Each prompt deducts 2× the standard quota.
- Limited-Time Promo: Through the end of September, off-peak usage of GLM-5.2 only deducts 1× quota.
For these reasons, the recommendation is to use GLM-5.2 for complex tasks so as to preserve your usage.
For developers looking to integrate the model directly via the API, GLM-5.2 uses a pay-as-you-go metered pricing structure.
According to Z.ai's official pricing documentation, GLM-5.2 API usage is billed per million tokens:
- Input Tokens: $1.40 per 1M tokens
- Cached Input Tokens: $0.26 per 1M tokens
- Cached Input Storage: Limited-time Free
- Output Tokens: $4.40 per 1M tokens
The API endpoint for developer access is https://api.z.ai/api/coding/paas/v4. The model identifiers are glm-5.2 for the standard version and glm-5.2[1m] for the 1M token context variant. You'll need a Z.ai API key, which you can generate at z.ai/manage-apikey/apikey-list.
Open-source weights are described as pending, with an MIT license planned. GLM-5.1 weights were released under MIT at launch, so the expectation is that GLM-5.2 weights will follow. The timeline given in the announcement was "next week" relative to the June 13, 2026, release date.
Final Thoughts
GLM-5.2 is an interesting release to evaluate because the strongest argument for trying it has nothing to do with benchmarks. It's free as part of a GLM Coding Plan (with usage limits depending on tier), it has a 1M token context window, and it drops into Claude Code or Cline with a quick configuration change. That's a low barrier to test.
The missing benchmarks are a real gap. GLM-5.1 launched with a 58.4% SWE-bench Pro score that put it in some esteemed company. Without equivalent numbers for GLM-5.2, there's no way to know whether this is an improvement or a lateral move. Z.ai is betting that developers will run their own evaluations, which is a reasonable bet given the zero cost of entry.
What I find most interesting is the integration strategy. By building on an Anthropic-compatible endpoint, Z.ai is positioning GLM-5.2 as a drop-in alternative inside the tools developers already use, rather than asking them to adopt a new interface. That's a pragmatic approach for a model that needs to earn trust through usage. If the weights land under MIT as promised, it becomes a more compelling option for teams that want to self-host.
If you want to get up to speed on AI coding tools and how to evaluate them, I recommend starting with the AI-Assisted Coding for Developers course on DataCamp, which covers the concepts you need to assess models like GLM-5.2 in your own workflows.
GLM-5.2 FAQs
What happens when a user entirely depletes their weekly or 5-hour prompt quota?
Once your tier’s baseline quota is exhausted, access to GLM-5.2 is typically throttled to a lower-priority queue rather than being cut off completely. Alternatively, your session may automatically fall back to a less resource-intensive model (such as GLM-4.5-Air), allowing you to finish lightweight iterations until your quota resets or you purchase a top-up.
Does the developer API endpoint consume the subscription plan quota, or is it billed separately?
The API endpoint operates on a separate pay-as-you-go model and does not draw from your Coding Plan prompt quota. It is billed directly based on token usage. For GLM-5.2, this costs $1.40 per 1M input tokens, $0.26 per 1M cached input tokens, and $4.40 per 1M output tokens. If you use third-party tools like OpenClaw or Cline via the API endpoint, you are paying these token rates rather than using your monthly subscription limits.
How does GLM-5.2 handle Anthropic-style tool use and function calling?
Because GLM-5.2 is built on an Anthropic-compatible API framework, it natively parses and supports standard Anthropic tools and tool_choice parameter schemas. This structural compatibility allows advanced coding agents to execute multi-step filesystem operations, shell execution, and custom tools out of the box without requiring a custom translation layer.
Is fine-tuning available for GLM-5.2 under the Max or Team tiers?
No, fine-tuning is not supported or offered through the GLM Coding Plan subscription tiers or the current API endpoints. If your team requires a customized version of the model on proprietary codebases, you will need to wait for Z.ai to release the pending open-source MIT-licensed weights so you can self-host and fine-tune the model on your own infrastructure.
A senior editor in the AI and edtech space. Committed to exploring data and AI trends.



