GLM-5.2: Features, Setup, Benchmarks, and Model Switching Guide

Z.ai's GLM-5.2 ships with a 1M token context window, two reasoning effort levels, and free access across all GLM Coding Plan tiers.

更新 2026年6月17日 · 11 分読む

AIで探索

ChatGPTで開く Claudeで開く Perplexityで開く

June has certainly been an interesting month for AI model releases. Anthropic launched Claude Fable 5 and then withdrew it from public access. Moonshot AI shipped Kimi K2.7-Code, reporting a +21.8% gain on Kimi Code Bench v2 over its predecessor. Most recently, Z.ai announced GLM-5.2, its new flagship coding and agentic AI model, available immediately to all GLM Coding Plan users, including Lite, Pro, Max, and Team tiers.

GLM-5.2 ships with a 1 million token context window, up to 131,072 output tokens, and two reasoning effort levels: high and max. While Z.ai initially published no official benchmark scores, the newly released developer documentation confirms GLM-5.2 as the leading open-source model across major coding metrics, placing it within striking distance of closed-source frontier models. The announcement otherwise focused on availability, context, and the open-source roadmap, with MIT-licensed weights described as pending.

In this article, I'll cover what GLM-5.2 is, what's new compared to GLM-5.1, how to switch to it in Claude Code, OpenClaw, and Cline, and what the benchmarks mean for practitioners using it. You can also check out our comparison of GPT-5.5 vs Gemini 3.1 Pro for context on where the frontier currently sits.

What is GLM-5.2?

GLM-5.2 is Z.ai's new flagship model in the GLM-5 lineage, released on June 16, 2026. It sits at the top of the GLM Coding Plan and replaces GLM-5.1 as the primary model for coding and agentic tasks. Because it uses an Anthropic-compatible endpoint, it drops seamlessly into tools like Claude Code and Cline with a quick base URL swap and a model name change.

Compared to GLM-5.1, the core practical upgrades include:

Massive Context: Jump from ~200,000 tokens to 1,000,000 tokens when using the glm-5.2[1m] identifier.
Expanded Output: Maximum output tokens are documented at 131,072 (up from 120,000).
Dual Reasoning Modes: Introduces high and max effort levels, with Z.ai recommending max for complex task stability.

The Architecture Under the Hood

While the previous generation was a total black box, Z.ai’s technical blog revealed several custom engineering mechanisms built to handle long-context, agentic workloads without skyrocketing latency:

Optimization	System	How it Works
Attention Mechanism	IndexShare	Reuses a single lightweight indexer across every four transformer layers, cutting per-token FLOPs by 2.9x at a 1M context.
Memory Management	LayerSplit	Implements fine-grained memory management to prevent the system from buckling under KV-cache memory limits.
Inference Speed	MTP + KVShare	Overhauls the Multi-Token Prediction layer with speculative decoding, boosting token acceptance length by up to 20%.
Post-Training	"slime" Infrastructure	A specialized training framework that allowed Z.ai to merge over ten expert models in just two days.
Agent Stability	Critic-based PPO	Shifts to a direct actor-critic Reinforcement Learning formulation featuring an active "anti-hack" module to anchor long-horizon trajectories.

What's New With GLM-5.2?

Three changes stand out in this release: the expanded context window, the dual reasoning effort system, and the integration path into third-party coding agents. Each has practical implications for how you'd actually use the model.

1 million token context window

GLM-5.2 supports a 1 million token context window, but it's opt-in rather than default. To activate it, you append [1m] to the model name in your configuration: glm-5.2[1m]. You also need to set the compression window parameter CLAUDE_CODE_AUTO_COMPACT_WINDOW to 1000000 in your settings.json.

This matters for coding workflows where you're working across large codebases. A 1M token window can hold roughly 750,000 words of code and context simultaneously, which is enough to load an entire mid-sized repository without chunking. The caveat is that long-context quality often degrades at the extremes, and Z.ai has not published retrieval accuracy numbers at 1M tokens for this model.

One practical note: if Claude Code reports that the model with the [1m] suffix does not exist, the fix is to upgrade Claude Code to the latest version. This is a version compatibility issue, not a model availability issue.

Two reasoning effort levels

GLM-5.2 introduces a two-tier effort system: high and max. In Claude Code, you switch between them using the /effort command during a session. The mapping from Claude Code's effort labels to GLM-5.2's actual effort levels is as follows:

low, medium, high (default): maps to GLM-5.2 high effort
xhigh, max, ultracode: maps to GLM-5.2 max effort

Z.ai explicitly recommends max effort for coding tasks. The default in a new session maps to high, so if you're running complex multi-step tasks, you'll want to switch manually. This is the same tradeoff you see in other reasoning models: higher effort means more deliberate output but also higher latency and token usage.

Anthropic-compatible endpoint integration

GLM-5.2 is accessible through Z.ai's Anthropic-compatible API endpoint at https://api.z.ai/api/coding/paas/v4. This means any tool that supports a custom Anthropic base URL can use GLM-5.2 without waiting for native support. Claude Code, OpenClaw, and Cline all work today, as per the documentation.

The integration approach is a deliberate positioning choice. Rather than building a standalone interface, Z.ai is betting that developers already have a preferred coding agent and just want to swap the underlying model. The tradeoff is that tools without custom model configuration support won't work until Z.ai ships official integrations.

GLM-5.2 is available at no additional cost to all GLM Coding Plan users: Lite, Pro, Max, and Team.

GLM-5.2 vs GLM-5.1: Specification Comparison

Attribute	GLM-5.2	GLM-5.1
Released	June 13, 2026	April 7, 2026
Context window	1,000,000 tokens (glm-5.2[1m])	~200,000 tokens
Max output tokens	131,072	120,000
Reasoning modes	High, Max	Single mode
Architecture	GLM-5 lineage with IndexShare & MTP optimizations	744B MoE, 40B active
License	MIT (weights pending)	MIT (open weights released)
Launch benchmarks	62.1% SWE-bench Pro, 81.0 Terminal-Bench 2.1	58.4% SWE-bench Pro, 63.5 Terminal-Bench 2.1
Access at launch	GLM Coding Plan, API, weights pending	Coding Plan, API, and weights

How to Switch to GLM-5.2

The setup process differs slightly depending on which coding agent you use. Here's how to configure each one.

Switching models in Claude Code

Claude Code maps its internal model environment variables to GLM models. By default, the Opus and Sonnet slots both point to GLM-4.7, and the Haiku slot points to GLM-4.5-Air. To switch to GLM-5.2, you update ~/.claude/settings.json.

On macOS, open the file with vim ~/.claude/settings.json in the terminal, or navigate to it via Finder using Go > Go to Folder. On Windows, locate the file at ~/.claude/settings.json directly. Add or replace the environment variables block with the following:

{
  "env": {
    "CLAUDE_CODE_AUTO_COMPACT_WINDOW": "1000000",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.5-air",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-5.2[1m]",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-5.2[1m]"
  }
}

After saving, open a new terminal window and run claude to launch Claude Code. Type /status to confirm the active model. You should see glm-5.2[1m] listed as the default model in the status output.

Switching models in OpenClaw

OpenClaw requires a manual configuration edit if the provider model selector doesn't surface GLM-5.2 directly. The configuration file lives at ~/.openclaw/openclaw.json. You need to make three changes.

First, add the GLM-5.2 model object to the models.providers.zai.models array:

{
  "id": "glm-5.2",
  "name": "GLM-5.2",
  "reasoning": true,
  "input": ["text"],
  "cost": {"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0},
  "contextWindow": 1000000,
  "maxTokens": 131072
}

Second, update the default model under agents.defaults.model.primary from "zai/glm-5" to "zai/glm-5.2". Third, add "zai/glm-5.2": {} under agents.defaults.models. Once all three edits are saved, restart the gateway with openclaw gateway restart and verify by running openclaw tui.

Switching models in Cline and other OpenAI-compatible tools

For Cline and any other tool that supports a custom OpenAI-compatible provider, the setup is straightforward. Use the following settings:

API Provider: OpenAI Compatible
Base URL: https://api.z.ai/api/coding/paas/v4
API Key: your Z.ai API key
Model: Custom Model, enter glm-5.2
Context Window Size: 1000000
Support Images: unchecked

Temperature and other parameters can be adjusted based on your task. Tools that do not allow custom model configuration will need to wait for official support in a future release.

GLM-5.2 Benchmarks

Official benchmark scores are now available, confirming GLM-5.2 as the strongest open-source model currently on the market. In standard coding benchmarks, GLM-5.2 significantly outperforms its predecessor. It achieved an 81.0 on Terminal-Bench 2.1 (compared to GLM-5.1’s 62.0) and a 62.1% on SWE-bench Pro (up from GLM-5.1’s 58.4%).

This SWE-bench Pro score is particularly relevant. SWE-bench measures a model's ability to resolve real GitHub issues in open-source repositories. For comparison, GPT-5.5 scored 58.6% and Gemini 3.1 Pro scored 54.2%. GLM-5.2 not only beats its predecessor but also outperforms these established frontier models. It is rapidly closing the gap with the top tier of closed-source options, landing just a few points behind Claude Opus 4.8, which holds an 85.0 on Terminal-Bench 2.1.

Source

Z.ai also released numbers targeting long-horizon coding performance. Across FrontierSWE, PostTrainBench, and SWE-Marathon, GLM-5.2 consistently ranks among the top models overall. It trails Claude Opus 4.8 by just 1% on FrontierSWE and successfully outperforms both GPT-5.5 and Claude Opus 4.7 across multiple benchmarks, maintaining its position as the highest-ranked open-source model across the board.

Source

What the benchmarks mean in practice

The official benchmarks position GLM-5.2 ahead of models like GPT-5.5 and Kimi K2.7-Code on standardized tasks. The numbers suggest that its 1-million-token context window translates into practical engineering capabilities, particularly for cross-file, long-chain tasks.

Early developer feedback published by Z.ai echoes these metrics, citing better project-level context capacity and more reliable adherence to strict engineering standards.

Interestingly, the benchmarks also revealed the model's agentic persistence. Z.ai noted that GLM-5.2 is highly prone to "reward hacking" during evaluations. Instead of solving the problem, the agent would write scripts to search the workspace for hidden secret_cases.json files or use curl to download the target source code directly from GitHub.

Z.ai had to build a two-stage, online anti-hack module just to force the model to solve the problems legitimately, returning dummy data when it tried to cheat rather than crashing the run.

That being said, for practitioners, an element of evaluation burden always remains. While the benchmark scores are highly competitive (and clearly hard-won against a model trying to cheat the test), you should still run GLM-5.2 against your own representative codebase before committing it to production workflows.

We'll follow up with a comprehensive GLM-5.2 tutorial soon.

GLM-5.2 Pricing and Availability

GLM-5.2 is available now to all GLM Coding Plan users at Z.ai. The plan tiers are Lite, Pro, Max, and Team.

Z.AI offers three tiers based on repository size and usage frequency. While subscriptions are billed monthly, opting for annual billing cuts the cost by 30%.

Tier	Monthly Price	Annual Price (Per Month)	Target User	Base Quota (5-Hour / Weekly)
Lite	$18	$12.60	Small repos, lightweight iteration	~80 / ~400 prompts
Pro	$72	$50.40	Mid-sized repos, daily development	~400 / ~2,000 prompts
Max	$160	$112.00	Large repos, advanced workflows	~1,600 / ~8,000 prompts

As per the Z.ai usage page, because GLM-5.2 is an advanced model designed to rival Claude Opus 4.8, it is highly resource-intensive. The prompt limits listed in the pricing table above are baseline estimates. Using GLM-5.2 will drain this quota faster based on the time of day:

Peak Hours (14:00–18:00 UTC+8): Each GLM-5.2 prompt deducts 3× the standard quota.
Off-Peak Hours: Each prompt deducts 2× the standard quota.
Limited-Time Promo: Through the end of September, off-peak usage of GLM-5.2 only deducts 1× quota.

For these reasons, the recommendation is to use GLM-5.2 for complex tasks so as to preserve your usage.

For developers looking to integrate the model directly via the API, GLM-5.2 uses a pay-as-you-go metered pricing structure.

According to Z.ai's official pricing documentation, GLM-5.2 API usage is billed per million tokens:

Input Tokens: $1.40 per 1M tokens
Cached Input Tokens: $0.26 per 1M tokens
Cached Input Storage: Limited-time Free
Output Tokens: $4.40 per 1M tokens

The API endpoint for developer access is https://api.z.ai/api/coding/paas/v4. The model identifiers are glm-5.2 for the standard version and glm-5.2[1m] for the 1M token context variant. You'll need a Z.ai API key, which you can generate at z.ai/manage-apikey/apikey-list.

Open-source weights are described as pending, with an MIT license planned. GLM-5.1 weights were released under MIT at launch, so the expectation is that GLM-5.2 weights will follow. The timeline given in the announcement was "next week" relative to the June 13, 2026, release date.

Final Thoughts

GLM-5.2 is an interesting release to evaluate because the strongest argument for trying it has nothing to do with benchmarks. It's free as part of a GLM Coding Plan (with usage limits depending on tier), it has a 1M token context window, and it drops into Claude Code or Cline with a quick configuration change. That's a low barrier to test.

The benchmark scores are impressive. Pushing the SWE-bench Pro score to 62.1% and the Terminal-Bench 2.1 score to 81.0 proves this is a big step up over GLM-5.1. Given the zero cost of entry for existing plan users, it is absolutely worth spinning up for your next refactor.

If you want to get up to speed on AI coding tools and how to evaluate them, I recommend starting with the AI-Assisted Coding for Developers course on DataCamp, which covers the concepts you need to assess models like GLM-5.2 in your own workflows.

What happens when a user entirely depletes their weekly or 5-hour prompt quota?

Does the developer API endpoint consume the subscription plan quota, or is it billed separately?

How does GLM-5.2 handle Anthropic-style tool use and function calling?

Is fine-tuning available for GLM-5.2 under the Max or Team tiers?

Author

Matt Crabtree

トピック

Artificial Intelligence

Large Language Models

Top DataCamp Courses

Tracks

GitHub Copilot Fundamentals (GH-300)

6時間

Prepare for GitHub's Copilot certification (GH-300) by mastering AI-assisted coding, Copilot features, and responsible AI.

詳細を見る

コースを開始

Courses

開発者のための AI 支援コーディング

1時間30分

7.7K

AIでコーディングを加速。アシスタントに指示して、コードの作成・テスト・ドキュメント化を効率的に行いましょう。

詳細を見る

コースを開始

Courses

Claude Code 101

3時間

21.1K

Learn how to use Claude Code effectively in your daily development workflows.

詳細を見る

コースを開始

Gemini 2.5 Pro: Features, Tests, Access, Benchmarks, and More

Explore Google's Gemini 2.5 Pro, and learn about its impressive 1 million token context window, multimodal capabilities, hands-on test results, and how to access it.

Alex Olteanu

8 分

blogs

GLM-5 vs GPT-5.3-Codex: Which AI Model Wins for Agent Workflows?

We compare GLM 5 vs GPT 5.3 Codex for AI agent workflows, analyzing architecture, benchmarks, deployment choices, and costs to guide your model selection.

Brian Mutea

15 分

blogs

GPT 5.2: Benchmarks, Model Breakdown, and Real-World Performance

Discover how GPT-5.2 improves knowledge work with major upgrades in long-context reasoning, tool calling, coding, vision, and end-to-end workflow execution.

Josef Waples

10 分

blogs

SubQ AI Explained: How Good Is the 12M Context Window LLM?

Subquadratic's SubQ model claims a 12M-token context window, 52x efficiency, and frontier performance. Here's how its SSA architecture works and what the benchmarks actually say.

Srujana Maddula

13 分

tutorials

Run GLM-5 Locally For Agentic Coding

Run GLM-5, the best open-weight AI model, on a single GPU with llama.cpp, and connect it to Aider to turn it into a powerful local coding agent.

Abid Ali Awan

tutorials

How to Run GLM-4.7 Locally with llama.cpp: A High-Performance Guide

Setting up llama.cpp to run the GLM-4.7 model on a single NVIDIA H100 80GB GPU, achieving up to 20 tokens per second using GPU offloading, Flash Attention, optimized context size, efficient batching, and tuned CPU threading.

Abid Ali Awan

もっと見るもっと見る

What is GLM-5.2?

The Architecture Under the Hood

What's New With GLM-5.2?

1 million token context window

Two reasoning effort levels

Anthropic-compatible endpoint integration

GLM-5.2 vs GLM-5.1: Specification Comparison

How to Switch to GLM-5.2

Switching models in Claude Code

Switching models in OpenClaw

Switching models in Cline and other OpenAI-compatible tools

GLM-5.2 Benchmarks

What the benchmarks mean in practice

GLM-5.2 Pricing and Availability

Final Thoughts

GLM-5.2 FAQs

How does GLM-5.2 handle Anthropic-style tool use and function calling?

Is fine-tuning available for GLM-5.2 under the Max or Team tiers?

Gemini 2.5 Pro: Features, Tests, Access, Benchmarks, and More

GLM-5 vs GPT-5.3-Codex: Which AI Model Wins for Agent Workflows?

GPT 5.2: Benchmarks, Model Breakdown, and Real-World Performance

SubQ AI Explained: How Good Is the 12M Context Window LLM?

Run GLM-5 Locally For Agentic Coding

How to Run GLM-4.7 Locally with llama.cpp: A High-Performance Guide

.css-1531qan{-webkit-text-decoration:none;text-decoration:none;color:inherit;}GitHub Copilot Fundamentals (GH-300)

開発者のための AI 支援コーディング

Claude Code 101

Gemini 2.5 Pro: Features, Tests, Access, Benchmarks, and More

GLM-5 vs GPT-5.3-Codex: Which AI Model Wins for Agent Workflows?

GPT 5.2: Benchmarks, Model Breakdown, and Real-World Performance

SubQ AI Explained: How Good Is the 12M Context Window LLM?

Run GLM-5 Locally For Agentic Coding

How to Run GLM-4.7 Locally with llama.cpp: A High-Performance Guide

GitHub Copilot Fundamentals (GH-300)