GLM-5.1 (Reasoning) Intelligence, Performance & Price Analysis
Model summary
Summary cards: Intelligence, Speed, Input Price (USD per 1M tokens; cache hit: $0.26, 81% below the input price), Output Price, Verbosity
GLM-5.1 (Reasoning) is among the leading models in intelligence, but it is particularly expensive compared with other open weight models of similar size. The model accepts text input, produces text output, and has a 200k-token context window.
GLM-5.1 (Reasoning) scores 51 on the Artificial Analysis Intelligence Index, placing it well above average among comparable models (averaging 30).
Pricing for GLM-5.1 (Reasoning) is $1.40 per 1M input tokens (expensive, average: $0.45) and $4.40 per 1M output tokens (expensive, average: $1.69).
At 59 tokens per second, GLM-5.1 (Reasoning) is slightly faster than average (58).
| Attribute | Value |
|---|---|
| Reasoning | Yes (this page shows the reasoning version of this model; a non-reasoning variant may also exist) |
| Input modality | Text |
| Output modality | Text |
| Context window | 200k tokens (~300 A4 pages in 12-point Arial) |
| Total parameters | 744B |
| Active parameters | 40B (parameters active per token during inference) |
| License | MIT |
| Model weights | Hugging Face |
Metrics are compared against models of the same class:
- Non-reasoning models → compared only with other non-reasoning models
- Reasoning models → compared across both reasoning and non-reasoning models
- Open weights models → compared only with other open weights models of the same size class:
  - Tiny: ≤4B parameters
  - Small: 4B–40B parameters
  - Medium: 40B–150B parameters
  - Large: >150B parameters
- Proprietary models → compared across proprietary and open weights models of the same price range, using a blended 3:1 input/output price ratio:
  - <$0.15 per 1M tokens
  - $0.15–$1 per 1M tokens
  - >$1 per 1M tokens
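The size classes and price ranges above can be sketched as two small lookup functions. This is only an illustration: the function names are hypothetical, the page does not say whether total or active parameters determine the size class (total is assumed here), and values that fall exactly on a boundary (4B, 40B, 150B, $1) are assigned to the lower band as an assumption.

```python
def size_class(total_params_b: float) -> str:
    """Map a parameter count in billions to a size class.

    Assumption: total (not active) parameters are used, and a value
    exactly on a boundary falls into the smaller class.
    """
    if total_params_b <= 4:
        return "Tiny"
    if total_params_b <= 40:
        return "Small"
    if total_params_b <= 150:
        return "Medium"
    return "Large"


def price_band(blended_usd_per_1m: float) -> str:
    """Map a blended (3:1 input/output) price to a price range.

    Assumption: a price of exactly $1 falls into the middle band.
    """
    if blended_usd_per_1m < 0.15:
        return "<$0.15"
    if blended_usd_per_1m <= 1.0:
        return "$0.15-$1"
    return ">$1"


# GLM-5.1 (Reasoning) has 744B total parameters.
print(size_class(744))
# A hypothetical proprietary model with a $0.50 blended price.
print(price_band(0.50))
```

Under these assumptions, a 744B-parameter open weights model lands in the Large class.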
Highlights
Intelligence
Artificial Analysis Intelligence Index
Artificial Analysis Intelligence Index by Open Weights / Proprietary
Intelligence Evaluations
Agentic real-world work tasks, (ELO-500)/2000
Agentic coding & terminal use
Agentic tool use
Long context reasoning
Knowledge
1 - hallucination rate
Reasoning & knowledge
Scientific reasoning
Coding
Instruction following
Physics reasoning
Long-horizon agentic tasks
Visual reasoning
Openness
Artificial Analysis Openness Index: Results
Intelligence Index Comparisons
Intelligence vs. Price
Intelligence Index Token Use & Cost
Output Tokens Used to Run Artificial Analysis Intelligence Index
Cost to Run Artificial Analysis Intelligence Index
Context Window
Pricing
Pricing now includes a “Cache Hit Price” alongside Input and Output pricing, with new blend ratios.
Pricing: Cache Hit, Input, and Output
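To illustrate how the three prices might combine for one request, here is a minimal sketch. It assumes cached input tokens are billed at the cache hit price and the rest at the input price; actual billing rules vary by provider and are not described on this page. The function name and the example token counts are hypothetical; the prices are the ones listed for GLM-5.1 (Reasoning).

```python
# Prices in USD per 1M tokens, as listed on this page.
INPUT_PRICE = 1.40
CACHE_HIT_PRICE = 0.26
OUTPUT_PRICE = 4.40


def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request.

    Assumption: `cached_tokens` is the part of `input_tokens` served
    from cache and billed at the cache hit price.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * INPUT_PRICE
        + cached_tokens * CACHE_HIT_PRICE
        + output_tokens * OUTPUT_PRICE
    )
    return total / 1_000_000


# Cache discount relative to the input price, in percent.
discount_pct = round((1 - CACHE_HIT_PRICE / INPUT_PRICE) * 100)
print(discount_pct)  # 81

# Hypothetical request: 100k input tokens (80k cached), 2k output tokens.
print(round(request_cost(100_000, 80_000, 2_000), 4))
```

The computed discount matches the 81% figure shown next to the cache price.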
Speed
Measured by Output Speed (tokens per second)
Output Speed
Output Speed vs. Price
Latency
Measured by Time (seconds) to First Token
Latency: Time To First Answer Token
End-to-End Response Time
Seconds to output 500 tokens, calculated based on time to first token, 'thinking' time for reasoning models, and output speed
End-to-End Response Time
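The description above names three components: time to first token, thinking time, and output speed. One plausible way to combine them is a simple sum, sketched below. The additive formula and the function name are assumptions, not the site's published method, and the page gives no thinking time for this model, so the example leaves it at zero.

```python
def end_to_end_seconds(
    ttft_s: float,
    output_tokens_per_s: float,
    thinking_s: float = 0.0,
    output_tokens: int = 500,
) -> float:
    """Estimate the seconds needed to receive `output_tokens` tokens.

    Assumption: the components add up as
    time to first token + thinking time + tokens / output speed.
    """
    if output_tokens_per_s <= 0:
        raise ValueError("output_tokens_per_s must be positive")
    return ttft_s + thinking_s + output_tokens / output_tokens_per_s


# Median TTFT (1.42 s) and output speed (59.4 t/s) reported on this page;
# thinking time is not reported here and is left at 0.
estimate = end_to_end_seconds(ttft_s=1.42, output_tokens_per_s=59.4)
print(round(estimate, 2))
```

Because thinking time is omitted, this estimate is a lower bound for a reasoning model.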
Model Size (Open Weights Models Only)
Model Size: Total and Active Parameters
Frequently Asked Questions
Common questions about GLM-5.1 (Reasoning)
GLM-5.1 (Reasoning) was released on April 7, 2026.
GLM-5.1 (Reasoning) was created by Z AI.
GLM-5.1 (Reasoning) scores 51 on the Artificial Analysis Intelligence Index, placing it well above average among other open weight models of similar size (median: 30).
GLM-5.1 (Reasoning) generates output at 59.4 tokens per second (based on the median across providers serving the model), which is above average compared to other open weight models of similar size (median: 57.7 t/s).
GLM-5.1 (Reasoning) has a time to first token (TTFT) of 1.42s (based on the median across providers serving the model), which is very competitive compared to other open weight models of similar size (median: 2.42s).
GLM-5.1 (Reasoning) costs $1.40 per 1M input tokens (at the higher end, median: $0.59) and $4.40 per 1M output tokens (at the higher end, median: $2.20), based on the median across providers serving the model. At a blended rate (3:1 input-to-output ratio), this is $2.15 per 1M tokens. Pricing may vary by provider.
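The blended rate can be checked as a weighted average of the input and output prices with weights 3 and 1. The function name below is hypothetical; the weights and prices come from this page.

```python
def blended_price(
    input_price: float,
    output_price: float,
    input_weight: float = 3,
    output_weight: float = 1,
) -> float:
    """Weighted average of input and output prices (USD per 1M tokens)."""
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# (3 * 1.40 + 1 * 4.40) / 4
print(round(blended_price(1.40, 4.40), 2))  # 2.15
```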
GLM-5.1 (Reasoning) is a reasoning model. It uses extended thinking or chain-of-thought reasoning to work through complex problems before providing an answer.
GLM-5.1 (Reasoning) supports text input and text output. It does not support image input and is not multimodal.
GLM-5.1 (Reasoning) has a context window of 200k tokens. This determines how much text and conversation history the model can process in a single request.
GLM-5.1 (Reasoning) is an open weights model. The model weights are publicly available and can be downloaded for self-hosting.
GLM-5.1 (Reasoning) is a Mixture of Experts (MoE) model with 744 billion total parameters, of which only 40 billion are active per token during inference.
GLM-5.1 (Reasoning) is released under the MIT license, which allows commercial use.
GLM-5.1 (Reasoning) achieves a score of 51 on the Artificial Analysis Intelligence Index. This composite benchmark evaluates models across reasoning, knowledge, mathematics, and coding.
GLM-5.1 (Reasoning) is an open weights model that can be downloaded and self-hosted.