GPT-5.4 (xhigh) vs. DeepSeek V3.2 (Reasoning)
Comparison between GPT-5.4 (xhigh) and DeepSeek V3.2 (Reasoning) across intelligence, price, speed, context window and more.
For details relating to our methodology, see our Methodology page.
Highlights
Model Comparison
| Metric | GPT-5.4 (xhigh) | DeepSeek V3.2 (Reasoning) | Analysis |
|---|---|---|---|
| Creator | OpenAI | DeepSeek | |
| Context Window | 1,050k tokens (~1,575 A4 pages in 12 pt Arial) | 128k tokens (~192 A4 pages in 12 pt Arial) | GPT-5.4 (xhigh) has a larger context window than DeepSeek V3.2 (Reasoning) |
| Release Date | March 2026 | December 2025 | GPT-5.4 (xhigh) was released more recently than DeepSeek V3.2 (Reasoning) |
| Image Input Support | Yes | No | GPT-5.4 (xhigh) has image input support while DeepSeek V3.2 (Reasoning) does not |
| Open Source (Weights) | No | Yes | DeepSeek V3.2 (Reasoning) is open source (open weights) while GPT-5.4 (xhigh) is proprietary |
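
The page estimates in the Context Window row both work out to about 1.5 A4 pages per 1,000 tokens. The sketch below reproduces that arithmetic; the constant is inferred from the two table entries rather than taken from a published conversion rule, and the function name is illustrative.

```python
# Ratio implied by the table: 1,050k tokens ~ 1,575 pages and 128k tokens ~ 192 pages.
# This is an inferred constant, not an official conversion factor.
PAGES_PER_1K_TOKENS = 1.5


def tokens_to_a4_pages(tokens: int) -> float:
    """Approximate A4 page count (12 pt Arial) for a given number of tokens."""
    return tokens / 1000 * PAGES_PER_1K_TOKENS


if __name__ == "__main__":
    for name, window in [
        ("GPT-5.4 (xhigh)", 1_050_000),
        ("DeepSeek V3.2 (Reasoning)", 128_000),
    ]:
        print(f"{name}: ~{tokens_to_a4_pages(window):.0f} A4 pages")
```

Actual page counts depend on the tokenizer and the text, so the result is best read as a rough order of magnitude.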
Intelligence
Artificial Analysis Intelligence Index
Artificial Analysis Intelligence Index by Open Weights / Proprietary
Intelligence Evaluations
Charts in this section, one per evaluation: agentic real-world work tasks (reported as (Elo-500)/2000); agentic coding & terminal use; agentic tool use; long context reasoning; knowledge (reported as 1 - hallucination rate); reasoning & knowledge; scientific reasoning; coding; instruction following; physics reasoning; long-horizon agentic tasks; Kubernetes incident root-cause analysis; visual reasoning.
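
Two of the chart labels above describe transformed scores: "(Elo-500)/2000" for the agentic real-world work tasks and "1 - hallucination rate" for the knowledge evaluation. A minimal sketch of both transforms, read literally from the labels, follows. The function names are hypothetical, and the assumption that the hallucination rate is a fraction between 0 and 1 is ours, not stated on the page.

```python
def normalize_elo(elo: float) -> float:
    """Rescale an Elo rating as the chart label describes: (Elo - 500) / 2000."""
    return (elo - 500) / 2000


def non_hallucination_score(hallucination_rate: float) -> float:
    """Return 1 - hallucination rate, assuming the rate is a fraction in [0, 1]."""
    if not 0.0 <= hallucination_rate <= 1.0:
        raise ValueError("hallucination_rate must be between 0 and 1")
    return 1 - hallucination_rate


if __name__ == "__main__":
    # Illustrative inputs only; these are not scores reported on the page.
    print(normalize_elo(1500))
    print(non_hallucination_score(0.25))
```

Under this reading, an Elo of 500 maps to 0 and an Elo of 2500 maps to 1, which puts the Elo-based result on roughly the same scale as the other evaluations.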
Openness
Artificial Analysis Openness Index: Score
Intelligence Index Comparisons
Intelligence vs. Cost per Intelligence Index Task
Token Use
Output Tokens per Intelligence Index Task
Price and Cost
Cost per Intelligence Index Task
Cost to Run Artificial Analysis Intelligence Index
Pricing: Cache Hit, Input, and Output
Context Window
Speed
Measured by Output Speed (tokens per second)
Output Speed
Time per Intelligence Index Task
Latency
Measured by Time (seconds) to First Token
Latency: Time To First Answer Token
End-to-End Response Time
Seconds to output 500 tokens, calculated based on time to first token, 'thinking' time for reasoning models, and output speed
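
The description above names three inputs: time to first token, "thinking" time for reasoning models, and output speed. One plausible way to combine them is sketched below, assuming the three components simply add up and that the time to first token does not already include thinking time. The page does not spell out the exact formula, so the function name, parameters, and example values are illustrative.

```python
def end_to_end_seconds(
    ttft_s: float,
    output_tokens_per_s: float,
    thinking_s: float = 0.0,
    output_tokens: int = 500,
) -> float:
    """Estimate seconds to receive `output_tokens` tokens of answer.

    Assumed additive model: latency + thinking time + generation time.
    """
    if output_tokens_per_s <= 0:
        raise ValueError("output_tokens_per_s must be positive")
    generation_s = output_tokens / output_tokens_per_s
    return ttft_s + thinking_s + generation_s


if __name__ == "__main__":
    # Hypothetical numbers for illustration, not measurements of either model.
    print(end_to_end_seconds(ttft_s=1.0, output_tokens_per_s=100.0, thinking_s=10.0))
```

For a non-reasoning model, `thinking_s` stays at its default of zero and the estimate reduces to latency plus generation time.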
Model Size (Open Weights Models Only)
Model Size: Total and Active Parameters
Frequently Asked Questions
Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) currently leads the Artificial Analysis Intelligence Index with a score of 60, the highest among the 43 models evaluated.
The top AI models by Intelligence Index are: 1. Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) (60), 2. Claude Opus 4.8 (Adaptive Reasoning, Max Effort) (56), 3. GPT-5.5 (xhigh) (55), 4. Claude Opus 4.7 (Adaptive Reasoning, Max Effort) (54), 5. GPT-5.5 (high) (53).
Mercury 2 is the fastest at 742.1 tokens per second, followed by LFM2 1.2B (503.9 t/s) and LFM2.5-1.2B-Instruct (486.9 t/s).
Qwen3.5 0.8B (Non-reasoning) is the most affordable at $0.01 per 1M tokens (blended), followed by Qwen3.5 0.8B (Reasoning) ($0.01) and Gemma 3n E4B Instruct ($0.02).
North Mini Code has the lowest time to first token at 0.30s, followed by Command A+ (0.39s) and Gemini 2.5 Flash-Lite (Non-reasoning) (0.39s).
MiniMax-M3 is the highest-ranked open weights model with an Intelligence Index score of 44. There are 22 open weights models out of 43 total evaluated.
The top open weights AI models by Intelligence Index are: 1. MiniMax-M3 (44), 2. DeepSeek V4 Pro (Reasoning, Max Effort) (44), 3. Kimi K2.6 (43).
Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) leads among 39 reasoning models with an Intelligence Index score of 60. Reasoning models use extended thinking to work through complex problems before providing answers.
Models are compared across multiple dimensions including intelligence (quality), pricing, output speed (tokens per second), latency (time to first token), end-to-end response time, and context window size. Performance metrics are measured directly using standardized prompts across 538 models.
Click on any model name or row in the charts to view its dedicated page with detailed metrics and direct comparisons against similar models. You can also use the model selector to customize which models appear in each chart.