Comparisons of Large Open Source AI Models (>150B)
Open source AI models with over 150B parameters. Models are considered open source (also commonly referred to as open weights) when their weights are available to download. This allows self-hosting on your own infrastructure and customizing the model, for example through fine-tuning. For more details, including our methodology, see our FAQs.
Highlights
Openness
Artificial Analysis Openness Index: Results
Openness Index assesses model openness on a 0 to 100 normalized scale (higher is more open)
Reasoning models are indicated by a lightbulb icon
Intelligence
Artificial Analysis Intelligence Index
Artificial Analysis Intelligence Index v4.0 incorporates 10 evaluations: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt
Intelligence Evaluations
Intelligence evaluations measured independently by Artificial Analysis · Higher is better
GDPval-AA
Agentic real-world work tasks, scored as (Elo - 500) / 2000
Terminal-Bench Hard
Agentic coding & terminal use
𝜏²-Bench Telecom
Agentic tool use
AA-LCR
Long context reasoning
AA-Omniscience Accuracy
Knowledge
AA-Omniscience Non-Hallucination Rate
1 - hallucination rate
Humanity's Last Exam
Reasoning & knowledge
GPQA Diamond
Scientific reasoning
SciCode
Coding
IFBench
Instruction following
CritPt
Physics reasoning
APEX-Agents-AA
Long-horizon agentic tasks
ITBench-AA
Kubernetes incident root-cause analysis
MMMU-Pro
Visual reasoning
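Two of the descriptions above are formulas rather than labels: GDPval-AA is reported as (Elo - 500) / 2000, and the AA-Omniscience Non-Hallucination Rate is 1 - hallucination rate. A minimal sketch of both transformations follows. The function names and the sample inputs are hypothetical, and whether the results are then displayed as fractions or percentages is not stated here.

```python
def normalize_elo(elo: float) -> float:
    """Map an Elo rating to the reported score using (Elo - 500) / 2000."""
    return (elo - 500) / 2000


def non_hallucination_rate(hallucination_rate: float) -> float:
    """Return 1 - hallucination rate, for a rate given as a fraction in [0, 1]."""
    if not 0.0 <= hallucination_rate <= 1.0:
        raise ValueError("hallucination_rate must be between 0 and 1")
    return 1 - hallucination_rate


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to illustrate the arithmetic.
    print(normalize_elo(1500))
    print(non_hallucination_rate(0.25))
```

Under this formula an Elo of 500 maps to 0 and an Elo of 2500 maps to 1, which places the Elo-based result on a scale comparable to the other evaluations.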
Size
Model Size: Total and Active Parameters
Comparison between total model parameters and parameters active during inference
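The relationship between total and active parameters can be summarized as a single ratio. The sketch below computes the share of parameters active at inference time for three models, using the total and active counts listed under "Further details" on this page. The helper names are hypothetical, and the ratio is an illustration rather than a metric this page reports.

```python
# (model, total parameters in billions, active parameters in billions),
# copied from the "Further details" table on this page.
MODELS = [
    ("GLM-5.1", 744, 40),
    ("MiniMax-M2.7", 230, 10),
    ("Jamba 1.7 Large", 398, 94),
]


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of a model's parameters that are active at inference time."""
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return active_b / total_b


def rank_by_active_fraction(models):
    """Sort models from the smallest to the largest active fraction."""
    return sorted(models, key=lambda m: active_fraction(m[1], m[2]))


if __name__ == "__main__":
    for name, total_b, active_b in rank_by_active_fraction(MODELS):
        print(f"{name}: {active_fraction(total_b, active_b):.1%} active")
```

Of these three, MiniMax-M2.7 has the smallest active share (roughly 4%) and Jamba 1.7 Large the largest (roughly 24%).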
Intelligence vs. Active Parameters
Active parameters at inference time · Artificial Analysis Intelligence Index
Chart legend: most attractive quadrant marked; creators shown: Alibaba, DeepSeek, Kimi, MiniMax, NVIDIA, StepFun, Tencent, Xiaomi, Z AI
Intelligence vs. Total Parameters
Artificial Analysis Intelligence Index · Size in parameters (billions)
Chart legend: most attractive quadrant marked; creators shown: Alibaba, DeepSeek, Kimi, MiniMax, NVIDIA, StepFun, Tencent, Xiaomi, Z AI
Context Window
Context Window
Context window: token limit · Higher is better
Further details
| Model | Intelligence Index | Weights: total parameters | Weights: active parameters at inference time | Context window | Price | Provider Benchmarks | | |
|---|---|---|---|---|---|---|---|---|
| Kimi K2.6 | 54 | 1.0T | 32B | 256k | $0.7 | 46 | +12 | |
| MiMo-V2.5-Pro | 54 | 1.0T | 42B | 1.00M | $0.2 | 43 | | |
| DeepSeek V4 Pro (Reasoning, Max Effort) | 52 | 1.6T | 49B | 1.00M | $0.2 | 62 | +8 | |
| GLM-5.1 (Reasoning) | 51 | 744B | 40B | 200k | $0.9 | 78 | +9 | |
| DeepSeek V4 Pro (Reasoning, High Effort) | 50 | 1.6T | 49B | 1.00M | $0.2 | 61 | +8 | |
| MiniMax-M2.7 | 50 | 230B | 10B | 205k | $0.2 | 111 | +3 | |
| MiMo-V2.5 | 49 | 310B | 15B | 1.00M | $0.1 | 76 | +2 | |
| Nemotron 3 Ultra 550B A55B (Reasoning) | 48 | 550B | 55B | 262k | $0.5 | 159 | Not available | +4 |
| DeepSeek V4 Flash (Reasoning, Max Effort) | 47 | 284B | 13B | 1.00M | $0.1 | 116 | +4 | |
| DeepSeek V4 Flash (Reasoning, High Effort) | 46 | 284B | 13B | 1.00M | $0.1 | - | +4 | |
| Qwen3.5 397B A17B (Reasoning) | 45 | 397B | 17B | 262k | $0.9 | 52 | +9 | |
| GLM-5.1 (Non-reasoning) | 44 | 744B | 40B | 200k | $0.9 | 82 | +5 | |
| Kimi K2.6 (Non-reasoning) | 43 | 1.0T | 32B | 256k | $0.7 | 44 | +9 | |
| Step 3.7 Flash | 43 | 198B | 11B | 256k | $0.2 | 116 | | |
| Hy3-preview (Reasoning) | 42 | 295B | 21B | 256k | $0.1 | 99 | | |
| MiMo-V2-Flash (Feb 2026) | 41 | 309B | 15B | 256k | $0.1 | 143 | | |
| Qwen3.5 397B A17B (Non-reasoning) | 40 | 397B | 17B | 262k | $0.9 | 53 | +6 | |
| DeepSeek V4 Pro (Non-reasoning) | 39 | 1.6T | 49B | 1.00M | $0.2 | 61 | +2 | |
| Ring-2.6-1T | 38 | 1.0T | 63B | 262k | $0.5 | 128 | | |
| Command A+ | 37 | 218B | 25B | 192k | - | 202 | | |
| DeepSeek V4 Flash (Non-reasoning) | 36 | 284B | 13B | 1.00M | $0.1 | 113 | | |
| MiMo-V2.5-Pro (Non-reasoning) | 36 | 1.0T | 41.7B | 1.00M | $0.6 | 55 | | |
| Hy3-preview (Non-reasoning) | 34 | 295B | 21B | 256k | $0.1 | 95 | | |
| Ling-2.6-1T | 34 | 1.0T | 63B | 262k | $0.5 | - | | |
| K-EXAONE (Reasoning) | 32 | 236B | 23B | 256k | - | - | - | |
| Trinity Large Thinking | 32 | 399B | 13B | 512k | $0.2 | 198 | | |
| MiMo-V2-Flash (Non-reasoning) | 30 | 309B | 15B | 256k | $0.1 | 142 | | |
| K-EXAONE (Non-reasoning) | 23 | 236B | 23B | 256k | - | - | - | |
| Mistral Large 3 | 23 | 675B | 41B | 256k | $0.6 | 52 | | |
| Hermes 4 - Llama-3.1 405B (Reasoning) | 19 | 406B | | 128k | $1.2 | 40 | | |
| Llama 4 Maverick | 18 | 402B | 17B | 1.00M | $0.3 | 111 | +6 | |
| Hermes 4 - Llama-3.1 405B (Non-reasoning) | 18 | 406B | | 128k | $1.2 | 40 | | |
| Llama 3.1 Instruct 405B | 17 | 405B | | 128k | $3.1 | 61 | | |
| Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | 15 | 253B | | 128k | $0.7 | 52 | | |
| ERNIE 4.5 300B A47B | 15 | 300B | 47B | 131k | $0.4 | - | | |
| R1 1776 | 12 | 671B | 37B | 128k | - | - | - | |
| Jamba 1.7 Large | 11 | 398B | 94B | 256k | $2.6 | 60 | | |
| Cogito v2.1 (Reasoning) | - | 671B | 37B | 128k | $1.3 | 69 | | |
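For readers weighing self-hosting, the abbreviated figures in the table can be turned into numbers and used for a rough estimate of how much storage the weights alone require. The sketch below is illustrative only: the helper names are hypothetical, the "k" and "M" suffixes are assumed to be decimal (1,000 and 1,000,000), and the estimate covers weights only, not the KV cache, activations, or runtime overhead. Bytes per parameter depends on the precision the weights are stored in (2 for 16-bit formats, 1 for 8-bit formats).

```python
# Multipliers for the abbreviated suffixes used in the table.
# Assumption: "k" and "M" are decimal, not powers of two.
SUFFIXES = {"k": 10**3, "M": 10**6, "B": 10**9}


def parse_count(text: str) -> int:
    """Convert a string such as '744B', '256k' or '1.00M' to an integer."""
    text = text.strip()
    suffix = text[-1:]
    if suffix not in SUFFIXES:
        raise ValueError(f"unrecognized suffix in {text!r}")
    return round(float(text[:-1]) * SUFFIXES[suffix])


def weight_memory_gb(total_params: int, bytes_per_param: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return total_params * bytes_per_param / 1e9


if __name__ == "__main__":
    total = parse_count("744B")    # total parameters listed for GLM-5.1
    context = parse_count("200k")  # context window listed for GLM-5.1
    print(f"context window: {context:,} tokens")
    print(f"weights at 2 bytes/param: {weight_memory_gb(total, 2):,.0f} GB")
    print(f"weights at 1 byte/param: {weight_memory_gb(total, 1):,.0f} GB")
```

The total parameter count, not the active count, determines this figure, since all weights generally need to be loaded even when only a subset is active for a given token.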