Comparisons of Large Open Source AI Models (>150B)

This page compares open source AI models with more than 150B parameters. A model is considered open source (also commonly called open weights) when its weights are available to download, which allows self-hosting on your own infrastructure and customization such as fine-tuning. For more details, including our methodology, see our FAQs.
Kimi K2.6 and MiMo-V2.5-Pro are the highest-intelligence large open source models (defined as those with >150B parameters), followed by DeepSeek V4 Pro (Max) and GLM-5.1.

Highlights

Artificial Analysis Openness Index · Higher is better
Artificial Analysis Intelligence Index · Higher is better
Trainable parameters in billions

Openness

Artificial Analysis Openness Index: Results

Openness Index assesses model openness on a 0 to 100 normalized scale (higher is more open)
Reasoning models are indicated by a lightbulb icon

Intelligence

Artificial Analysis Intelligence Index

Artificial Analysis Intelligence Index v4.0 incorporates 10 evaluations: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt

Artificial Analysis Intelligence Index v4.0 includes: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt. See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.

Intelligence Evaluations

Intelligence evaluations measured independently by Artificial Analysis · Higher is better
GDPval-AA: Agentic real-world work tasks, (Elo - 500) / 2000
Terminal-Bench Hard: Agentic coding & terminal use
𝜏²-Bench Telecom: Agentic tool use
AA-LCR: Long context reasoning
Humanity's Last Exam: Reasoning & knowledge
GPQA Diamond: Scientific reasoning
SciCode: Coding
IFBench: Instruction following
CritPt: Physics reasoning
APEX-Agents-AA: Long-horizon agentic tasks
ITBench-AA: Kubernetes incident root-cause analysis
MMMU-Pro: Visual reasoning

While model intelligence generally translates across use cases, specific evaluations may be more relevant for certain use cases.
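
The GDPval-AA entry above reports its score as (Elo - 500) / 2000. A minimal sketch of that arithmetic follows; the function name `gdpval_aa_score` and the Elo ratings are hypothetical, chosen only to show the mapping, and the page does not say whether results are clamped to a fixed range.

```python
def gdpval_aa_score(elo: float) -> float:
    """Map an Elo rating to the normalized score (Elo - 500) / 2000."""
    return (elo - 500.0) / 2000.0


# Hypothetical Elo ratings, for illustration only.
for elo in (500, 1500, 2500):
    print(elo, gdpval_aa_score(elo))
```

Under this formula an Elo of 500 maps to 0 and an Elo of 2500 maps to 1, so each 20 Elo points move the score by 0.01.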

Size

Model Size: Total and Active Parameters

Comparison between total model parameters and parameters active during inference

Total parameters: the total number of trainable weights and biases in the model, expressed in billions. These parameters are learned during training and determine the model's ability to process and generate responses.

Active parameters: the number of parameters actually executed during each inference forward pass, expressed in billions. In Mixture of Experts (MoE) models, a routing mechanism selects a subset of experts for each token, so fewer parameters are active than the model has in total. Dense models use all of their parameters, so active equals total.
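
As a rough illustration of the difference between active and total parameters, the sketch below computes the share of parameters executed per forward pass. The function name `active_fraction` is hypothetical. The figures come from the table on this page (GLM-5.1: 744B total, 40B active; MiniMax-M2.7: 230B total, 10B active); Llama 3.1 Instruct 405B has no active figure listed and is assumed here to be dense.

```python
from typing import Optional


def active_fraction(total_b: float, active_b: Optional[float] = None) -> float:
    """Return the share of parameters used per forward pass.

    Pass active_b=None for a dense model, where active equals total.
    Both arguments are in billions of parameters.
    """
    if active_b is None:
        active_b = total_b
    if not 0 < active_b <= total_b:
        raise ValueError("active parameters must be in (0, total]")
    return active_b / total_b


print(f"GLM-5.1: {active_fraction(744, 40):.1%}")
print(f"MiniMax-M2.7: {active_fraction(230, 10):.1%}")
print(f"Llama 3.1 Instruct 405B (assumed dense): {active_fraction(405):.1%}")
```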

Intelligence vs. Active Parameters

Active parameters at inference time · Artificial Analysis Intelligence Index
[Chart: Artificial Analysis Intelligence Index plotted against active parameters at inference time, with the most attractive quadrant marked. Creators shown: Alibaba, DeepSeek, Kimi, MiniMax, NVIDIA, StepFun, Tencent, Xiaomi, Z AI.]
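
The page does not define how the "most attractive quadrant" is drawn. A minimal sketch follows, assuming it means an Intelligence Index at or above the median combined with active parameters at or below the median. The median split and the function name `most_attractive` are assumptions, not the site's method; the sample values are taken from the table on this page.

```python
from statistics import median

# (model, active parameters in billions, Intelligence Index), from this page's table.
MODELS = [
    ("Kimi K2.6", 32, 54),
    ("MiMo-V2.5-Pro", 42, 54),
    ("MiniMax-M2.7", 10, 50),
    ("MiMo-V2.5", 15, 49),
    ("Mistral Large 3", 41, 23),
    ("Jamba 1.7 Large", 94, 11),
]


def most_attractive(models):
    """Return names with index >= median index and active <= median active."""
    active_cut = median(active for _, active, _ in models)
    index_cut = median(index for _, _, index in models)
    return [
        name
        for name, active, index in models
        if index >= index_cut and active <= active_cut
    ]


print(most_attractive(MODELS))
```

With a different sample of models the medians move, so quadrant membership depends on which models are included in the comparison.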

Intelligence vs. Total Parameters

Artificial Analysis Intelligence Index · Size in parameters (billions)
[Chart: Artificial Analysis Intelligence Index plotted against total parameters (billions), with the most attractive quadrant marked. Creators shown: Alibaba, DeepSeek, Kimi, MiniMax, NVIDIA, StepFun, Tencent, Xiaomi, Z AI.]

Context Window

Context Window

Context window: token limit · Higher is better

Larger context windows matter for RAG (Retrieval Augmented Generation) LLM workflows, which typically involve reasoning over and retrieving information from large amounts of data.

Context window: the maximum number of combined input and output tokens. Output tokens commonly have a significantly lower limit, which varies by model.
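
Because the context window covers input and output together, a request has to budget for both. The sketch below shows one way to check this, with a hypothetical helper `fits_context`. The 200,000-token window assumes the "200k" shown for GLM-5.1 means exactly 200,000, and the separate output cap is an illustrative value, since the page gives no per-model output limits.

```python
from typing import Optional


def fits_context(
    input_tokens: int,
    output_tokens: int,
    context_window: int,
    max_output: Optional[int] = None,
) -> bool:
    """Check a request against a combined input + output token limit.

    max_output is an optional, separate cap on output tokens.
    """
    if max_output is not None and output_tokens > max_output:
        return False
    return input_tokens + output_tokens <= context_window


WINDOW = 200_000  # assumed exact value for a "200k" context window

print(fits_context(190_000, 8_000, WINDOW))   # fits within the combined limit
print(fits_context(190_000, 16_000, WINDOW))  # input + output exceeds the window
print(fits_context(10_000, 16_000, WINDOW, max_output=8_000))  # output cap exceeded
```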

Further details

| Model | Creator | Intelligence Index | Total parameters | Active parameters | Context window | Price (USD) | Unlabeled metric | Weights | Provider Benchmarks |
|---|---|---|---|---|---|---|---|---|---|
| Kimi K2.6 | Kimi | 54 | 1.0T | 32B | 256k | $0.7 | 46 | | Makora, GMI, Cloudflare (+12 more) |
| MiMo-V2.5-Pro | Xiaomi | 54 | 1.0T | 42B | 1.00M | $0.2 | 43 | | DeepInfra, GMI, Novita, Xiaomi |
| DeepSeek V4 Pro (Reasoning, Max Effort) | DeepSeek | 52 | 1.6T | 49B | 1.00M | $0.2 | 62 | | Novita, SiliconFlow, DeepSeek (+8 more) |
| GLM-5.1 (Reasoning) | Z AI | 51 | 744B | 40B | 200k | $0.9 | 78 | | Parasail, SiliconFlow, Fireworks (+9 more) |
| DeepSeek V4 Pro (Reasoning, High Effort) | DeepSeek | 50 | 1.6T | 49B | 1.00M | $0.2 | 61 | | Makora, Microsoft Azure, Nebius (+8 more) |
| MiniMax-M2.7 | MiniMax | 50 | 230B | 10B | 205k | $0.2 | 111 | | Novita, Fireworks, Together.ai (+3 more) |
| MiMo-V2.5 | Xiaomi | 49 | 310B | 15B | 1.00M | $0.1 | 76 | | Novita, Parasail, DeepInfra (+2 more) |
| Nemotron 3 Ultra 550B A55B (Reasoning) | NVIDIA | 48 | 550B | 55B | 262k | $0.5 | 159 | Not available | GMI, Together.ai, Nebius (+4 more) |
| DeepSeek V4 Flash (Reasoning, Max Effort) | DeepSeek | 47 | 284B | 13B | 1.00M | $0.1 | 116 | | Novita, Makora, GMI (+4 more) |
| DeepSeek V4 Flash (Reasoning, High Effort) | DeepSeek | 46 | 284B | 13B | 1.00M | $0.1 | - | | Parasail, Makora, Novita (+4 more) |
| Qwen3.5 397B A17B (Reasoning) | Alibaba | 45 | 397B | 17B | 262k | $0.9 | 52 | | Wafer, Alibaba Cloud, Nebius (+9 more) |
| GLM-5.1 (Non-reasoning) | Z AI | 44 | 744B | 40B | 200k | $0.9 | 82 | | Parasail, DeepInfra, Nebius (+5 more) |
| Kimi K2.6 (Non-reasoning) | Kimi | 43 | 1.0T | 32B | 256k | $0.7 | 44 | | Makora, Kimi, DeepInfra (+9 more) |
| Step 3.7 Flash | StepFun | 43 | 198B | 11B | 256k | $0.2 | 116 | | StepFun |
| Hy3-preview (Reasoning) | Tencent | 42 | 295B | 21B | 256k | $0.1 | 99 | | GMI, SiliconFlow |
| MiMo-V2-Flash (Feb 2026) | Xiaomi | 41 | 309B | 15B | 256k | $0.1 | 143 | | Xiaomi |
| Qwen3.5 397B A17B (Non-reasoning) | Alibaba | 40 | 397B | 17B | 262k | $0.9 | 53 | | Wafer, Eigen AI, DigitalOcean (+6 more) |
| DeepSeek V4 Pro (Non-reasoning) | DeepSeek | 39 | 1.6T | 49B | 1.00M | $0.2 | 61 | | DeepSeek, Makora, Nebius (+2 more) |
| Ring-2.6-1T | InclusionAI | 38 | 1.0T | 63B | 262k | $0.5 | 128 | | InclusionAI |
| Command A+ | Cohere | 37 | 218B | 25B | 192k | - | 202 | | Cohere |
| DeepSeek V4 Flash (Non-reasoning) | DeepSeek | 36 | 284B | 13B | 1.00M | $0.1 | 113 | | GMI, CoreWeave, Makora, DeepSeek |
| MiMo-V2.5-Pro (Non-reasoning) | Xiaomi | 36 | 1.0T | 41.7B | 1.00M | $0.6 | 55 | | DeepInfra, GMI, Novita, Xiaomi |
| Hy3-preview (Non-reasoning) | Tencent | 34 | 295B | 21B | 256k | $0.1 | 95 | | SiliconFlow, GMI |
| Ling-2.6-1T | InclusionAI | 34 | 1.0T | 63B | 262k | $0.5 | - | | InclusionAI |
| K-EXAONE (Reasoning) | LG AI Research | 32 | 236B | 23B | 256k | - | - | | - |
| Trinity Large Thinking | Arcee AI | 32 | 399B | 13B | 512k | $0.2 | 198 | | Parasail, Arcee AI |
| MiMo-V2-Flash (Non-reasoning) | Xiaomi | 30 | 309B | 15B | 256k | $0.1 | 142 | | Xiaomi |
| K-EXAONE (Non-reasoning) | LG AI Research | 23 | 236B | 23B | 256k | - | - | | - |
| Mistral Large 3 | Mistral | 23 | 675B | 41B | 256k | $0.6 | 52 | | Microsoft Azure, Mistral, Amazon Bedrock |
| Hermes 4 - Llama-3.1 405B (Reasoning) | Nous Research | 19 | 406B | | 128k | $1.2 | 40 | | Nebius |
| Llama 4 Maverick | Meta | 18 | 402B | 17B | 1.00M | $0.3 | 111 | | Microsoft Azure, Parasail, Together.ai (+6 more) |
| Hermes 4 - Llama-3.1 405B (Non-reasoning) | Nous Research | 18 | 406B | | 128k | $1.2 | 40 | | Nebius |
| Llama 3.1 Instruct 405B | Meta | 17 | 405B | | 128k | $3.1 | 61 | | Microsoft Azure, Amazon Bedrock, Databricks, Amazon Bedrock |
| Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | NVIDIA | 15 | 253B | | 128k | $0.7 | 52 | | Nebius |
| ERNIE 4.5 300B A47B | Baidu | 15 | 300B | 47B | 131k | $0.4 | - | | Novita, SiliconFlow |
| R1 1776 | Perplexity | 12 | 671B | 37B | 128k | - | - | | - |
| Jamba 1.7 Large | AI21 Labs | 11 | 398B | 94B | 256k | $2.6 | 60 | | AI21 Labs |
| Cogito v2.1 (Reasoning) | Deep Cogito | - | 671B | 37B | 128k | $1.3 | 69 | | Together.ai |