
Comparisons of Large Open Source AI Models (>150B)

This page compares open source AI models with over 150B parameters. Models are considered open source (also commonly referred to as open weights) when their weights are available to download. This allows self-hosting on your own infrastructure and enables customizing the model, for example through fine-tuning. For more details, including our methodology, see our FAQs.

Kimi K2.5 and GLM-4.7 are the highest-intelligence large open source models (defined as those with >150B parameters), followed by DeepSeek V3.2 and Kimi K2 Thinking.

Highlights

Intelligence: Artificial Analysis Intelligence Index; higher is better. Some values are estimates (independent evaluation forthcoming).
Total Parameters: trainable parameters, in billions.

Artificial Analysis Openness Index: Results

Openness Index assesses model openness on a 0 to 100 normalized scale (higher is more open)

Intelligence

Artificial Analysis Intelligence Index

Artificial Analysis Intelligence Index v4.0 incorporates 10 evaluations: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt

Artificial Analysis Intelligence Index v4.0 includes: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt. See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.


Intelligence Evaluations

Intelligence evaluations measured independently by Artificial Analysis; Higher is better
Results claimed by AI Lab (not yet independently verified)
GDPval-AA (Agentic Real-World Work Tasks, (ELO-500)/2000)
Terminal-Bench Hard (Agentic Coding & Terminal Use)
𝜏²-Bench Telecom (Agentic Tool Use)
AA-LCR (Long Context Reasoning)
AA-Omniscience Accuracy (Knowledge)
AA-Omniscience Non-Hallucination Rate (1 - Hallucination Rate)
Humanity's Last Exam (Reasoning & Knowledge)
GPQA Diamond (Scientific Reasoning)
SciCode (Coding)
IFBench (Instruction Following)
CritPt (Physics Reasoning)
MMMU Pro (Visual Reasoning)
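
Two of the labels above describe simple score transformations: GDPval-AA is reported as (ELO-500)/2000, and the AA-Omniscience Non-Hallucination Rate is 1 minus the hallucination rate. The sketch below only illustrates that arithmetic. The function names and input values are hypothetical and do not come from any Artificial Analysis tooling.

```python
def normalize_elo(elo: float) -> float:
    """Rescale an ELO rating as (ELO - 500) / 2000, per the GDPval-AA label."""
    return (elo - 500.0) / 2000.0


def non_hallucination_rate(hallucination_rate: float) -> float:
    """Return 1 - hallucination rate; the input must be a fraction in [0, 1]."""
    if not 0.0 <= hallucination_rate <= 1.0:
        raise ValueError("hallucination_rate must be between 0 and 1")
    return 1.0 - hallucination_rate


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
print(normalize_elo(1500.0))         # 0.5
print(non_hallucination_rate(0.25))  # 0.75
```

With this rescaling, an ELO of 500 maps to 0 and an ELO of 2500 maps to 1, which puts the ELO-based result on a range comparable to the fraction-based evaluations.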

While model intelligence generally translates across use cases, specific evaluations may be more relevant for certain use cases.


Model Size: Total and Active Parameters

Comparison between total model parameters and parameters active during inference
Active Parameters
Passive Parameters

Total Parameters: The total number of trainable weights and biases in the model, expressed in billions. These parameters are learned during training and determine the model's ability to process and generate responses.

Active Parameters: The number of parameters actually executed during each inference forward pass, expressed in billions. For Mixture of Experts (MoE) models, a routing mechanism selects a subset of experts per token, so fewer parameters are active than the model holds in total. Dense models use all parameters, so active equals total.
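
As a rough illustration of the total versus active distinction, the sketch below computes the share of parameters used per forward pass for a few models, using the parameter counts listed on this page. The `ModelSize` class is a hypothetical helper, and treating "passive" parameters as total minus active is an assumption about how the chart legend is defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSize:
    """Parameter counts in billions (hypothetical helper for illustration)."""
    name: str
    total_b: float
    active_b: float  # equals total_b for dense models

    @property
    def passive_b(self) -> float:
        # Assumption: "passive" means parameters not executed for a given token.
        return self.total_b - self.active_b

    @property
    def active_fraction(self) -> float:
        return self.active_b / self.total_b


models = [
    ModelSize("GLM-4.7", total_b=357, active_b=32),                   # MoE
    ModelSize("MiniMax-M2.1", total_b=230, active_b=10),              # MoE
    ModelSize("Llama 3.1 Instruct 405B", total_b=405, active_b=405),  # dense
]

for m in models:
    print(f"{m.name}: {m.active_b:g}B active, {m.passive_b:g}B passive, "
          f"{m.active_fraction:.1%} of parameters used per token")
```

A low active fraction reduces the compute needed per token, but the full set of weights still has to be stored when self-hosting.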

Intelligence vs. Active Parameters

[Chart: Active Parameters at Inference Time vs. Artificial Analysis Intelligence Index, with the most attractive quadrant highlighted. Legend by model creator: Alibaba, DeepSeek, Kimi, LG AI Research, Meta, MiniMax, Mistral, Xiaomi, Z AI.]


Intelligence vs. Total Parameters

[Chart: Artificial Analysis Intelligence Index vs. Size in Parameters (Billions), with the most attractive quadrant highlighted. Legend by model creator: Alibaba, DeepSeek, Kimi, LG AI Research, Meta, MiniMax, Mistral, Xiaomi, Z AI.]


Context Window


Context Window: Tokens Limit; Higher is better

Larger context windows matter for RAG (Retrieval-Augmented Generation) workflows, which typically involve reasoning over, and retrieving information from, large amounts of data.

The context window is the maximum number of combined input and output tokens. Output tokens commonly have a significantly lower limit, which varies by model.
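
Because the window covers input and output together, the room left for a response shrinks as the prompt grows. The following is a minimal sketch of that budgeting, assuming a hypothetical `max_output_tokens` helper, a window of exactly 128,000 tokens, and an illustrative output cap of 8,192 tokens. Real limits vary by model and provider.

```python
from typing import Optional


def max_output_tokens(context_window: int,
                      input_tokens: int,
                      output_cap: Optional[int] = None) -> int:
    """Tokens still available for output after the input is counted.

    context_window: combined input + output limit.
    output_cap: optional separate, lower limit on output tokens.
    """
    remaining = context_window - input_tokens
    if remaining < 0:
        raise ValueError("input alone exceeds the context window")
    return remaining if output_cap is None else min(remaining, output_cap)


# Hypothetical numbers for illustration only.
print(max_output_tokens(128_000, 100_000, output_cap=8_192))  # 8192
print(max_output_tokens(128_000, 125_000, output_cap=8_192))  # 3000
```

In the first call the separate output cap is the binding limit; in the second, the nearly full context window is.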


| Model | Creator | Intelligence Index | Total Parameters | Context Window | Price ($) | Output Speed | Weights | Providers |
|---|---|---|---|---|---|---|---|---|
| Kimi K2.5 (Reasoning) | Kimi | 47 | 1T (32B active at inference time) | 256k | $1.2 | 112 | 🤗 | Fireworks, Together.ai, GMI, +5 more |
| GLM-4.7 (Reasoning) | Z AI | 42 | 357B (32B active at inference time) | 200k | $0.9 | 127 | 🤗 | Novita, Cerebras, Fireworks, +8 more |
| DeepSeek V3.2 (Reasoning) | DeepSeek | 42 | 685B (37B active at inference time) | 128k | $0.3 | 30 | 🤗 | SambaNova, Fireworks, Parasail, +5 more |
| Kimi K2 Thinking | Kimi | 41 | 1T (32B active at inference time) | 256k | $1.1 | 89 | 🤗 | Novita, Microsoft Azure, Google, +10 more |
| MiniMax-M2.1 | MiniMax | 40 | 230B (10B active at inference time) | 205k | $0.5 | 56 | 🤗 | MiniMax, DeepInfra, GMI, +4 more |
| MiMo-V2-Flash (Reasoning) | Xiaomi | 39 | 309B (15B active at inference time) | 256k | $0.1 | 178 | 🤗 | Xiaomi |
| Kimi K2.5 (Non-reasoning) | Kimi | 37 | 1T (32B active at inference time) | 256k | $1.2 | 111 | 🤗 | Fireworks, Together.ai, Kimi, +2 more |
| GLM-4.7 (Non-reasoning) | Z AI | 34 | 357B (32B active at inference time) | 200k | $0.9 | 159 | 🤗 | GMI, Baseten, DeepInfra, +7 more |
| DeepSeek V3.2 Speciale | DeepSeek | 34 | 685B (37B active at inference time) | 128k | $0.4 | - | 🤗 | Parasail |
| K-EXAONE (Reasoning) | LG AI Research | 32 | 236B (23B active at inference time) | 256k | - | 126 | 🤗 | FriendliAI |
| DeepSeek V3.2 (Non-reasoning) | DeepSeek | 32 | 685B (37B active at inference time) | 128k | $0.3 | 31 | 🤗 | DeepInfra, GMI, Fireworks, +7 more |
| Kimi K2 0905 | Kimi | 31 | 1T (32B active at inference time) | 256k | $1.2 | 55 | 🤗 | Groq, Parasail, Together.ai, +4 more |
| MiMo-V2-Flash (Non-reasoning) | Xiaomi | 31 | 309B (15B active at inference time) | 256k | $0.1 | 162 | 🤗 | Xiaomi |
| Qwen3 235B A22B 2507 (Reasoning) | Alibaba | 29 | 235B (22B active at inference time) | 256k | $2.6 | 51 | 🤗 | Eigen AI, Together.ai, Novita, +5 more |
| Qwen3 VL 235B A22B (Reasoning) | Alibaba | 28 | 235B (22B active at inference time) | 262k | $2.6 | 55 | 🤗 | Alibaba Cloud, Fireworks, Novita |
| DeepSeek R1 0528 (May '25) | DeepSeek | 27 | 685B (37B active at inference time) | 128k | $2.4 | - | 🤗 | Hyperbolic, Nebius, SambaNova, +7 more |
| Qwen3 235B A22B 2507 Instruct | Alibaba | 25 | 235B (22B active at inference time) | 256k | $1.2 | 64 | 🤗 | Alibaba Cloud, Together.ai, DeepInfra, +9 more |
| Qwen3 Coder 480B A35B Instruct | Alibaba | 25 | 480B (35B active at inference time) | 262k | $3.0 | 62 | 🤗 | DeepInfra, Hyperbolic, Together.ai, +7 more |
| K-EXAONE (Non-reasoning) | LG AI Research | 23 | 236B (23B active at inference time) | 256k | - | 96 | 🤗 | FriendliAI |
| Mistral Large 3 | Mistral | 23 | 675B (41B active at inference time) | 256k | $0.8 | 66 | 🤗 | Amazon Bedrock, Mistral, Microsoft Azure |
| Ring-1T | InclusionAI | 23 | 1T (50B active at inference time) | 128k | $1.0 | 52 | 🤗 | ZenMux |
| Hermes 4 - Llama-3.1 405B (Reasoning) | Nous Research | 22 | 406B | 128k | $1.5 | 37 | 🤗 | Nebius |
| Qwen3 VL 235B A22B Instruct | Alibaba | 21 | 235B (22B active at inference time) | 262k | $1.2 | 52 | 🤗 | GMI, Novita, Fireworks, +4 more |
| Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | NVIDIA | 20 | 253B | 128k | $0.9 | 37 | 🤗 | Nebius |
| Ling-1T | InclusionAI | 19 | 1T (50B active at inference time) | 128k | - | - | 🤗 | - |
| Llama 4 Maverick | Meta | 18 | 402B (17B active at inference time) | 1.00M | $0.5 | 128 | 🤗 | DeepInfra, Google, SambaNova, +9 more |
| ERNIE 4.5 300B A47B | Baidu | 17 | 300B (47B active at inference time) | 131k | $0.5 | 32 | 🤗 | SiliconFlow, Novita |
| Hermes 4 - Llama-3.1 405B (Non-reasoning) | Nous Research | 17 | 406B | 128k | $1.5 | 33 | 🤗 | Nebius |
| Llama 3.1 Instruct 405B | Meta | 14 | 405B | 128k | $4.2 | 27 | 🤗 | Databricks, Hyperbolic, Google, +5 more |
| R1 1776 | Perplexity | 12 | 671B (37B active at inference time) | 128k | - | - | 🤗 | - |
| Jamba 1.7 Large | AI21 Labs | 9 | 398B (94B active at inference time) | 256k | $3.5 | 40 | 🤗 | AI21 Labs |
| Cogito v2.1 (Reasoning) | Deep Cogito | - | 671B (37B active at inference time) | 128k | $1.3 | 77 | 🤗 | Together.ai |