
Comparisons of Large Open Source AI Models (>150B)

Open source AI models with over 150B parameters. Models are considered open source (also commonly referred to as open weights) when their weights are available to download. This allows self-hosting on your own infrastructure and enables customizing the model, such as through fine-tuning. Click on any model to see detailed metrics. For more details, including our methodology, see our FAQs.

GLM-5 and Kimi K2.5 are the highest-intelligence large open source models (defined as those with >150B parameters), followed by Qwen3.5 397B A17B and MiniMax-M2.5.

Intelligence
Artificial Analysis Intelligence Index; Higher is better
Estimate (independent evaluation forthcoming)
Total Parameters
Trainable parameters in billions

Openness

Artificial Analysis Openness Index: Results

Openness Index assesses model openness on a 0 to 100 normalized scale (higher is more open)

Intelligence

Artificial Analysis Intelligence Index

Artificial Analysis Intelligence Index v4.0 incorporates 10 evaluations: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt
Estimate (independent evaluation forthcoming)

Artificial Analysis Intelligence Index v4.0 includes: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt. See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.
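The index construction above can be sketched as an average of normalized evaluation scores. This is a minimal sketch only: the page does not state the weighting, so equal weights over 0-1-normalized scores are an assumption, the `(Elo - 500) / 2000` mapping for GDPval-AA is taken from the chart label below, and all scores shown are hypothetical.

```python
# Sketch of a composite intelligence index.
# Assumption (not confirmed by this page): equal weighting across evaluations,
# with each evaluation score normalized to a 0-1 scale before averaging.

def normalize_gdpval_elo(elo: float) -> float:
    """Map a GDPval-AA Elo rating to a 0-1 scale, per the chart's (Elo - 500) / 2000."""
    return (elo - 500) / 2000

def intelligence_index(scores: dict[str, float]) -> float:
    """Equal-weighted mean of 0-1 evaluation scores, reported on a 0-100 scale."""
    return 100 * sum(scores.values()) / len(scores)

# Hypothetical scores for illustration only; these are not real benchmark results.
scores = {
    "GDPval-AA": normalize_gdpval_elo(1300),  # -> 0.4
    "Terminal-Bench Hard": 0.35,
    "GPQA Diamond": 0.72,
}
print(round(intelligence_index(scores), 1))  # 49.0
```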


Intelligence Evaluations

Intelligence evaluations measured independently by Artificial Analysis; Higher is better
Results claimed by AI Lab (not yet independently verified)
GDPval-AA (Agentic Real-World Work Tasks, (ELO-500)/2000)
Terminal-Bench Hard (Agentic Coding & Terminal Use)
𝜏²-Bench Telecom (Agentic Tool Use)
AA-LCR (Long Context Reasoning)
AA-Omniscience Accuracy (Knowledge)
AA-Omniscience Non-Hallucination Rate (1 - Hallucination Rate)
Humanity's Last Exam (Reasoning & Knowledge)
GPQA Diamond (Scientific Reasoning)
SciCode (Coding)
IFBench (Instruction Following)
CritPt (Physics Reasoning)
MMMU Pro (Visual Reasoning)

While model intelligence generally translates across use cases, specific evaluations may be more relevant for certain use cases.


Size

Model Size: Total and Active Parameters

Comparison between total model parameters and parameters active during inference
Active Parameters
Passive Parameters

The total number of trainable weights and biases in the model, expressed in billions. These parameters are learned during training and determine the model's ability to process and generate responses.

The number of parameters actually executed during each inference forward pass, expressed in billions. For Mixture of Experts (MoE) models, a routing mechanism selects a subset of experts per token, resulting in fewer active than total parameters. Dense models use all parameters, so active equals total.
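The total-versus-active distinction can be made concrete with a small accounting sketch. The function and every number below are hypothetical and do not describe any model in this comparison; they only illustrate how routed experts inflate total parameters while just the top-k experts selected per token count toward active parameters.

```python
# Illustrative parameter accounting for a Mixture-of-Experts (MoE) transformer.
# All values are hypothetical, chosen only to show the total/active split.

def moe_param_counts(shared_params: float, expert_params: float,
                     n_experts: int, top_k: int) -> tuple[float, float]:
    """Return (total, active) parameter counts in billions.

    shared_params: attention, embeddings, etc. (executed for every token)
    expert_params: parameters per routed expert
    n_experts:     experts available to the router
    top_k:         experts actually executed per token
    """
    total = shared_params + n_experts * expert_params
    active = shared_params + top_k * expert_params
    return total, active

# Hypothetical MoE: 20B shared, 128 experts of 5B each, router selects 8 per token
total, active = moe_param_counts(20, 5, 128, 8)
print(total, active)  # total 660B, active 60B
```

For a dense model the same function degenerates to `top_k == n_experts`, so active equals total, matching the definition above.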

Intelligence vs. Active Parameters

Chart: Artificial Analysis Intelligence Index vs. active parameters at inference time; the most attractive quadrant is highlighted. Creators plotted: Alibaba, DeepSeek, Kimi, LG AI Research, Meta, MiniMax, Mistral, Xiaomi, Z AI.



Intelligence vs. Total Parameters

Chart: Artificial Analysis Intelligence Index vs. total parameters (billions); the most attractive quadrant is highlighted. Creators plotted: Alibaba, DeepSeek, Kimi, LG AI Research, Meta, MiniMax, Mistral, Xiaomi, Z AI.



Context Window

Context Window

Context Window: Token Limit; Higher is better

Larger context windows are relevant to RAG (Retrieval Augmented Generation) LLM workflows, which typically involve reasoning over and retrieving information from large amounts of data.

Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (varies by model).
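Because the limit covers input and output combined, the available output budget shrinks as the prompt grows. A minimal sketch, using hypothetical limits and a hypothetical separate output-token cap:

```python
# Sketch of a request-budget check under a combined input+output token limit.
# The 128k window and 32k output cap below are illustrative assumptions,
# not properties of any model listed on this page.

def max_output_tokens(context_window: int, input_tokens: int,
                      output_cap: int) -> int:
    """Output tokens available for a request: the combined limit minus the
    prompt, further bounded by the model's separate output-token cap."""
    remaining = context_window - input_tokens
    return max(0, min(remaining, output_cap))

print(max_output_tokens(128_000, 120_000, 32_000))  # 8000: window-bound
print(max_output_tokens(128_000, 10_000, 32_000))   # 32000: cap-bound
```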


Further details

| Model | Creator | Intelligence Index | Total Params | Active Params | Context Window | Price (USD/1M tokens) | Output Speed (tokens/s) | Weights | Providers |
|---|---|---|---|---|---|---|---|---|---|
| GLM-5 (Reasoning) | Z AI | 50 | 744B | 40B | 200k | $1.6 | 80 | 🤗 | Parasail, GMI, Fireworks, +5 more |
| Kimi K2.5 (Reasoning) | Kimi | 47 | 1.0T | 32B | 256k | $1.2 | 30 | 🤗 | Novita, Baseten, Kimi, +6 more |
| Qwen3.5 397B A17B (Reasoning) | Alibaba | 45 | 397B | 17B | 262k | $1.4 | 92 | 🤗 | Parasail, Together.ai, Novita, +1 more |
| MiniMax-M2.5 | MiniMax | 42 | 230B | 10B | 205k | $0.5 | 56 | 🤗 | Clarifai, SambaNova, DeepInfra, +6 more |
| DeepSeek V3.2 (Reasoning) | DeepSeek | 42 | 685B | 37B | 128k | $0.3 | 46 | 🤗 | SambaNova, SiliconFlow, Parasail, +5 more |
| MiMo-V2-Flash (Feb 2026) | Xiaomi | 41 | 309B | 15B | 256k | $0.1 | 168 | 🤗 | Xiaomi |
| GLM-5 (Non-reasoning) | Z AI | 41 | 744B | 40B | 200k | $1.6 | 48 | 🤗 | Novita, Fireworks, DeepInfra, +1 more |
| Qwen3.5 397B A17B (Non-reasoning) | Alibaba | 40 | 397B | 17B | 262k | $1.4 | 94 | 🤗 | Alibaba Cloud, Novita |
| Kimi K2.5 (Non-reasoning) | Kimi | 37 | 1.0T | 32B | 256k | $1.2 | 33 | 🤗 | Kimi, Baseten, Fireworks, +3 more |
| DeepSeek V3.2 Speciale | DeepSeek | 34 | 685B | 37B | 128k | - | - | 🤗 | - |
| K-EXAONE (Reasoning) | LG AI Research | 32 | 236B | 23B | 256k | - | - | 🤗 | - |
| DeepSeek V3.2 (Non-reasoning) | DeepSeek | 32 | 685B | 37B | 128k | $0.3 | 47 | 🤗 | GMI, Google, SambaNova, +7 more |
| MiMo-V2-Flash (Non-reasoning) | Xiaomi | 30 | 309B | 15B | 256k | $0.1 | 157 | 🤗 | Xiaomi |
| Qwen3 235B A22B 2507 (Reasoning) | Alibaba | 30 | 235B | 22B | 256k | $2.6 | 44 | 🤗 | Novita, Alibaba Cloud, Hyperbolic, +2 more |
| Qwen3 VL 235B A22B (Reasoning) | Alibaba | 28 | 235B | 22B | 262k | $2.6 | 36 | 🤗 | Novita, Alibaba Cloud |
| DeepSeek R1 0528 (May '25) | DeepSeek | 27 | 685B | 37B | 128k | $2.4 | - | 🤗 | Google, Together.ai, SambaNova, +6 more |
| Qwen3 235B A22B 2507 Instruct | Alibaba | 25 | 235B | 22B | 256k | $1.2 | 63 | 🤗 | Novita, Together.ai, Alibaba Cloud, +7 more |
| Qwen3 Coder 480B A35B Instruct | Alibaba | 25 | 480B | 35B | 262k | $3.0 | 61 | 🤗 | Hyperbolic, Alibaba Cloud, DeepInfra, +7 more |
| Ling-1T | InclusionAI | 24 | 1.0T | 50B | 128k | - | - | 🤗 | - |
| K-EXAONE (Non-reasoning) | LG AI Research | 23 | 236B | 23B | 256k | - | - | 🤗 | - |
| Qwen3 VL 235B A22B Instruct | Alibaba | 23 | 235B | 22B | 262k | $1.2 | 50 | 🤗 | DeepInfra, Novita, Alibaba Cloud, +2 more |
| Mistral Large 3 | Mistral | 23 | 675B | 41B | 256k | $0.8 | 55 | 🤗 | Amazon Bedrock, Mistral, Microsoft Azure |
| Ring-1T | InclusionAI | 22 | 1.0T | 50B | 128k | - | - | 🤗 | - |
| Hermes 4 - Llama-3.1 405B (Reasoning) | Nous Research | 22 | 406B | - | 128k | $1.5 | 35 | 🤗 | Nebius |
| Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | NVIDIA | 20 | 253B | - | 128k | $0.9 | 38 | 🤗 | Nebius |
| Llama 4 Maverick | Meta | 18 | 402B | 17B | 1.00M | $0.5 | 116 | 🤗 | Google, Snowflake, DeepInfra, +9 more |
| ERNIE 4.5 300B A47B | Baidu | 17 | 300B | 47B | 131k | $0.5 | 24 | 🤗 | SiliconFlow, Novita |
| Hermes 4 - Llama-3.1 405B (Non-reasoning) | Nous Research | 17 | 406B | - | 128k | $1.5 | 31 | 🤗 | Nebius |
| Llama 3.1 Instruct 405B | Meta | 15 | 405B | - | 128k | $4.4 | 28 | 🤗 | Microsoft Azure, Databricks, Hyperbolic, +4 more |
| Jamba 1.7 Large | AI21 Labs | 13 | 398B | 94B | 256k | $3.5 | 54 | 🤗 | AI21 Labs |
| R1 1776 | Perplexity | 12 | 671B | 37B | 128k | - | - | 🤗 | - |
| Cogito v2.1 (Reasoning) | Deep Cogito | - | 671B | 37B | 128k | $1.3 | 77 | 🤗 | Together.ai |