Comparisons of Large Open Source AI Models (>150B)

Name: Artificial Analysis Intelligence Index
Creator: Artificial Analysis
License: https://artificialanalysis.ai/docs/legal/Terms-of-Use.pdf

Open source AI models with over 150B parameters. Models are considered open source (also commonly referred to as open weights) when their weights are available to download. This allows self-hosting on your own infrastructure and customization of the model, for example through fine-tuning. For more details, including our methodology, see our FAQs.

GLM-5 (Z AI) and Kimi K2.5 (Kimi) are the highest-intelligence large open source models, defined as those with >150B parameters, followed by Qwen3.5 397B A17B (Alibaba) and DeepSeek V3.2 (DeepSeek).

Intelligence

Artificial Analysis Intelligence Index; Higher is better

Total Parameters

Trainable parameters in billions

Openness

Artificial Analysis Openness Index: Results

Openness Index assesses model openness on a 0 to 100 normalized scale (higher is more open)

Intelligence

Artificial Analysis Intelligence Index

Artificial Analysis Intelligence Index v4.0 incorporates 10 evaluations: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt

Reasoning models are indicated by a lightbulb icon.

See the Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.

Intelligence Evaluations

Intelligence evaluations measured independently by Artificial Analysis; Higher is better

Results claimed by AI Lab (not yet independently verified)

GDPval-AA (Agentic Real-World Work Tasks, (ELO-500)/2000)

Terminal-Bench Hard (Agentic Coding & Terminal Use)

𝜏²-Bench Telecom (Agentic Tool Use)

AA-LCR (Long Context Reasoning)

AA-Omniscience Accuracy (Knowledge)

AA-Omniscience Non-Hallucination Rate (1 - Hallucination Rate)

Humanity's Last Exam (Reasoning & Knowledge)

GPQA Diamond (Scientific Reasoning)

SciCode (Coding)

IFBench (Instruction Following)

CritPt (Physics Reasoning)

MMMU Pro (Visual Reasoning)

While model intelligence generally translates across use cases, specific evaluations may be more relevant for certain use cases.
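The GDPval-AA entry in the list above is labelled with the normalization (ELO-500)/2000. A minimal sketch of that arithmetic follows; the function name and the sample ratings are hypothetical and chosen only for illustration, and nothing here is confirmed to be how Artificial Analysis implements the calculation.

```python
def normalize_elo(elo: float, offset: float = 500.0, scale: float = 2000.0) -> float:
    """Apply the (ELO - 500) / 2000 normalization shown in the GDPval-AA label."""
    return (elo - offset) / scale


if __name__ == "__main__":
    # Hypothetical ratings for illustration only, not measured results.
    for raw in (500, 1000, 1500):
        print(f"ELO {raw} -> {normalize_elo(raw):.3f}")
```

Under this formula a rating of 500 maps to 0 and a rating of 2500 maps to 1, so ratings within that range land between 0 and 1.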

Size

Model Size: Total and Active Parameters

Comparison between total model parameters and parameters active during inference

Active Parameters

Passive Parameters

Total Parameters: The total number of trainable weights and biases in the model, expressed in billions. These parameters are learned during training and determine the model's ability to process and generate responses.

Active Parameters: The number of parameters actually executed during each inference forward pass, expressed in billions. For Mixture of Experts (MoE) models, a routing mechanism selects a subset of experts per token, resulting in fewer active than total parameters. Dense models use all parameters, so active equals total.
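The relationship between total, active, and passive parameters can be sketched as simple arithmetic. The snippet below assumes that passive parameters are total minus active, which is how the chart legend appears to split the bars; the `ModelSize` class and `dense` helper are hypothetical names, and the figures are taken from the table under "Further details".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSize:
    name: str
    total_b: float   # total parameters, in billions
    active_b: float  # parameters active per forward pass, in billions

    @property
    def passive_b(self) -> float:
        # Assumption: passive = total - active.
        return self.total_b - self.active_b

    @property
    def active_share(self) -> float:
        return self.active_b / self.total_b


def dense(name: str, total_b: float) -> ModelSize:
    """Dense models use all parameters, so active equals total."""
    return ModelSize(name, total_b, total_b)


if __name__ == "__main__":
    models = [
        ModelSize("GLM-5", 744, 40),
        ModelSize("DeepSeek V3.2", 685, 37),
        dense("Llama 3.1 Instruct 405B", 405),
    ]
    for m in models:
        print(f"{m.name}: {m.passive_b:.0f}B passive, {m.active_share:.1%} active")
```

For an MoE model such as GLM-5, only a small share of the weights runs per token, while the dense Llama 3.1 405B has no passive parameters.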

Intelligence vs. Active Parameters

[Scatter plot: Artificial Analysis Intelligence Index vs. Active Parameters at Inference Time, with the most attractive quadrant marked. Legend: Alibaba, DeepSeek, Kimi, LG AI Research.]

Intelligence vs. Total Parameters

[Scatter plot: Artificial Analysis Intelligence Index vs. Size in Parameters (Billions), with the most attractive quadrant marked. Legend: Alibaba, DeepSeek, Kimi, LG AI Research.]

Context Window

Context Window: Token Limit; Higher is better

Larger context windows are relevant to Retrieval Augmented Generation (RAG) workflows, which typically involve reasoning over and retrieving information from large amounts of data.

Context window: the maximum number of combined input and output tokens. Output tokens commonly have a significantly lower limit, which varies by model.
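Because the context window covers input and output tokens combined, the space left for a prompt shrinks as more output is reserved. The sketch below illustrates that budget. It assumes that "k" and "M" in the table mean 1,000 and 1,000,000 (exact limits may differ, for example powers of two), and both function names are hypothetical.

```python
def parse_token_count(text: str) -> int:
    """Convert strings such as '200k' or '1.00M' into a token count.

    Assumption: 'k' = 1,000 and 'M' = 1,000,000.
    """
    multipliers = {"k": 1_000, "M": 1_000_000}
    suffix = text[-1]
    if suffix in multipliers:
        return round(float(text[:-1]) * multipliers[suffix])
    return int(text)


def max_input_tokens(context_window: int, reserved_output: int) -> int:
    """Tokens left for the prompt once output tokens are reserved."""
    if reserved_output > context_window:
        raise ValueError("reserved output exceeds the context window")
    return context_window - reserved_output


if __name__ == "__main__":
    window = parse_token_count("200k")
    # The 8,000-token output reservation is an illustrative value.
    print(max_input_tokens(window, reserved_output=8_000))
```

In a RAG workflow, the value returned by `max_input_tokens` bounds how many retrieved tokens can be packed into the prompt alongside the question.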

Further details

Model (Creator)	Intelligence Index	Parameters	Context Window	Price (USD)		Weights		Provider Benchmarks
GLM-5 (Reasoning) Z AI	50	744B (40B active at inference time)	200k	$1.6	69	🤗	+10 more	View
Kimi K2.5 (Reasoning) Kimi	47	1000B (32B active at inference time)	256k	$1.2	47	🤗	+14 more	View
Qwen3.5 397B A17B (Reasoning) Alibaba	45	397B (17B active at inference time)	262k	$1.4	84	🤗	+5 more	View
DeepSeek V3.2 (Reasoning) DeepSeek	42	685B (37B active at inference time)	128k	$0.3	33	🤗	+8 more	View
MiMo-V2-Flash (Feb 2026) Xiaomi	41	309B (15B active at inference time)	256k	$0.1	129	🤗		View
GLM-5 (Non-reasoning) Z AI	41	744B (40B active at inference time)	200k	$1.6	68	🤗	+4 more	View
Qwen3.5 397B A17B (Non-reasoning) Alibaba	40	397B (17B active at inference time)	262k	$1.4	84	🤗	+2 more	View
Step 3.5 Flash StepFun	38	196B (11B active at inference time)	256k	$0.1	117	🤗		View
Kimi K2.5 (Non-reasoning) Kimi	37	1000B (32B active at inference time)	256k	$1.2	45	🤗	+6 more	View
K-EXAONE (Reasoning) LG AI Research	32	236B (23B active at inference time)	256k	-	-	🤗	-	View
DeepSeek V3.2 (Non-reasoning) DeepSeek	32	685B (37B active at inference time)	128k	$0.3	33	🤗	+11 more	View
MiMo-V2-Flash (Non-reasoning) Xiaomi	30	309B (15B active at inference time)	256k	$0.1	138	🤗		View
Qwen3 235B A22B 2507 (Reasoning) Alibaba	30	235B (22B active at inference time)	256k	$2.6	43	🤗	+4 more	View
DeepSeek V3.2 Speciale DeepSeek	29	685B (37B active at inference time)	128k	-	-	🤗	-	View
Qwen3 VL 235B A22B (Reasoning) Alibaba	28	235B (22B active at inference time)	262k	$2.6	52	🤗		View
DeepSeek R1 0528 (May '25) DeepSeek	27	685B (37B active at inference time)	128k	$2.4	-	🤗	+6 more	View
Qwen3 235B A22B 2507 Instruct Alibaba	25	235B (22B active at inference time)	256k	$1.2	62	🤗	+10 more	View
Qwen3 Coder 480B A35B Instruct Alibaba	25	480B (35B active at inference time)	262k	$3.0	62	🤗	+8 more	View
K-EXAONE (Non-reasoning) LG AI Research	23	236B (23B active at inference time)	256k	-	-	🤗	-	View
Mistral Large 3 Mistral	23	675B (41B active at inference time)	256k	$0.8	49	🤗		View
Ring-1T InclusionAI	23	1000B (50B active at inference time)	128k	-	-	🤗	-	View
Qwen3 VL 235B A22B Instruct Alibaba	21	235B (22B active at inference time)	262k	$1.2	59	🤗	+2 more	View
Ling-1T InclusionAI	19	1000B (50B active at inference time)	128k	-	-	🤗	-	View
Hermes 4 - Llama-3.1 405B (Reasoning) Nous Research	19	406B	128k	$1.5	29	🤗		View
Llama 4 Maverick Meta	18	402B (17B active at inference time)	1.00M	$0.5	123	🤗	+10 more	View
Hermes 4 - Llama-3.1 405B (Non-reasoning) Nous Research	18	406B	128k	$1.5	31	🤗		View
Llama 3.1 Instruct 405B Meta	17	405B	128k	$4.4	34	🤗	+2 more	View
Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) NVIDIA	15	253B	128k	$0.9	42	🤗		View
ERNIE 4.5 300B A47B Baidu	15	300B (47B active at inference time)	131k	$0.5	34	🤗		View
R1 1776 Perplexity	12	671B (37B active at inference time)	128k	-	-	🤗	-	View
Jamba 1.7 Large AI21 Labs	11	398B (94B active at inference time)	256k	$3.5	59	🤗		View
Cogito v2.1 (Reasoning) Deep Cogito	-	671B (37B active at inference time)	128k	$1.3	86	🤗		View
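For readers who want to work with the rows above programmatically, the following is a minimal parsing sketch. It assumes the rows are tab-separated with the model name, Intelligence Index, and parameter count in the first three fields, and that parameter counts are written in billions with a "B" suffix. The `Row` type, `parse_row` function, and regular expression are hypothetical helpers, not part of any Artificial Analysis tooling.

```python
import re
from typing import NamedTuple, Optional

# Matches "405B" or "744B (40B active at inference time)".
PARAMS = re.compile(
    r"^(?P<total>\d+(?:\.\d+)?)B"
    r"(?: \((?P<active>\d+(?:\.\d+)?)B active at inference time\))?$"
)


class Row(NamedTuple):
    model: str
    intelligence: Optional[int]  # None when the table shows "-"
    total_b: float
    active_b: float


def parse_row(line: str) -> Row:
    fields = line.split("\t")
    model, index, params = fields[0], fields[1], fields[2]
    match = PARAMS.match(params)
    if match is None:
        raise ValueError(f"unrecognized parameter field: {params!r}")
    total = float(match["total"])
    # Rows without an "active" note are treated as dense: active equals total.
    active = float(match["active"]) if match["active"] else total
    return Row(model, None if index == "-" else int(index), total, active)


if __name__ == "__main__":
    row = parse_row("GLM-5 (Reasoning) Z AI\t50\t744B (40B active at inference time)\t200k")
    print(row)
```

Rows whose parameter field does not fit the assumed pattern raise a `ValueError` rather than being silently guessed at.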
