Comparisons of Small Open Source AI Models (4B-40B)

Open source AI models with 4B to 40B parameters. Models are considered open source (also commonly referred to as open weights) when their weights are available to download. This allows self-hosting on your own infrastructure and enables customization of the model, such as through fine-tuning. For more details, including our methodology, see our FAQs.

Alibaba's Qwen3.5 27B and Qwen3.5 35B A3B are the highest-intelligence small open source models, defined as those with 4B-40B parameters, followed by Z AI's GLM-4.7-Flash and ServiceNow's Apriel-v1.6-15B-Thinker.

[Chart: Artificial Analysis Intelligence Index (higher is better) alongside total parameters (trainable parameters, in billions). Estimated scores are marked "Estimate (independent evaluation forthcoming)".]

Openness

Artificial Analysis Openness Index: Results

Openness Index assesses model openness on a 0 to 100 normalized scale (higher is more open)

Intelligence

Artificial Analysis Intelligence Index

Artificial Analysis Intelligence Index v4.0 incorporates 10 evaluations: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt. Scores marked as estimates are awaiting independent evaluation. See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.

Intelligence Evaluations

Intelligence evaluations measured independently by Artificial Analysis; Higher is better
Results claimed by AI Lab (not yet independently verified)
GDPval-AA (Agentic Real-World Work Tasks, (ELO-500)/2000)
Terminal-Bench Hard (Agentic Coding & Terminal Use)
𝜏²-Bench Telecom (Agentic Tool Use)
AA-LCR (Long Context Reasoning)
AA-Omniscience Accuracy (Knowledge)
AA-Omniscience Non-Hallucination Rate (1 - Hallucination Rate)
Humanity's Last Exam (Reasoning & Knowledge)
GPQA Diamond (Scientific Reasoning)
SciCode (Coding)
IFBench (Instruction Following)
CritPt (Physics Reasoning)
MMMU Pro (Visual Reasoning)
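
Two entries in the list above carry a formula in parentheses: (ELO-500)/2000 for GDPval-AA and 1 - Hallucination Rate for AA-Omniscience. The following is a minimal Python sketch of that arithmetic only. The function names and sample inputs are hypothetical, and whether the published scores are clamped or rescaled any further is not stated on this page.

```python
def normalize_elo(elo: float) -> float:
    """Apply the (ELO - 500) / 2000 transform shown next to GDPval-AA.

    No clamping is applied here; whether out-of-range ratings are
    clamped in the published index is an open assumption.
    """
    return (elo - 500) / 2000


def non_hallucination_rate(hallucination_rate: float) -> float:
    """Return 1 - hallucination rate, for a rate expressed in [0, 1]."""
    if not 0.0 <= hallucination_rate <= 1.0:
        raise ValueError("hallucination_rate must be between 0 and 1")
    return 1 - hallucination_rate


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to illustrate the arithmetic.
    print(normalize_elo(1500))           # 0.5
    print(non_hallucination_rate(0.25))  # 0.75
```

Under this transform an ELO of 500 maps to 0 and an ELO of 2500 maps to 1, which puts the rating on the same "higher is better" footing as the percentage-based evaluations.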

While model intelligence generally translates across use cases, specific evaluations may be more relevant for certain use cases.

Size

Model Size: Total and Active Parameters

Comparison between total model parameters and parameters active during inference. Each model's total is split into active and passive parameters.

Total parameters: The total number of trainable weights and biases in the model, expressed in billions. These parameters are learned during training and determine the model's ability to process and generate responses.

Active parameters: The number of parameters actually executed during each inference forward pass, expressed in billions. For Mixture of Experts (MoE) models, a routing mechanism selects a subset of experts per token, resulting in fewer active than total parameters. Dense models use all parameters, so active equals total.
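
To make the active-versus-total distinction concrete, here is a small Python sketch that computes the share of parameters used per forward pass. The `ModelSize` class and its field names are hypothetical. The sample figures are taken from the table on this page (GLM-4.7-Flash: 31.2B total, 3B active; Apriel-v1.6-15B-Thinker: 15B). Treating a model with no listed active count as dense is an assumption, not something the page guarantees.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelSize:
    """Parameter counts in billions (hypothetical helper, not an official schema)."""
    name: str
    total_b: float
    active_b: Optional[float] = None  # None: no active count listed, assumed dense

    @property
    def effective_active_b(self) -> float:
        # Dense models use all parameters, so active equals total.
        return self.total_b if self.active_b is None else self.active_b

    @property
    def passive_b(self) -> float:
        # Parameters that are not executed in a given forward pass.
        return self.total_b - self.effective_active_b

    @property
    def active_fraction(self) -> float:
        return self.effective_active_b / self.total_b


if __name__ == "__main__":
    models = [
        ModelSize("GLM-4.7-Flash", total_b=31.2, active_b=3.0),  # MoE
        ModelSize("Apriel-v1.6-15B-Thinker", total_b=15.0),      # assumed dense
    ]
    for m in models:
        print(f"{m.name}: {m.active_fraction:.1%} of parameters active per token")
```

A low active fraction lowers the compute needed per token, but the full set of weights still has to be stored, so total parameters remain the relevant figure when sizing memory for self-hosting.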

Intelligence vs. Active Parameters

[Chart: Artificial Analysis Intelligence Index vs. active parameters at inference time, with the most attractive quadrant highlighted. Labs shown: Alibaba, Google, Korea Telecom, LG AI Research, Mistral, NVIDIA, OpenAI, ServiceNow, Z AI.]

Intelligence vs. Total Parameters

[Chart: Artificial Analysis Intelligence Index vs. total size in parameters (billions), with the most attractive quadrant highlighted. Labs shown: Alibaba, Google, Korea Telecom, LG AI Research, Mistral, NVIDIA, OpenAI, ServiceNow, Z AI.]

Context Window

Context Window

Context Window: Token Limit; Higher is better

Larger context windows are relevant to RAG (Retrieval Augmented Generation) LLM workflows, which typically involve reasoning over and retrieving information from large amounts of data.

Maximum number of combined input and output tokens. Output tokens commonly have a significantly lower limit (varies by model).
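
Because the context window covers input and output tokens combined, the room left for a response shrinks as the prompt grows. The Python sketch below illustrates that budgeting. The function name, the `output_cap` parameter, and the sample numbers are hypothetical. Real output limits vary by model and provider and are not listed on this page.

```python
from typing import Optional


def remaining_output_budget(
    context_window: int,
    input_tokens: int,
    output_cap: Optional[int] = None,
) -> int:
    """Tokens left for output once the prompt is counted against the window.

    output_cap models the separate, usually lower, output limit that some
    models impose. Pass None when no such cap is known.
    """
    if context_window < 0 or input_tokens < 0:
        raise ValueError("token counts must be non-negative")
    remaining = max(context_window - input_tokens, 0)
    if output_cap is not None:
        remaining = min(remaining, output_cap)
    return remaining


if __name__ == "__main__":
    # Hypothetical figures: a 128,000-token window and a 100,000-token prompt.
    print(remaining_output_budget(128_000, 100_000))                    # 28000
    print(remaining_output_budget(128_000, 100_000, output_cap=8_192))  # 8192
```

For RAG workloads, a check like this is typically run before sending a request, so that retrieved passages can be trimmed when the prompt would leave too little room for the answer.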

| Model | Creator | Intelligence Index | Total Parameters (active at inference time) | Context Window | Price (USD/1M tokens) | Speed (tokens/s) | Weights | Providers |
|---|---|---|---|---|---|---|---|---|
| Qwen3.5 27B (Reasoning) | Alibaba | 42 | 27.8B | 262k | $0.8 | 100 | 🤗 | Alibaba Cloud, Novita |
| Qwen3.5 35B A3B (Reasoning) | Alibaba | 37 | 36B | 262k | $0.7 | 178 | 🤗 | Novita, Alibaba Cloud |
| GLM-4.7-Flash (Reasoning) | Z AI | 30 | 31.2B (3B active) | 200k | $0.1 | 64 | 🤗 | DeepInfra, Novita |
| Apriel-v1.6-15B-Thinker | ServiceNow | 28 | 15B | 128k | - | 146 | 🤗 | Together.ai |
| Qwen3 VL 32B (Reasoning) | Alibaba | 25 | 33.4B | 256k | $2.6 | 89 | 🤗 | Alibaba Cloud |
| Qwen3 30B A3B 2507 (Reasoning) | Alibaba | 25 | 30.5B (3.3B active) | 262k | $0.8 | 157 | 🤗 | Nebius, Alibaba Cloud, Clarifai |
| gpt-oss-20B (high) | OpenAI | 24 | 21B (3.6B active) | 131k | $0.1 | 305 | 🤗 | Lightning AI, Groq, Cloudflare, +7 more |
| NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | NVIDIA | 24 | 31.6B (3.6B active) | 1.00M | $0.1 | 156 | 🤗 | Nebius, DeepInfra |
| HyperCLOVA X SEED Think (32B) | Naver | 24 | 32B | 128k | - | - | 🤗 | - |
| Mi:dm K 2.5 Pro | Korea Telecom | 23 | 32B | 128k | - | - | Not available | - |
| Magistral Small 1.2 | Mistral | 23 | 24B | 128k | $0.8 | 196 | 🤗 | Amazon Bedrock, Mistral |
| EXAONE 4.0 32B (Reasoning) | LG AI Research | 22 | 32B | 131k | $0.7 | 120 | 🤗 | FriendliAI |
| GLM-4.7-Flash (Non-reasoning) | Z AI | 22 | 31.2B (3B active) | 200k | $0.2 | 63 | 🤗 | Novita |
| Qwen3 VL 32B Instruct | Alibaba | 21 | 33.4B | 256k | $1.2 | 68 | 🤗 | Together.ai, Alibaba Cloud |
| NVIDIA Nemotron Nano 12B v2 VL (Reasoning) | NVIDIA | 21 | 13.2B | 128k | $0.3 | 133 | 🤗 | DeepInfra |
| Qwen3 Omni 30B A3B (Reasoning) | Alibaba | 21 | 35.3B (3B active) | 65.5k | $0.4 | 99 | 🤗 | Alibaba Cloud |
| gpt-oss-20B (low) | OpenAI | 21 | 21B (3.6B active) | 131k | $0.1 | 296 | 🤗 | Google, Amazon Bedrock, Together.ai, +8 more |
| Qwen3 VL 30B A3B Instruct | Alibaba | 20 | 30B (3B active) | 256k | $0.3 | 112 | 🤗 | Fireworks, Novita, DeepInfra, +1 more |
| Qwen3 VL 30B A3B (Reasoning) | Alibaba | 20 | 30B (3B active) | 256k | $0.8 | 86 | 🤗 | Novita, Fireworks, Alibaba Cloud |
| Qwen3 30B A3B 2507 Instruct | Alibaba | 19 | 30.5B (3.3B active) | 262k | $0.3 | 81 | 🤗 | Nebius, Alibaba Cloud, Clarifai |
| NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | NVIDIA | 19 | 9B | 131k | $0.1 | 122 | 🤗 | Together.ai, Amazon Bedrock, DeepInfra |
| Qwen3 Coder 30B A3B Instruct | Alibaba | 17 | 30.5B (3.3B active) | 262k | $0.9 | 22 | 🤗 | Scaleway, Alibaba Cloud, Clarifai, +2 more |
| Olmo 3 7B Think | Allen Institute for AI | 17 | 7B | 65.5k | $0.1 | 78 | 🤗 | Parasail |
| Devstral Small 2 | Mistral | 17 | 24B | 256k | - | 213 | 🤗 | Mistral |
| Qwen3 VL 8B (Reasoning) | Alibaba | 17 | 8.77B | 256k | $0.7 | 133 | 🤗 | Alibaba Cloud |
| DeepSeek R1 0528 Qwen3 8B | DeepSeek | 16 | 8.19B | 32.8k | - | - | 🤗 | - |
| Ministral 3 14B | Mistral | 16 | 14B | 256k | $0.2 | 151 | 🤗 | Amazon Bedrock, Mistral, Together.ai |
| EXAONE 4.0 32B (Non-reasoning) | LG AI Research | 16 | 32B | 131k | $0.7 | 115 | 🤗 | FriendliAI |
| Qwen3 Omni 30B A3B Instruct | Alibaba | 16 | 35.3B (3B active) | 65.5k | $0.4 | 101 | 🤗 | Alibaba Cloud |
| Ministral 3 8B | Mistral | 15 | 8B | 256k | $0.1 | 200 | 🤗 | Amazon Bedrock, Mistral |
| Ling-mini-2.0 | InclusionAI | 15 | 16.3B (1.4B active) | 131k | $0.1 | 211 | 🤗 | SiliconFlow |
| Mistral Small 3.2 | Mistral | 15 | 24B | 128k | $0.1 | 147 | 🤗 | DeepInfra, Mistral |
| Qwen3 VL 8B Instruct | Alibaba | 15 | 8.77B | 256k | $0.3 | 136 | 🤗 | Together.ai, Alibaba Cloud |
| NVIDIA Nemotron Nano 9B V2 (Reasoning) | NVIDIA | 15 | 9B | 131k | $0.1 | 107 | 🤗 | DeepInfra |
| Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | NVIDIA | 14 | 4.51B | 128k | - | - | 🤗 | - |
| Reka Flash 3 | Reka AI | 14 | 21B | 128k | $0.3 | 52 | 🤗 | Reka AI |
| NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) | NVIDIA | 14 | 31.6B (3.6B active) | 1.00M | $0.1 | 119 | 🤗 | DeepInfra |
| NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | NVIDIA | 14 | 13.2B | 128k | $0.3 | 137 | 🤗 | DeepInfra, Nebius, Amazon Bedrock |
| Granite 4.0 H Small | IBM | 13 | 32B (9B active) | 128k | $0.1 | 501 | 🤗 | Replicate |
| Phi-4 | Microsoft Azure | 13 | 14B | 16.0k | $0.2 | 9 | 🤗 | DeepInfra, Microsoft Azure |
| Olmo 3 7B Instruct | Allen Institute for AI | 13 | 7B | 65.5k | $0.1 | 44 | 🤗 | Parasail |
| Gemma 3 12B Instruct | Google | 12 | 12.2B | 128k | - | 37 | 🤗 | DeepInfra, Amazon Bedrock, Cloudflare, +2 more |
| LFM2 8B A1B | Liquid AI | 11 | 8.34B (1.5B active) | 32.8k | - | - | 🤗 | ? |
| DeepHermes 3 - Mistral 24B Preview (Non-reasoning) | Nous Research | 11 | 24B | 32.0k | - | - | 🤗 | - |
| Llama 3.2 Instruct 11B (Vision) | Meta | 11 | 11B | 128k | $0.2 | 54 | 🤗 | Microsoft Azure, DeepInfra, Amazon Bedrock |
| Gemma 3n E4B Instruct | Google | 11 | 8.39B (4B active) | 32.0k | $0.0 | 46 | 🤗 | Together.ai |
| LFM2 24B A2B | Liquid AI | 10 | 23.8B (2.3B active) | 32.8k | $0.1 | 103 | 🤗 | Together.ai |
| Gemma 3 27B Instruct | Google | 10 | 27.4B | 128k | - | 37 | 🤗 | Amazon Bedrock, DeepInfra, Parasail, +2 more |
| Phi-4 Multimodal Instruct | Microsoft Azure | 10 | 5.6B | 128k | - | 17 | 🤗 | Microsoft Azure |
| Gemma 3n E2B Instruct | Google | 10 | 5.98B (2B active) | 32.0k | - | 51 | 🤗 | Google |
| Molmo 7B-D | Allen Institute for AI | 9 | 8.02B | 4.10k | - | - | 🤗 | - |
| DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning) | Nous Research | 8 | 8B | 128k | - | - | 🤗 | - |
| Falcon-H1R-7B | TII UAE | - | 7B | 256k | - | - | Not available | - |
| Step3 VL 10B | StepFun | - | 10.2B | 65.5k | - | - | 🤗 | - |
| Molmo2-8B | Allen Institute for AI | - | 8.66B | 36.9k | - | 132 | 🤗 | Parasail |
| Olmo 3.1 32B Instruct | Allen Institute for AI | - | 32.2B | 65.5k | $0.3 | 48 | 🤗 | DeepInfra |
| Olmo 3.1 32B Think | Allen Institute for AI | - | 32.2B | 65.5k | - | 84 | 🤗 | Parasail |
| Tri-21B-Think | Trillion Labs | - | 21B | 32.0k | - | - | Not available | - |
| Tri-21B-think Preview | Trillion Labs | - | 21B | 32.0k | - | - | Not available | - |