Comparisons of Small Open Source AI Models (4B-40B)

This page compares open source AI models with between 4B and 40B parameters. Models are considered open source (also commonly referred to as open weights) when their weights are available to download. This allows self-hosting on your own infrastructure and customizing the model, for example through fine-tuning. For more details, including our methodology, see our FAQs.
Alibaba's Qwen3.6 27B and Qwen3.6 35B A3B (both reasoning variants) are the highest-intelligence small open source models, defined as those with 4B-40B parameters, followed by Google's Gemma 4 31B (reasoning) and the non-reasoning variant of Qwen3.6 27B.

Highlights

Artificial Analysis Openness Index · Higher is better
Artificial Analysis Intelligence Index · Higher is better
Trainable parameters in billions

Openness

Artificial Analysis Openness Index: Results

Openness Index assesses model openness on a 0 to 100 normalized scale (higher is more open)
Reasoning models are indicated by a lightbulb icon

Intelligence

Artificial Analysis Intelligence Index

Artificial Analysis Intelligence Index v4.0 incorporates 10 evaluations: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt. See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.

Intelligence Evaluations

Intelligence evaluations measured independently by Artificial Analysis · Higher is better
GDPval-AA: Agentic real-world work tasks, scored as (Elo - 500) / 2000
Terminal-Bench Hard: Agentic coding & terminal use
𝜏²-Bench Telecom: Agentic tool use
AA-LCR: Long context reasoning
Humanity's Last Exam: Reasoning & knowledge
GPQA Diamond: Scientific reasoning
SciCode: Coding
IFBench: Instruction following
CritPt: Physics reasoning
APEX-Agents-AA: Long-horizon agentic tasks
ITBench-AA: Kubernetes incident root-cause analysis
MMMU-Pro: Visual reasoning
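
The GDPval-AA score is given above as (Elo - 500) / 2000. A minimal Python sketch of that rescaling follows. The function names and the sample Elo values are illustrative assumptions, not taken from Artificial Analysis, and any further scaling applied before display is not specified here.

```python
def normalize_gdpval_elo(elo: float) -> float:
    """Rescale a raw Elo rating with the (Elo - 500) / 2000 formula."""
    return (elo - 500.0) / 2000.0


def denormalize_gdpval_score(score: float) -> float:
    """Invert the rescaling to recover the raw Elo rating."""
    return score * 2000.0 + 500.0


if __name__ == "__main__":
    # Hypothetical Elo ratings, chosen only to illustrate the arithmetic.
    for elo in (500, 1000, 1500):
        print(f"Elo {elo} -> {normalize_gdpval_elo(elo):.2f}")
```

Under this formula an Elo of 500 maps to 0 and an Elo of 2500 maps to 1, so ratings in that range land on a 0 to 1 scale.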

While model intelligence generally translates across use cases, specific evaluations may be more relevant for certain use cases.

Size

Model Size: Total and Active Parameters

Comparison between total model parameters and parameters active during inference

Total parameters: The total number of trainable weights and biases in the model, expressed in billions. These parameters are learned during training and determine the model's ability to process and generate responses.

Active parameters: The number of parameters actually executed during each inference forward pass, expressed in billions. For Mixture of Experts (MoE) models, a routing mechanism selects a subset of experts per token, resulting in fewer active than total parameters. Dense models use all parameters, so active equals total.
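
To make the total versus active distinction concrete, the sketch below computes the share of parameters active per forward pass for three models, using the parameter counts listed in the table under "Further details". The `ModelSize` class and its field names are hypothetical, and treating Gemma 4 31B as dense (active equals total) is an assumption based on the table listing no separate active count for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSize:
    """Parameter counts in billions (hypothetical helper type)."""
    name: str
    total_b: float
    active_b: float

    @property
    def active_fraction(self) -> float:
        """Share of total parameters executed per forward pass."""
        return self.active_b / self.total_b

    @property
    def uses_subset(self) -> bool:
        """True when fewer parameters are active than exist in total."""
        return self.active_b < self.total_b


MODELS = [
    ModelSize("Qwen3.6 35B A3B", total_b=36.0, active_b=3.0),
    ModelSize("gpt-oss-20B", total_b=21.0, active_b=3.6),
    # No active count is listed for this model, so active is set equal to total.
    ModelSize("Gemma 4 31B", total_b=30.7, active_b=30.7),
]

for m in MODELS:
    print(f"{m.name}: {m.active_fraction:.1%} of parameters active")
```

A low active fraction means each token touches only a small part of the weights, although the full set of weights still has to be stored to serve the model.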

Intelligence vs. Active Parameters

[Scatter plot: Artificial Analysis Intelligence Index vs. active parameters at inference time, with the most attractive quadrant marked. Legend: Alibaba, Google, LG AI Research, NVIDIA, OpenAI, ServiceNow.]

Intelligence vs. Total Parameters

[Scatter plot: Artificial Analysis Intelligence Index vs. size in total parameters (billions), with the most attractive quadrant marked. Legend: Alibaba, Google, LG AI Research, NVIDIA, OpenAI, ServiceNow.]

Context Window

Context window: token limit · Higher is better

Larger context windows are relevant to RAG (Retrieval Augmented Generation) LLM workflows, which typically involve reasoning over and retrieving information from large amounts of data.

Context window: Maximum number of combined input and output tokens. Output tokens commonly have a significantly lower limit (varies by model).
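
Because the context window covers input and output together, a request has to budget for both. The sketch below checks whether a request fits. The function name, the optional separate output cap, and the sample numbers are illustrative assumptions; 262_000 is a rounded stand-in for the "262k" figure listed for several models.

```python
from typing import Optional


def fits_context(
    input_tokens: int,
    max_output_tokens: int,
    context_window: int,
    output_limit: Optional[int] = None,
) -> bool:
    """Return True if input + output tokens fit in the combined budget."""
    if input_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Some models cap output separately, below the full context window.
    if output_limit is not None and max_output_tokens > output_limit:
        return False
    return input_tokens + max_output_tokens <= context_window


# Rounded from the "262k" listing; the exact token count is an assumption.
CONTEXT_WINDOW = 262_000

print(fits_context(200_000, 50_000, CONTEXT_WINDOW))
print(fits_context(250_000, 20_000, CONTEXT_WINDOW))
print(fits_context(1_000, 9_000, CONTEXT_WINDOW, output_limit=8_192))
```

The first call fits, the second exceeds the combined budget, and the third is rejected by the hypothetical output cap even though the total is small.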

Further details

Model | Creator | Intelligence Index | Parameters (total; active at inference time where listed) | Context window | Price | (unlabeled) | Provider Benchmarks
Qwen3.6 27B (Reasoning) | Alibaba | 46 | 27.8B | 262k | $0.9 | 61 | DeepInfra, Makora, Novita, +2 more
Qwen3.6 35B A3B (Reasoning) | Alibaba | 43 | 36B (3B active) | 262k | $0.4 | 187 | Novita, Parasail, DeepInfra, +6 more
Gemma 4 31B (Reasoning) | Google | 39 | 30.7B | 256k | - | 35 | Parasail, Google, DeepInfra, +8 more
Qwen3.6 27B (Non-reasoning) | Alibaba | 37 | 27.8B | 262k | $0.9 | 63 | Makora, Novita, DeepInfra, Alibaba Cloud
Qwen3.5 9B (Reasoning) | Alibaba | 32 | 9.65B | 262k | $0.1 | 82 | Together.ai, SiliconFlow
Gemma 4 31B (Non-reasoning) | Google | 32 | 30.7B | 256k | $0.2 | 47 | Parasail, SiliconFlow, FriendliAI, +4 more
Qwen3.6 35B A3B (Non-reasoning) | Alibaba | 32 | 36B (3B active) | 262k | $0.6 | 195 | Clarifai, GMI, Scaleway, +5 more
Gemma 4 26B A4B (Reasoning) | Google | 31 | 25.2B (3.8B active) | 256k | $0.1 | - | Parasail, DeepInfra, Cloudflare, +4 more
Qwen3.5 35B A3B (Non-reasoning) | Alibaba | 31 | 36B (3B active) | 262k | $0.4 | 186 | Alibaba Cloud, DeepInfra
EXAONE 4.5 33B | LG AI Research | 30 | 34.4B | 262k | - | - | -
Gemma 4 12B (Reasoning) | Google | 29 | 12B | 256k | - | - | -
Nemotron Cascade 2 30B A3B | NVIDIA | 28 | 31.6B (3B active) | 1.00M | - | - | -
Apriel-v1.6-15B-Thinker | ServiceNow | 28 | 15B | 128k | - | - | Together.ai
Qwen3.5 9B (Non-reasoning) | Alibaba | 27 | 9.65B | 262k | - | - | -
Gemma 4 26B A4B (Non-reasoning) | Google | 27 | 25.2B (3.8B active) | 256k | $0.2 | 76 | Novita, GMI, Clarifai, +4 more
Qwen3.5 4B (Reasoning) | Alibaba | 27 | 4.66B | 262k | $0.0 | 203 | DeepInfra
gpt-oss-20B (high) | OpenAI | 24 | 21B (3.6B active) | 131k | $0.1 | 251 | DeepInfra, Databricks, Cloudflare, +10 more
NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | NVIDIA | 24 | 31.6B (3.6B active) | 1.00M | $0.1 | 167 | DeepInfra, Nebius
HyperCLOVA X SEED Think (32B) | Naver | 24 | 32B | 128k | - | - | -
Qwen3.5 4B (Non-reasoning) | Alibaba | 23 | 4.66B | 262k | $0.0 | 208 | DeepInfra
Nemotron 3 Nano Omni 30B A3B Reasoning | NVIDIA | 21 | 30B (3B active) | 256k | $0.1 | 298 | Nebius, Clarifai
gpt-oss-20B (low) | OpenAI | 21 | 21B (3.6B active) | 131k | $0.1 | 246 | CoreWeave, Hyperbolic, Groq, +9 more
Tri-21B-think Preview | Trillion Labs | 20 | 21B | 32.0k | - | - | -
Devstral Small 2 | Mistral | 19 | 24B | 256k | - | 62 | Mistral
Gemma 4 E4B (Reasoning) | Google | 19 | 8B (4.5B active) | 128k | - | - | -
Tri-21B-Think | Trillion Labs | 19 | 21B | 32.0k | - | - | -
Magistral Small 1.2 | Mistral | 18 | 24B | 128k | $0.6 | 109 | Amazon Bedrock, Mistral
EXAONE 4.0 32B (Reasoning) | LG AI Research | 17 | 32B | 131k | - | - | -
Ministral 3 14B | Mistral | 16 | 14B | 256k | $0.2 | 84 | Amazon Bedrock, Mistral
Falcon-H1R-7B | TII UAE | 16 | 7B | 256k | - | - | -
Qwen3 Omni 30B A3B (Reasoning) | Alibaba | 16 | 35.3B (3B active) | 65.5k | $0.3 | 76 | Alibaba Cloud
Step3 VL 10B | StepFun | 15 | 10.2B | 65.5k | - | - | -
Gemma 4 E2B (Reasoning) | Google | 15 | 5.1B (2.3B active) | 128k | - | - | -
NVIDIA Nemotron Nano 12B v2 VL (Reasoning) | NVIDIA | 15 | 13.2B | 128k | $0.2 | 295 | DeepInfra
Ministral 3 8B | Mistral | 15 | 8B | 256k | $0.1 | 106 | Amazon Bedrock, Mistral
Gemma 4 E4B (Non-reasoning) | Google | 15 | 8B (4.5B active) | 128k | - | - | -
NVIDIA Nemotron Nano 9B V2 (Reasoning) | NVIDIA | 15 | 9B | 131k | $0.1 | 121 | DeepInfra
Granite 4.1 30B | IBM | 15 | 30B | 131k | - | - | -
Olmo 3.1 32B Think | Allen Institute for AI | 14 | 32.2B | 65.5k | - | - | Parasail
NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) | NVIDIA | 13 | 31.6B (3.6B active) | 1.00M | $0.1 | 96 | DeepInfra
NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | NVIDIA | 13 | 9B | 131k | $0.1 | 137 | DeepInfra, Amazon Bedrock
Granite 4.1 8B | IBM | 12 | 8B | 131k | $0.1 | 122 | CoreWeave
Sarvam 30B (high) | Sarvam | 12 | 32.2B (2.4B active) | 65.5k | $0.0 | 166 | Sarvam
Olmo 3.1 32B Instruct | Allen Institute for AI | 12 | 32.2B | 65.5k | - | - | -
Gemma 4 E2B (Non-reasoning) | Google | 12 | 5.1B (2.3B active) | 128k | - | - | -
EXAONE 4.0 32B (Non-reasoning) | LG AI Research | 12 | 32B | 131k | - | - | -
DeepHermes 3 - Mistral 24B Preview (Non-reasoning) | Nous Research | 11 | 24B | 32.0k | - | - | -
Granite 4.0 H Small | IBM | 11 | 32B (9B active) | 128k | $0.1 | 388 | Replicate
Qwen3 Omni 30B A3B Instruct | Alibaba | 11 | 35.3B (3B active) | 65.5k | $0.3 | 93 | Alibaba Cloud
LFM2 24B A2B | Liquid AI | 10 | 23.8B (2.3B active) | 32.8k | $0.0 | 121 | Together.ai
Phi-4 | Microsoft | 10 | 14B | 16.0k | $0.2 | 33 | DeepInfra, Microsoft Azure
NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | NVIDIA | 10 | 13.2B | 128k | $0.2 | 235 | DeepInfra, Amazon Bedrock
Phi-4 Multimodal Instruct | Microsoft | 10 | 5.6B | 128k | - | 17 | Microsoft Azure
Reka Flash 3 | Reka AI | 10 | 21B | 128k | $0.3 | - | Reka AI
Olmo 3 7B Think | Allen Institute for AI | 9 | 7B | 65.5k | - | - | -
Molmo 7B-D | Allen Institute for AI | 9 | 8.02B | 4.10k | - | - | -
Ling-mini-2.0 | InclusionAI | 9 | 16.3B (1.4B active) | 131k | - | - | -
Llama 3.2 Instruct 11B (Vision) | Meta | 9 | 11B | 128k | $0.2 | 51 | Microsoft Azure, Amazon Bedrock, DeepInfra
Olmo 3 7B Instruct | Allen Institute for AI | 8 | 7B | 65.5k | $0.1 | - | Parasail
DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning) | Nous Research | 8 | 8B | 128k | - | - | -
Molmo2-8B | Allen Institute for AI | 7 | 8.66B | 36.9k | - | - | -
LFM2 8B A1B | Liquid AI | 7 | 8.34B (1.5B active) | 32.8k | - | - | ?
Apertus 8B Instruct | Swiss AI Initiative | 6 | 8B | 65.5k | $0.1 | - | Public AI
EXAONE 4.5 33B (Non-reasoning) | LG AI Research | - | 34.4B | 262k | - | - | -