Comparisons of Tiny Open Source AI Models (≤4B)

Open source AI models with 4B parameters or fewer. These are typically the least demanding models to run in terms of compute and memory. A model is considered open source (also commonly referred to as open weights) when its weights are available to download. This allows self-hosting on your own infrastructure and customizing the model, for example through fine-tuning. For more details, including on our methodology, see our FAQs.
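Self-hosting is a common reason to choose a Tiny model, so a rough weights-only memory estimate can help with sizing hardware. The minimal sketch below multiplies the parameter count by the bytes per parameter of a numeric format. The `BYTES_PER_PARAM` table and the `weight_memory_gb` helper are illustrative assumptions. The estimate ignores the KV cache, activations, quantization scale overhead and runtime overhead, all of which add to real memory use.

```python
# Illustrative bytes-per-parameter values for common numeric formats (assumed, weights only).
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(params_billions: float, dtype: str = "bf16") -> float:
    """Rough memory needed to hold the model weights, in decimal GB (1 GB = 1e9 bytes).

    Hypothetical helper: excludes KV cache, activations and framework overhead.
    """
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"unknown dtype: {dtype!r}")
    # billions of parameters * bytes per parameter = billions of bytes = decimal GB
    return params_billions * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    # Parameter counts taken from the table on this page.
    for name, params in [("Nanbeige4.1-3B", 3.93), ("Gemma 3 270M", 0.268)]:
        for dtype in ("bf16", "int4"):
            print(f"{name} @ {dtype}: ~{weight_memory_gb(params, dtype):.2f} GB")
```

For example, Nanbeige4.1-3B (3.93B parameters) needs roughly 7.9 GB just to hold its weights in bf16.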
OpenBMB's MiniCPM5-1B (Reasoning) and MiniCPM5-1B (Non-reasoning) are the highest-intelligence Tiny open source models (defined as models with ≤4B parameters), followed by Alibaba's Qwen3.5 2B (Reasoning) and Nanbeige's Nanbeige4.1-3B.

Highlights

Charts: Artificial Analysis Openness Index (higher is better) · Artificial Analysis Intelligence Index (higher is better) · Trainable parameters (billions)

Openness

Artificial Analysis Openness Index: Score

Openness Index assesses model openness on a 0 to 100 normalized scale (higher is more open)
Reasoning models are indicated by a lightbulb icon

Intelligence

Artificial Analysis Intelligence Index

Artificial Analysis Intelligence Index v4.0 incorporates 10 evaluations: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt

See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.

Intelligence Evaluations

Intelligence evaluations measured independently by Artificial Analysis · Higher is better
GDPval-AA

Agentic real-world work tasks; score normalized as (Elo - 500) / 2000

Terminal-Bench Hard

Agentic coding & terminal use

𝜏²-Bench Telecom

Agentic tool use

AA-LCR

Long context reasoning

Humanity's Last Exam

Reasoning & knowledge

GPQA Diamond

Scientific reasoning

SciCode

Coding

IFBench

Instruction following

CritPt

Physics reasoning

APEX-Agents-AA

Long-horizon agentic tasks (no data available)

ITBench-AA

Kubernetes incident root-cause analysis (no data available)

MMMU-Pro

Visual reasoning


While model intelligence generally carries over across use cases, some evaluations may be more relevant than others for a particular workload.
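The GDPval-AA entry above describes its score as (Elo - 500)/2000. Read literally, that maps an Elo rating onto a fraction, as in the minimal sketch below. The function name is hypothetical, and the sketch deliberately applies no clamping or rescaling beyond what the label states.

```python
def gdpval_normalized(elo: float) -> float:
    """Map a GDPval-AA Elo rating to the normalized score (Elo - 500) / 2000.

    Hypothetical helper that follows the chart label literally; no clamping is applied.
    """
    return (elo - 500) / 2000


if __name__ == "__main__":
    for elo in (500, 1000, 1500):
        print(elo, gdpval_normalized(elo))
```

Under this reading, an Elo of 1000 maps to 0.25 and an Elo of 1500 maps to 0.5, and every additional 100 Elo points adds 0.05 to the score.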


Size

Model Size: Total and Active Parameters

Comparison between total model parameters and parameters active during inference

Total parameters: the total number of trainable weights and biases in the model, expressed in billions. These parameters are learned during training and determine the model's ability to process and generate responses.

Active parameters: the number of parameters actually executed during each inference forward pass, expressed in billions. For Mixture of Experts (MoE) models, a routing mechanism selects a subset of experts per token, resulting in fewer active than total parameters. Dense models use all parameters, so active equals total.
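The difference between total and active parameters comes down to simple arithmetic. The sketch below assumes a simplified model in which some parameters (for example attention and embeddings) are used by every token and all experts are equally sized. `ParamConfig` and the example numbers are hypothetical and do not describe any model on this page.

```python
from dataclasses import dataclass


@dataclass
class ParamConfig:
    """Simplified parameter budget in billions (hypothetical model, not a real architecture)."""

    shared: float             # parameters every token uses (e.g. attention, embeddings)
    per_expert: float = 0.0   # parameters in one expert, summed over all MoE layers
    n_experts: int = 0        # experts available per MoE layer (0 for a dense model)
    top_k: int = 0            # experts the router selects per token

    def total(self) -> float:
        # Every expert counts toward the total, whether or not a given token uses it.
        return self.shared + self.per_expert * self.n_experts

    def active(self) -> float:
        # Only the top_k routed experts run for each token.
        return self.shared + self.per_expert * self.top_k


dense = ParamConfig(shared=3.0)
moe = ParamConfig(shared=1.0, per_expert=0.25, n_experts=16, top_k=2)

print(f"dense: total={dense.total()}B active={dense.active()}B")  # dense: total=3.0B active=3.0B
print(f"moe:   total={moe.total()}B active={moe.active()}B")      # moe:   total=5.0B active=1.5B
```

The dense configuration has no experts, so active equals total. The hypothetical MoE stores 5B parameters but runs only 1.5B per token.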

Intelligence vs. Active Parameters

Chart: Artificial Analysis Intelligence Index vs. active parameters at inference time (billions), with the most attractive quadrant marked. Creators shown: AI21 Labs, Alibaba, IBM, LG AI Research, Liquid AI, Microsoft, Mistral, Nanbeige, NVIDIA, OpenBMB.



Intelligence vs. Total Parameters

Chart: Artificial Analysis Intelligence Index vs. total size in parameters (billions), with the most attractive quadrant marked. Creators shown: AI21 Labs, Alibaba, IBM, LG AI Research, Liquid AI, Microsoft, Mistral, Nanbeige, NVIDIA, OpenBMB.


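One way to make the "most attractive quadrant" idea concrete is a Pareto frontier. A model is kept only if no model of the same or smaller size has an equal or higher Intelligence Index. The sketch below applies this to a subset of rows from the table on this page, using total parameters. The choice of rows and the `pareto_frontier` helper are illustrative. This is not Artificial Analysis's own method.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    params_b: float    # total parameters in billions (from the table)
    intelligence: int  # Artificial Analysis Intelligence Index (from the table)


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Sweep from smallest to largest, keeping a model only if its index is strictly
    higher than every model visited before it (ties in size visit higher index first)."""
    frontier: list[Model] = []
    best = float("-inf")
    for m in sorted(models, key=lambda m: (m.params_b, -m.intelligence)):
        if m.intelligence > best:
            frontier.append(m)
            best = m.intelligence
    return frontier


MODELS = [
    Model("MiniCPM5-1B (Reasoning)", 1.0, 18),
    Model("Qwen3.5 2B (Reasoning)", 2.27, 16),
    Model("Nanbeige4.1-3B", 3.93, 16),
    Model("NVIDIA Nemotron 3 Nano 4B", 3.97, 15),
    Model("MiniCPM-V 4.6 1.3B", 1.3, 13),
    Model("Qwen3.5 0.8B (Reasoning)", 0.873, 11),
    Model("Gemma 3 270M", 0.268, 8),
    Model("Granite 4.0 350M", 0.35, 6),
    Model("Granite 4.0 H 350M", 0.34, 5),
]

for m in pareto_frontier(MODELS):
    print(f"{m.name}: {m.params_b}B, index {m.intelligence}")
```

With these rows, the frontier is Gemma 3 270M, Qwen3.5 0.8B (Reasoning) and MiniCPM5-1B (Reasoning). Every larger model in the subset scores below MiniCPM5-1B's 18.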

Context Window

Context window: token limit · Higher is better

Larger context windows are relevant to RAG (Retrieval-Augmented Generation) workflows, which typically involve retrieving and reasoning over large amounts of data.

Maximum number of combined input and output tokens. Output tokens commonly have a significantly lower limit, which varies by model.
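The context window caps input and output tokens combined, and output often has its own lower cap. A request can therefore be checked against both limits before it is sent. The sketch below uses a hypothetical helper, `fits_context`. The token counts in the example are illustrative, and real limits should come from each model's or provider's documentation.

```python
from typing import Optional


def fits_context(
    input_tokens: int,
    max_output_tokens: int,
    context_window: int,
    output_limit: Optional[int] = None,
) -> bool:
    """Check a request against a combined input+output limit and an optional output-only cap."""
    if output_limit is not None and max_output_tokens > output_limit:
        return False
    return input_tokens + max_output_tokens <= context_window


# A 30,000-token RAG prompt plus up to 4,096 generated tokens:
print(fits_context(30_000, 4_096, context_window=32_768))   # False: 34,096 > 32,768
print(fits_context(30_000, 4_096, context_window=131_072))  # True
# Same prompt with 16,384 output tokens but a hypothetical 8,192-token output cap:
print(fits_context(30_000, 16_384, context_window=131_072, output_limit=8_192))  # False
```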

Further details

| Model | Creator | Intelligence Index | Total parameters | Context window | Price | Output speed | Providers |
|---|---|---|---|---|---|---|---|
| MiniCPM5-1B (Reasoning) | OpenBMB | 18 | 1B | 128k | - | - | - |
| MiniCPM5-1B (Non-reasoning) | OpenBMB | 18 | 1B | 128k | - | - | - |
| Qwen3.5 2B (Reasoning) | Alibaba | 16 | 2.27B | 262k | $0.0 | - | DeepInfra |
| Nanbeige4.1-3B | Nanbeige | 16 | 3.93B | 256k | - | - | - |
| NVIDIA Nemotron 3 Nano 4B | NVIDIA | 15 | 3.97B | 262k | - | - | - |
| Qwen3.5 2B (Non-reasoning) | Alibaba | 15 | 2.27B | 262k | $0.0 | 340 | DeepInfra |
| MiniCPM-V 4.6 1.3B | OpenBMB | 13 | 1.3B | 262k | - | - | - |
| Ministral 3 3B | Mistral | 11 | 3B | 256k | $0.1 | 149 | Mistral, Amazon Bedrock |
| Qwen3.5 0.8B (Reasoning) | Alibaba | 11 | 0.873B | 262k | $0.0 | - | DeepInfra |
| Qwen3.5 0.8B (Non-reasoning) | Alibaba | 10 | 0.873B | 262k | $0.0 | 89 | DeepInfra |
| Jamba Reasoning 3B | AI21 Labs | 10 | 3B | 262k | - | - | - |
| Granite 4.1 3B | IBM | 9 | 3B | 131k | - | - | - |
| Phi-4 Mini Instruct | Microsoft | 8 | 3.84B | 128k | - | 24 | Microsoft Azure, CoreWeave |
| Exaone 4.0 1.2B (Reasoning) | LG AI Research | 8 | 1.28B | 64.0k | - | - | - |
| Exaone 4.0 1.2B (Non-reasoning) | LG AI Research | 8 | 1.28B | 64.0k | - | - | - |
| LFM2.5-1.2B-Thinking | Liquid AI | 8 | 1.17B | 32.0k | - | - | - |
| LFM2 2.6B | Liquid AI | 8 | 2.57B | 32.8k | - | - | ? |
| LFM2.5-1.2B-Instruct | Liquid AI | 8 | 1.17B | 32.0k | - | - | ? |
| Granite 4.0 H 1B | IBM | 8 | 1.5B | 128k | - | - | - |
| Gemma 3 270M | Google | 8 | 0.268B | 32.0k | - | - | - |
| Granite 4.0 Micro | IBM | 8 | 3B | 128k | - | - | - |
| Granite 4.0 1B | IBM | 7 | 1.6B | 128k | - | - | - |
| LFM2.5-VL-1.6B | Liquid AI | 6 | 1.6B | 32.0k | - | - | ? |
| Granite 4.0 350M | IBM | 6 | 0.35B | 32.8k | - | - | - |
| Granite 4.0 H 350M | IBM | 5 | 0.34B | 32.8k | - | - | - |
| Tiny Aya Global | Cohere | 5 | 3.35B | 8.19k | - | - | Cohere |