
Comparisons of Tiny Open Source AI Models (≤4B)

Name: Artificial Analysis Intelligence Index
Creator: Artificial Analysis
License: https://artificialanalysis.ai/docs/legal/Terms-of-Use.pdf

This page compares open source AI models with 4B parameters or fewer, which are usually the least resource-intensive models to run. Models are considered Open Source (also commonly referred to as open weights) when their weights are available to download. This allows self-hosting on your own infrastructure and customization of the model, for example through fine-tuning. For more details, including our methodology, see our FAQs.

Qwen3 4B 2507 (Reasoning) and Qwen3 4B 2507 Instruct (Alibaba) are the highest-intelligence Tiny open source models, defined as those with ≤4B parameters, followed by Exaone 4.0 1.2B (LG AI Research) and Qwen3 VL 4B (Alibaba).

Intelligence

Chart: Artificial Analysis Intelligence Index; higher is better. Estimated values are marked (independent evaluation forthcoming).

Total Parameters

Chart: trainable parameters, in billions.


Openness

Artificial Analysis Openness Index: Results

Openness Index assesses model openness on a 0 to 100 normalized scale (higher is more open)

Intelligence

Artificial Analysis Intelligence Index

Artificial Analysis Intelligence Index v4.0 incorporates 10 evaluations: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt. See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.

Estimate (independent evaluation forthcoming)

Intelligence Evaluations

Intelligence evaluations measured independently by Artificial Analysis; Higher is better

Results claimed by AI Lab (not yet independently verified)

GDPval-AA (Agentic Real-World Work Tasks, (ELO-500)/2000)

Terminal-Bench Hard (Agentic Coding & Terminal Use)

𝜏²-Bench Telecom (Agentic Tool Use)

AA-LCR (Long Context Reasoning)

AA-Omniscience Accuracy (Knowledge)

AA-Omniscience Non-Hallucination Rate (1 - Hallucination Rate)

Humanity's Last Exam (Reasoning & Knowledge)

GPQA Diamond (Scientific Reasoning)

SciCode (Coding)

IFBench (Instruction Following)

CritPt (Physics Reasoning)

MMMU Pro (Visual Reasoning)

While model intelligence generally translates across use cases, specific evaluations may be more relevant for certain use cases.
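Two of the labels above carry a small formula: GDPval-AA is reported as (ELO-500)/2000, and the AA-Omniscience Non-Hallucination Rate as 1 - Hallucination Rate. A minimal sketch of both transformations follows. The function names are hypothetical, the inputs are illustrative rather than measured results, and whether the published scores are clamped or rounded afterwards is not stated in the text, so no clamping is applied here.

```python
def normalize_gdpval_elo(elo: float) -> float:
    """Rescale an ELO rating as (ELO - 500) / 2000, per the GDPval-AA label."""
    return (elo - 500.0) / 2000.0


def non_hallucination_rate(hallucination_rate: float) -> float:
    """Return 1 - hallucination rate, where the input is a fraction in [0, 1]."""
    if not 0.0 <= hallucination_rate <= 1.0:
        raise ValueError("hallucination_rate must be between 0 and 1")
    return 1.0 - hallucination_rate


if __name__ == "__main__":
    # Illustrative inputs only, not results from this page.
    print(normalize_gdpval_elo(1500.0))   # 0.5
    print(non_hallucination_rate(0.25))   # 0.75
```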

Size

Model Size: Total and Active Parameters

Comparison between total model parameters and parameters active during inference

Total Parameters: The total number of trainable weights and biases in the model, expressed in billions. These parameters are learned during training and determine the model's ability to process and generate responses.

Active Parameters: The number of parameters actually executed during each inference forward pass, expressed in billions. For Mixture of Experts (MoE) models, a routing mechanism selects a subset of experts per token, resulting in fewer active than total parameters. Dense models use all parameters, so active equals total.

Passive Parameters: The remaining parameters, which are not executed in a given forward pass (total minus active).
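The relationship between total and active parameters can be sketched with a little arithmetic. The example below is a simplification under stated assumptions: every expert has the same size, and the shared parameters (attention, embeddings, router) run for every token. `ModelSize`, `moe_size`, and `dense_size` are hypothetical names, and the numbers describe no model on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSize:
    total_b: float   # total parameters, in billions
    active_b: float  # parameters executed per forward pass, in billions

    @property
    def passive_b(self) -> float:
        """Parameters not executed in a given forward pass."""
        return self.total_b - self.active_b


def moe_size(shared_b: float, expert_b: float,
             num_experts: int, experts_per_token: int) -> ModelSize:
    """Size of a simplified MoE model (assumption: equally sized experts)."""
    if not 0 < experts_per_token <= num_experts:
        raise ValueError("experts_per_token must be in 1..num_experts")
    total = shared_b + expert_b * num_experts
    active = shared_b + expert_b * experts_per_token
    return ModelSize(total_b=total, active_b=active)


def dense_size(total_b: float) -> ModelSize:
    """Dense models execute every parameter, so active equals total."""
    return ModelSize(total_b=total_b, active_b=total_b)


if __name__ == "__main__":
    # Hypothetical MoE: 1.0B shared, 8 experts of 0.25B each, 2 routed per token.
    moe = moe_size(shared_b=1.0, expert_b=0.25, num_experts=8, experts_per_token=2)
    print(moe.total_b, moe.active_b, moe.passive_b)  # 3.0 1.5 1.5
    dense = dense_size(3.0)
    print(dense.total_b, dense.active_b, dense.passive_b)  # 3.0 3.0 0.0
```

In this sketch both models store 3.0B parameters, but the MoE model executes only half of them per token.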

Intelligence vs. Active Parameters

Chart: Active Parameters at Inference Time against Artificial Analysis Intelligence Index, with the most attractive quadrant highlighted. Creators shown: AI21 Labs, Alibaba, Google, IBM, LG AI Research, Microsoft Azure, Mistral.

Intelligence vs. Total Parameters

Chart: Artificial Analysis Intelligence Index against Size in Parameters (Billions), with the most attractive quadrant highlighted. Creators shown: AI21 Labs, Alibaba, Google, IBM, LG AI Research, Microsoft Azure, Mistral.

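The page does not define the boundaries of the "most attractive quadrant". One plausible reading is higher intelligence at a smaller size, and the sketch below illustrates that reading with a median split on each axis. Both the median split and the `attractive_quadrant` helper are assumptions made for illustration, not the page's method. The rows are a subset of the Further details table on this page.

```python
from statistics import median

# (model, Intelligence Index, total parameters in billions), copied from the
# "Further details" table on this page.
MODELS = [
    ("Qwen3 4B 2507 (Reasoning)", 18, 4.02),
    ("Exaone 4.0 1.2B (Reasoning)", 15, 1.28),
    ("Qwen3 1.7B (Reasoning)", 13, 2.03),
    ("Granite 4.0 Micro", 11, 3.0),
    ("Gemma 3 4B Instruct", 11, 4.3),
    ("Gemma 3 270M", 8, 0.268),
]


def attractive_quadrant(models):
    """Return names with above-median intelligence and below-median size.

    The median split is an assumption made for this sketch.
    """
    score_cut = median(score for _, score, _ in models)
    size_cut = median(size for _, _, size in models)
    return [name for name, score, size in models
            if score > score_cut and size < size_cut]


if __name__ == "__main__":
    for name in attractive_quadrant(MODELS):
        print(name)
```

With this subset, the cut-offs fall at an index of 12 and roughly 2.5B parameters, so the two small reasoning models pass, while Qwen3 4B 2507 (Reasoning) scores highest but sits on the larger side of the split.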

Context Window

Context Window: Tokens Limit; Higher is better

Larger context windows matter for RAG (Retrieval Augmented Generation) LLM workflows, which typically involve retrieving and reasoning over large amounts of data.

Context window: the maximum number of combined input and output tokens. Output tokens commonly have a significantly lower limit, which varies by model.
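Because the context window is shared between input and output, a long prompt (for example, one filled with retrieved documents in a RAG workflow) leaves less room for the response. A minimal sketch of that budget check follows. `max_output_budget` is a hypothetical helper and the token counts are illustrative. Real limits depend on the model and on how its tokenizer counts tokens.

```python
from typing import Optional


def max_output_budget(context_window: int, input_tokens: int,
                      output_limit: Optional[int] = None) -> int:
    """Return how many output tokens still fit after the input.

    context_window: combined input + output token limit.
    output_limit: separate, lower cap on output tokens, if the model has one.
    """
    remaining = context_window - input_tokens
    if remaining < 0:
        raise ValueError("input does not fit in the context window")
    if output_limit is None:
        return remaining
    return min(remaining, output_limit)


if __name__ == "__main__":
    # Illustrative numbers: a 32,000-token window with an 8,192-token output cap.
    print(max_output_budget(32_000, 1_000, output_limit=8_192))   # 8192
    print(max_output_budget(32_000, 30_000, output_limit=8_192))  # 2000
```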

Further details

Model and Creator	Intelligence Index	Total Parameters	Context Window			Weights		Provider Benchmarks
Qwen3 4B 2507 (Reasoning) Alibaba	18	4.02B	262k	-	-	🤗	-	View
Qwen3 4B 2507 Instruct Alibaba	16	4.02B	262k	-	-	🤗	-	View
Exaone 4.0 1.2B (Reasoning) LG AI Research	15	1.28B	64.0k	-	-	🤗	-	View
Qwen3 VL 4B Instruct Alibaba	14	4.44B	256k	-	-	🤗	-	View
Qwen3 VL 4B (Reasoning) Alibaba	14	4.44B	256k	-	-	🤗	-	View
Qwen3 1.7B (Reasoning) Alibaba	13	2.03B	32.0k	$0.4	130	🤗		View
Ministral 3 3B Mistral	13	3B	256k	$0.1	312	🤗		View
Jamba Reasoning 3B AI21 Labs	13	3B	262k	-	-	🤗	-	View
Exaone 4.0 1.2B (Non-reasoning) LG AI Research	12	1.28B	64.0k	-	-	🤗	-	View
Granite 4.0 Micro IBM	11	3B	128k	-	-	🤗	-	View
Phi-4 Mini Instruct Microsoft Azure	11	3.84B	128k	-	44	🤗		View
Gemma 3 4B Instruct Google	11	4.3B	128k	-	38	🤗		View
Qwen3 1.7B (Non-reasoning) Alibaba	11	2.03B	32.0k	$0.2	125	🤗		View
Qwen3 0.6B (Reasoning) Alibaba	11	0.752B	32.0k	$0.4	206	🤗		View
Granite 4.0 H 1B IBM	10	1.5B	128k	-	-	🤗	-	View
Granite 4.0 1B IBM	10	1.6B	128k	-	-	🤗	-	View
LFM2 2.6B Liquid AI	10	2.57B	32.8k	-	-	🤗	?	View
Qwen3 0.6B (Non-reasoning) Alibaba	10	0.752B	32.0k	$0.2	197	🤗		View
Granite 4.0 H 350M IBM	9	0.34B	32.8k	-	-	🤗	-	View
Granite 4.0 350M IBM	9	0.35B	32.8k	-	-	🤗	-	View
Gemma 3 1B Instruct Google	9	1B	32.0k	-	55	🤗		View
Gemma 3 270M Google	8	0.268B	32.0k	-	-	🤗	-	View
LFM2.5-VL-1.6B Liquid AI	-	1.6B	32.0k	-	-	🤗	-	View
LFM2.5-1.2B-Thinking Liquid AI	-	1.17B	32.0k	-	-	🤗	-	View
LFM2.5-1.2B-Instruct Liquid AI	-	1.17B	32.0k	-	-	🤗	?	View
Tiny Aya Global Cohere	-	3.35B	8.19k	-	-	🤗	-	View
