Comparisons of Tiny Open Source AI Models (≤4B)

Open source AI models with 4B parameters or fewer. These are usually the smallest models in terms of resource demand. Models are considered open source (also commonly referred to as open weights) when their weights are available to download. This allows self-hosting on your own infrastructure and enables customization, such as fine-tuning. For more details, including our methodology, see our FAQs.
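
As a rough illustration of what "resource demand" means for self-hosting, the sketch below estimates the memory needed to hold a model's weights from its parameter count. The bytes-per-parameter values are common precision sizes and are an assumption here, not figures from this page; the function name `weight_memory_gb` is hypothetical. The estimate covers weights only and ignores the KV cache, activations, and runtime overhead.

```python
# Bytes needed to store one parameter at common precisions (assumed values).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(total_params_billions: float, precision: str = "fp16") -> float:
    """Estimate weight memory in GB (1 GB = 10^9 bytes) for a given precision.

    params_billions * 10^9 params * bytes_per_param / 10^9 bytes per GB
    simplifies to params_billions * bytes_per_param.
    """
    return total_params_billions * BYTES_PER_PARAM[precision]


# A 4.02B-parameter model, the size listed for Qwen3 4B 2507 on this page.
print(weight_memory_gb(4.02, "fp16"))  # 8.04
print(weight_memory_gb(4.02, "int4"))  # 2.01
```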

Qwen3 4B 2507 (Reasoning) and Qwen3 4B 2507 Instruct, both from Alibaba, are the highest intelligence Tiny open source models, defined as those with ≤4B parameters, followed by Exaone 4.0 1.2B (LG AI Research) and Qwen3 VL 4B (Alibaba).

Intelligence: Artificial Analysis Intelligence Index; higher is better. Legend: Estimate (independent evaluation forthcoming)
Total Parameters: Trainable parameters in billions

Openness

Artificial Analysis Openness Index: Results

Openness Index assesses model openness on a 0 to 100 normalized scale (higher is more open)

Intelligence

Artificial Analysis Intelligence Index

Artificial Analysis Intelligence Index v4.0 incorporates 10 evaluations: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt. See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.
Legend: Estimate (independent evaluation forthcoming)

Intelligence Evaluations

Intelligence evaluations measured independently by Artificial Analysis; Higher is better
Results claimed by AI Lab (not yet independently verified)
GDPval-AA (Agentic Real-World Work Tasks, (ELO-500)/2000)
Terminal-Bench Hard (Agentic Coding & Terminal Use)
𝜏²-Bench Telecom (Agentic Tool Use)
AA-LCR (Long Context Reasoning)
AA-Omniscience Accuracy (Knowledge)
AA-Omniscience Non-Hallucination Rate (1 - Hallucination Rate)
Humanity's Last Exam (Reasoning & Knowledge)
GPQA Diamond (Scientific Reasoning)
SciCode (Coding)
IFBench (Instruction Following)
CritPt (Physics Reasoning)
MMMU Pro (Visual Reasoning)
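
Two of the labels above describe simple score transformations: GDPval-AA is reported as (ELO-500)/2000, and the AA-Omniscience non-hallucination rate as 1 - hallucination rate. A minimal sketch of both follows; the function names and the input values are hypothetical, chosen only to illustrate the arithmetic.

```python
def normalize_gdpval_elo(elo: float) -> float:
    """Rescale an ELO rating using the (ELO - 500) / 2000 form from the label."""
    return (elo - 500.0) / 2000.0


def non_hallucination_rate(hallucination_rate: float) -> float:
    """Return 1 - hallucination rate, for a rate given as a fraction in [0, 1]."""
    if not 0.0 <= hallucination_rate <= 1.0:
        raise ValueError("hallucination_rate must be between 0 and 1")
    return 1.0 - hallucination_rate


# Hypothetical inputs, not results from this page.
print(normalize_gdpval_elo(1500.0))   # 0.5
print(non_hallucination_rate(0.25))   # 0.75
```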

While model intelligence generally translates across use cases, specific evaluations may be more relevant for certain use cases.

Size

Model Size: Total and Active Parameters

Comparison between total model parameters and parameters active during inference
Legend: Active Parameters, Passive Parameters

Total parameters: The total number of trainable weights and biases in the model, expressed in billions. These parameters are learned during training and determine the model's ability to process and generate responses.

Active parameters: The number of parameters actually executed during each inference forward pass, expressed in billions. For Mixture of Experts (MoE) models, a routing mechanism selects a subset of experts per token, resulting in fewer active than total parameters. Dense models use all parameters, so active equals total.
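
The relationship between total and active parameters can be sketched with a simplified accounting. The `ModelShape` class, its field names, and the example sizes below are hypothetical; the sketch assumes every expert has the same size and ignores router parameters, so treat it as an illustration rather than how any listed model is actually measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    shared_params_b: float   # parameters every token uses (billions)
    expert_params_b: float   # parameters in one expert (billions); 0 for dense
    num_experts: int         # experts available in the model
    experts_per_token: int   # experts the router selects for each token

    @property
    def total_params_b(self) -> float:
        # All experts count toward total size, whether or not they are selected.
        return self.shared_params_b + self.expert_params_b * self.num_experts

    @property
    def active_params_b(self) -> float:
        # Only the selected experts are executed in a forward pass.
        return self.shared_params_b + self.expert_params_b * self.experts_per_token


# Hypothetical MoE model: 8 experts, 2 routed per token.
moe = ModelShape(shared_params_b=1.0, expert_params_b=0.25, num_experts=8, experts_per_token=2)
print(moe.total_params_b, moe.active_params_b)      # 3.0 1.5

# Hypothetical dense model: no experts, so active equals total.
dense = ModelShape(shared_params_b=3.0, expert_params_b=0.0, num_experts=0, experts_per_token=0)
print(dense.total_params_b, dense.active_params_b)  # 3.0 3.0
```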

Intelligence vs. Active Parameters

Active Parameters at Inference Time; Artificial Analysis Intelligence Index
Most attractive quadrant
Legend (model creators): AI21 Labs, Alibaba, Google, IBM, LG AI Research, Microsoft Azure, Mistral

Intelligence vs. Total Parameters

Artificial Analysis Intelligence Index; Size in Parameters (Billions)
Most attractive quadrant
Legend (model creators): AI21 Labs, Alibaba, Google, IBM, LG AI Research, Microsoft Azure, Mistral

Context Window

Context Window: Tokens Limit; Higher is better

Larger context windows matter for Retrieval-Augmented Generation (RAG) workflows, which typically involve retrieving and reasoning over large amounts of data.

The context window is the maximum number of combined input and output tokens. Output tokens commonly have a significantly lower limit, which varies by model.
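
Because the window is shared between input and output, the room left for a response shrinks as the prompt grows, and a separate output limit may cap it further. The sketch below shows that budget calculation; the function name `max_output_tokens` and the specific token counts (including the 8,192-token output cap) are hypothetical.

```python
from typing import Optional


def max_output_tokens(context_window: int, input_tokens: int,
                      output_cap: Optional[int] = None) -> int:
    """Return how many output tokens fit after the input is counted.

    context_window: combined input + output token limit.
    output_cap: optional separate, lower limit on output tokens.
    """
    if input_tokens > context_window:
        raise ValueError("input alone exceeds the context window")
    remaining = context_window - input_tokens
    return remaining if output_cap is None else min(remaining, output_cap)


# Hypothetical 32,000-token window with an assumed 8,192-token output cap.
print(max_output_tokens(32_000, 10_000, output_cap=8_192))  # 8192 (cap binds)
print(max_output_tokens(32_000, 30_000, output_cap=8_192))  # 2000 (window binds)
```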

| Model | Creator | Intelligence Index | Total Parameters | Context Window | Price (USD per 1M tokens) | Output Speed (tokens/s) | Weights | Provider |
|---|---|---|---|---|---|---|---|---|
| Qwen3 4B 2507 (Reasoning) | Alibaba | 18 | 4.02B | 262k | - | - | 🤗 | - |
| Qwen3 4B 2507 Instruct | Alibaba | 16 | 4.02B | 262k | - | - | 🤗 | - |
| Exaone 4.0 1.2B (Reasoning) | LG AI Research | 15 | 1.28B | 64.0k | - | - | 🤗 | - |
| Qwen3 VL 4B Instruct | Alibaba | 14 | 4.44B | 256k | - | - | 🤗 | - |
| Qwen3 VL 4B (Reasoning) | Alibaba | 14 | 4.44B | 256k | - | - | 🤗 | - |
| Qwen3 1.7B (Reasoning) | Alibaba | 13 | 2.03B | 32.0k | $0.4 | 130 | 🤗 | Alibaba Cloud |
| Ministral 3 3B | Mistral | 13 | 3B | 256k | $0.1 | 312 | 🤗 | Mistral, Amazon Bedrock |
| Jamba Reasoning 3B | AI21 Labs | 13 | 3B | 262k | - | - | 🤗 | - |
| Exaone 4.0 1.2B (Non-reasoning) | LG AI Research | 12 | 1.28B | 64.0k | - | - | 🤗 | - |
| Granite 4.0 Micro | IBM | 11 | 3B | 128k | - | - | 🤗 | - |
| Phi-4 Mini Instruct | Microsoft Azure | 11 | 3.84B | 128k | - | 44 | 🤗 | Microsoft Azure |
| Gemma 3 4B Instruct | Google | 11 | 4.3B | 128k | - | 38 | 🤗 | Amazon Bedrock, Google, DeepInfra |
| Qwen3 1.7B (Non-reasoning) | Alibaba | 11 | 2.03B | 32.0k | $0.2 | 125 | 🤗 | Alibaba Cloud |
| Qwen3 0.6B (Reasoning) | Alibaba | 11 | 0.752B | 32.0k | $0.4 | 206 | 🤗 | Alibaba Cloud |
| Granite 4.0 H 1B | IBM | 10 | 1.5B | 128k | - | - | 🤗 | - |
| Granite 4.0 1B | IBM | 10 | 1.6B | 128k | - | - | 🤗 | - |
| LFM2 2.6B | Liquid AI | 10 | 2.57B | 32.8k | - | - | 🤗 | ? |
| Qwen3 0.6B (Non-reasoning) | Alibaba | 10 | 0.752B | 32.0k | $0.2 | 197 | 🤗 | Alibaba Cloud |
| Granite 4.0 H 350M | IBM | 9 | 0.34B | 32.8k | - | - | 🤗 | - |
| Granite 4.0 350M | IBM | 9 | 0.35B | 32.8k | - | - | 🤗 | - |
| Gemma 3 1B Instruct | Google | 9 | 1B | 32.0k | - | 55 | 🤗 | Google |
| Gemma 3 270M | Google | 8 | 0.268B | 32.0k | - | - | 🤗 | - |
| LFM2.5-VL-1.6B | Liquid AI | - | 1.6B | 32.0k | - | - | 🤗 | - |
| LFM2.5-1.2B-Thinking | Liquid AI | - | 1.17B | 32.0k | - | - | 🤗 | - |
| LFM2.5-1.2B-Instruct | Liquid AI | - | 1.17B | 32.0k | - | - | 🤗 | ? |
| Tiny Aya Global | Cohere | - | 3.35B | 8.19k | - | - | 🤗 | - |
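
One way to use the listing above is to shortlist models against a size and context requirement, then rank the survivors by Intelligence Index. The sketch below does this for five rows copied from the listing; the `Row` type, the helper names, and the chosen thresholds (at least 128k context, at most 3B parameters) are hypothetical.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    model: str
    intelligence: int
    total_params: str     # as printed in the listing, e.g. "4.02B"
    context_window: str   # as printed in the listing, e.g. "262k"


# A subset of rows copied from the listing above.
ROWS = [
    Row("Qwen3 4B 2507 (Reasoning)", 18, "4.02B", "262k"),
    Row("Exaone 4.0 1.2B (Reasoning)", 15, "1.28B", "64.0k"),
    Row("Ministral 3 3B", 13, "3B", "256k"),
    Row("Granite 4.0 Micro", 11, "3B", "128k"),
    Row("Gemma 3 270M", 8, "0.268B", "32.0k"),
]


def parse_params_b(text: str) -> float:
    """Convert a size such as '4.02B' to billions of parameters."""
    return float(text.rstrip("B"))


def parse_tokens(text: str) -> int:
    """Convert a context size such as '64.0k' to a token count."""
    if text.endswith("k"):
        return round(float(text[:-1]) * 1000)
    return int(text)


def shortlist(rows: List[Row], min_context: int, max_params_b: float) -> List[str]:
    """Keep rows meeting both limits, ordered by Intelligence Index (highest first)."""
    keep = [
        r for r in rows
        if parse_tokens(r.context_window) >= min_context
        and parse_params_b(r.total_params) <= max_params_b
    ]
    return [r.model for r in sorted(keep, key=lambda r: r.intelligence, reverse=True)]


print(shortlist(ROWS, min_context=128_000, max_params_b=3.0))
# ['Ministral 3 3B', 'Granite 4.0 Micro']
```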