Comparisons of Small Open Source AI Models (4B-40B)

Name: Artificial Analysis Intelligence Index
Creator: Artificial Analysis
License: https://artificialanalysis.ai/docs/legal/Terms-of-Use.pdf

Open source AI models with between 4B and 40B parameters. Models are considered open source (also commonly referred to as open weights) when their weights are available to download. This allows self-hosting on your own infrastructure and enables customization of the model, for example through fine-tuning. For more details, including our methodology, see our FAQs.

Qwen3.5 27B (Reasoning) and Qwen3.5 27B (Non-reasoning) are the highest-intelligence small open source models, defined as those with 4B-40B parameters, followed by Qwen3.5 35B A3B and Qwen3.5 9B.

Intelligence

Artificial Analysis Intelligence Index; Higher is better

Total Parameters

Trainable parameters in billions

Openness

Artificial Analysis Openness Index: Results

Openness Index assesses model openness on a 0 to 100 normalized scale (higher is more open)

Intelligence

Artificial Analysis Intelligence Index

Artificial Analysis Intelligence Index v4.0 incorporates 10 evaluations: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt

Reasoning models are indicated by a lightbulb icon.

See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.

Intelligence Evaluations

Intelligence evaluations measured independently by Artificial Analysis; Higher is better

Results claimed by AI Lab (not yet independently verified)

GDPval-AA (Agentic Real-World Work Tasks, (ELO-500)/2000)

Terminal-Bench Hard (Agentic Coding & Terminal Use)

𝜏²-Bench Telecom (Agentic Tool Use)

AA-LCR (Long Context Reasoning)

AA-Omniscience Accuracy (Knowledge)

AA-Omniscience Non-Hallucination Rate (1 - Hallucination Rate)

Humanity's Last Exam (Reasoning & Knowledge)

GPQA Diamond (Scientific Reasoning)

SciCode (Coding)

IFBench (Instruction Following)

CritPt (Physics Reasoning)

MMMU Pro (Visual Reasoning)

While model intelligence generally translates across use cases, specific evaluations may be more relevant for certain use cases.

Size

Model Size: Total and Active Parameters

Comparison between total model parameters and parameters active during inference

Active Parameters

Passive Parameters

Total parameters: the total number of trainable weights and biases in the model, expressed in billions. These parameters are learned during training and determine the model's ability to process and generate responses.

Active parameters: the number of parameters actually executed during each inference forward pass, expressed in billions. For Mixture of Experts (MoE) models, a routing mechanism selects a subset of experts per token, resulting in fewer active than total parameters. Dense models use all parameters, so active equals total.
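The relationship between total, active, and passive parameters can be sketched as follows. This is an illustrative sketch only: it assumes that "passive" in the chart legend means total minus active, and the `ModelSize` class and its field names are hypothetical. The two sample sizes are taken from the table under Further details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSize:
    """Parameter counts in billions (hypothetical helper type)."""
    name: str
    total_b: float
    active_b: float

    @property
    def passive_b(self) -> float:
        # Assumption: passive parameters are those not executed per token.
        return self.total_b - self.active_b

    @property
    def active_fraction(self) -> float:
        return self.active_b / self.total_b

    @property
    def is_dense(self) -> bool:
        # Dense models use all parameters, so active equals total.
        return self.active_b == self.total_b


moe = ModelSize("Qwen3.5 35B A3B", total_b=36.0, active_b=3.0)
dense = ModelSize("Qwen3.5 27B", total_b=27.8, active_b=27.8)

print(moe.passive_b)    # 33.0
print(moe.is_dense)     # False
print(dense.passive_b)  # 0.0
print(dense.is_dense)   # True
```

For the MoE model only about one twelfth of the weights run per token, while all of them still have to be stored, which is why the charts report both figures.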

Intelligence vs. Active Parameters

Active Parameters at Inference Time; Artificial Analysis Intelligence Index

Chart legend: most attractive quadrant; model creators Alibaba, Google, Korea Telecom, Naver, NVIDIA, OpenAI, ServiceNow, Z AI.

Intelligence vs. Total Parameters

Artificial Analysis Intelligence Index; Size in Parameters (Billions)

Chart legend: most attractive quadrant; model creators Alibaba, Google, Korea Telecom, Naver, NVIDIA, OpenAI, ServiceNow, Z AI.

Context Window

Context Window: Token Limit; Higher is better

Larger context windows are relevant to RAG (Retrieval Augmented Generation) LLM workflows, which typically involve reasoning over and retrieving information from large amounts of data.

Context window: maximum number of combined input and output tokens. Output tokens commonly have a significantly lower limit (varies by model).
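Because the context window covers input and output together, the room left for a response shrinks as the prompt grows. A minimal sketch of that budget calculation is shown below. The function name, the per-model `output_cap` parameter, and the sample numbers are assumptions made for illustration; actual output limits vary by model and are not listed on this page.

```python
from typing import Optional


def output_budget(context_window: int, input_tokens: int,
                  output_cap: Optional[int] = None) -> int:
    """Return how many output tokens still fit after the prompt."""
    if context_window < 0 or input_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Input and output share the same window.
    remaining = max(context_window - input_tokens, 0)
    # A separate, lower output limit may apply (hypothetical value below).
    if output_cap is not None:
        remaining = min(remaining, output_cap)
    return remaining


print(output_budget(262_000, 250_000, output_cap=32_000))  # 12000
print(output_budget(262_000, 100_000, output_cap=32_000))  # 32000
print(output_budget(32_000, 40_000))                       # 0
```

In a RAG workflow the retrieved passages count toward `input_tokens`, so a check like this helps decide how much retrieved text can be included before the response is squeezed.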

Further details

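Model and creator	Intelligence Index	Parameters	Context Window	Price (USD per 1M tokens)	Output speed (tokens/s)	Weights	API providers	Provider Benchmarks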
Qwen3.5 27B (Reasoning) Alibaba	42	27.8B	262k	$0.8	90	🤗		View
Qwen3.5 27B (Non-reasoning) Alibaba	37	27.8B	262k	$0.8	94	🤗		View
Qwen3.5 35B A3B (Reasoning) Alibaba	37	36B (3B active at inference time)	262k	$0.7	175	🤗		View
Qwen3.5 9B (Reasoning) Alibaba	32	9.65B	262k	$0.1	59	🤗		View
Qwen3.5 35B A3B (Non-reasoning) Alibaba	31	36B (3B active at inference time)	262k	$0.7	163	🤗		View
GLM-4.7-Flash (Reasoning) Z AI	30	31.2B (3B active at inference time)	200k	$0.2	63	🤗		View
Apriel-v1.6-15B-Thinker ServiceNow	28	15B	128k	-	99	🤗		View
Qwen3.5 9B (Non-reasoning) Alibaba	27	9.65B	262k	-	-	🤗	-	View
Qwen3.5 4B (Reasoning) Alibaba	27	4.66B	262k	-	-	🤗	-	View
Qwen3 VL 32B (Reasoning) Alibaba	25	33.4B	256k	$2.6	88	🤗		View
gpt-oss-20B (high) OpenAI	24	21B (3.6B active at inference time)	131k	$0.1	296	🤗	+9 more	View
NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) NVIDIA	24	31.6B (3.6B active at inference time)	1.00M	$0.1	144	🤗		View
HyperCLOVA X SEED Think (32B) Naver	24	32B	128k	-	-	🤗	-	View
Mi:dm K 2.5 Pro Korea Telecom	23	32B	128k	-	-	Not available	-	View
Qwen3.5 4B (Non-reasoning) Alibaba	23	4.66B	262k	-	-	🤗	-	View
Qwen3 30B A3B 2507 (Reasoning) Alibaba	22	30.5B (3.3B active at inference time)	262k	$0.8	155	🤗		View
GLM-4.7-Flash (Non-reasoning) Z AI	22	31.2B (3B active at inference time)	200k	$0.2	54	🤗		View
gpt-oss-20B (low) OpenAI	21	21B (3.6B active at inference time)	131k	$0.1	299	🤗	+9 more	View
Tri-21B-think Preview Trillion Labs	20	21B	32.0k	-	-	Not available	-	View
Qwen3 Coder 30B A3B Instruct Alibaba	20	30.5B (3.3B active at inference time)	262k	$0.9	27	🤗	+2 more	View
Qwen3 VL 30B A3B (Reasoning) Alibaba	20	30B (3B active at inference time)	256k	$0.8	111	🤗	+1 more	View
Devstral Small 2 Mistral	19	24B	256k	-	195	🤗		View
Tri-21B-Think Trillion Labs	19	21B	32.0k	-	-	Not available	-	View
Magistral Small 1.2 Mistral	18	24B	128k	$0.8	100	🤗		View
Qwen3 VL 32B Instruct Alibaba	17	33.4B	256k	$1.2	72	🤗		View
EXAONE 4.0 32B (Reasoning) LG AI Research	17	32B	131k	-	-	🤗	-	View
Qwen3 VL 8B (Reasoning) Alibaba	17	8.77B	256k	$0.7	116	🤗		View
DeepSeek R1 0528 Qwen3 8B DeepSeek	16	8.19B	32.8k	-	-	🤗	-	View
Qwen3 VL 30B A3B Instruct Alibaba	16	30B (3B active at inference time)	256k	$0.3	105	🤗	+2 more	View
Ministral 3 14B Mistral	16	14B	256k	$0.2	112	🤗		View
Falcon-H1R-7B TII UAE	16	7B	256k	-	-	Not available	-	View
Qwen3 Omni 30B A3B (Reasoning) Alibaba	16	35.3B (3B active at inference time)	65.5k	$0.4	90	🤗		View
Step3 VL 10B StepFun	15	10.2B	65.5k	-	-	🤗	-	View
Qwen3 30B A3B 2507 Instruct Alibaba	15	30.5B (3.3B active at inference time)	262k	$0.3	70	🤗	+1 more	View
NVIDIA Nemotron Nano 12B v2 VL (Reasoning) NVIDIA	15	13.2B	128k	$0.3	130	🤗		View
Ministral 3 8B Mistral	15	8B	256k	$0.1	180	🤗		View
NVIDIA Nemotron Nano 9B V2 (Reasoning) NVIDIA	15	9B	131k	$0.1	120	🤗		View
Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) NVIDIA	14	4.51B	128k	-	-	🤗	-	View
Qwen3 VL 8B Instruct Alibaba	14	8.77B	256k	$0.3	117	🤗		View
Olmo 3.1 32B Think Allen Institute for AI	14	32.2B	65.5k	-	96	🤗		View
NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) NVIDIA	13	31.6B (3.6B active at inference time)	1.00M	$0.1	78	🤗		View
NVIDIA Nemotron Nano 9B V2 (Non-reasoning) NVIDIA	13	9B	131k	$0.1	139	🤗		View
Sarvam 30B (Reasoning) Sarvam	12	32.2B	65.5k	-	192	🤗		View
Olmo 3.1 32B Instruct Allen Institute for AI	12	32.2B	65.5k	$0.3	53	🤗		View
EXAONE 4.0 32B (Non-reasoning) LG AI Research	12	32B	131k	-	-	🤗	-	View
DeepHermes 3 - Mistral 24B Preview (Non-reasoning) Nous Research	11	24B	32.0k	-	-	🤗	-	View
Granite 4.0 H Small IBM	11	32B (9B active at inference time)	128k	$0.1	388	🤗		View
Qwen3 Omni 30B A3B Instruct Alibaba	11	35.3B (3B active at inference time)	65.5k	$0.4	96	🤗		View
LFM2 24B A2B Liquid AI	10	23.8B (2.3B active at inference time)	32.8k	$0.1	203	🤗		View
Phi-4 Microsoft Azure	10	14B	16.0k	$0.2	35	🤗		View
Gemma 3 27B Instruct Google	10	27.4B	128k	-	25	🤗	+3 more	View
NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) NVIDIA	10	13.2B	128k	$0.3	135	🤗		View
Phi-4 Multimodal Instruct Microsoft Azure	10	5.6B	128k	-	17	🤗		View
Reka Flash 3 Reka AI	10	21B	128k	$0.3	45	🤗		View
Olmo 3 7B Think Allen Institute for AI	9	7B	65.5k	-	-	🤗	-	View
Ling-mini-2.0 InclusionAI	9	16.3B (1.4B active at inference time)	131k	-	-	🤗	-	View
Gemma 3 12B Instruct Google	9	12.2B	128k	-	25	🤗	+2 more	View
Llama 3.2 Instruct 11B (Vision) Meta	9	11B	128k	$0.2	46	🤗		View
Olmo 3 7B Instruct Allen Institute for AI	8	7B	65.5k	$0.1	143	🤗		View
DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning) Nous Research	8	8B	128k	-	-	🤗	-	View
Molmo2-8B Allen Institute for AI	7	8.66B	36.9k	-	105	🤗		View
LFM2 8B A1B Liquid AI	7	8.34B (1.5B active at inference time)	32.8k	-	-	🤗	?	View
Gemma 3n E4B Instruct Google	6	8.39B (4B active at inference time)	32.0k	$0.0	44	🤗		View
Gemma 3n E2B Instruct Google	5	5.98B (2B active at inference time)	32.0k	-	-	🤗		View
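The Parameters column in the table above mixes two formats: a single figure for dense models (for example `27.8B`) and a total with an active count for MoE models (for example `36B (3B active at inference time)`). The sketch below shows one way such a cell could be parsed. The function name and regular expression are hypothetical and only cover the two formats that appear in this table.

```python
import re
from typing import Tuple

# Matches "27.8B" or "36B (3B active at inference time)".
_PARAMS = re.compile(
    r"^\s*([\d.]+)B(?:\s*\(([\d.]+)B active at inference time\))?\s*$"
)


def parse_parameters(cell: str) -> Tuple[float, float]:
    """Return (total, active) parameter counts in billions."""
    match = _PARAMS.match(cell)
    if match is None:
        raise ValueError(f"unrecognized parameter cell: {cell!r}")
    total = float(match.group(1))
    # Dense models list one figure, so active equals total.
    active = float(match.group(2)) if match.group(2) else total
    return total, active


print(parse_parameters("27.8B"))                              # (27.8, 27.8)
print(parse_parameters("36B (3B active at inference time)"))  # (36.0, 3.0)
```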
