Comparisons of Small Open Source AI Models (4B-40B)

Name: Openness
Creator: Artificial Analysis
License: https://artificialanalysis.ai/docs/legal/Terms-of-Use.pdf

Open source AI models with between 4B and 40B parameters. Models are considered open source (also commonly referred to as open weights) when their weights are available to download. This allows self-hosting on your own infrastructure and enables customizing the model, for example through fine-tuning. For more details, including our methodology, see our FAQs.

Qwen3.6 27B (Reasoning) and Qwen3.6 35B A3B (Reasoning) are the highest-intelligence small open source models, defined as those with 4B-40B parameters, followed by Gemma 4 31B (Reasoning) and Qwen3.6 27B (Non-reasoning).

Highlights

Openness: Artificial Analysis Openness Index · Higher is better

Intelligence: Artificial Analysis Intelligence Index · Higher is better

Total Parameters: Trainable parameters in billions

Openness

Artificial Analysis Openness Index: Score

The Openness Index assesses model openness on a normalized scale of 0 to 100 (higher is more open).

Reasoning models are indicated by a lightbulb icon

Intelligence

Artificial Analysis Intelligence Index

Artificial Analysis Intelligence Index v4.1 incorporates 9 evaluations: GDPval-AA v2, 𝜏³-Banking, Terminal-Bench v2.1, SciCode, Humanity's Last Exam, GPQA Diamond, CritPt, AA-Omniscience, and AA-LCR. See the Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.

Scores marked as estimates are pending independent evaluation.

Intelligence Evaluations

Intelligence evaluations measured independently by Artificial Analysis · Higher is better

GDPval-AA v2 (Updated): Agentic real-world work tasks, (Elo-500)/2000
Terminal-Bench v2.1 (New): Agentic coding & terminal use
𝜏³-Banking (New): Agentic tool use
AA-LCR: Long context reasoning
AA-Omniscience Accuracy: Knowledge
AA-Omniscience Non-Hallucination Rate: 1 - hallucination rate
Humanity's Last Exam: Reasoning & knowledge
GPQA Diamond: Scientific reasoning
SciCode: Coding
IFBench: Instruction following
CritPt: Physics reasoning
APEX-Agents-AA: Long-horizon agentic tasks
ITBench-AA: Kubernetes incident root-cause analysis
MMMU-Pro: Visual reasoning

While model intelligence generally translates across use cases, specific evaluations may be more relevant for certain use cases.
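The GDPval-AA v2 entry above is reported as `(Elo-500)/2000`. A minimal sketch of that rescaling follows, assuming the raw input is an Elo rating and that no clamping is applied. The function name `normalize_gdpval_elo` is hypothetical and not an Artificial Analysis API, and the ratings in the loop are illustrative, not measured results.

```python
def normalize_gdpval_elo(elo: float) -> float:
    """Rescale an Elo rating using the (Elo - 500) / 2000 formula quoted above."""
    return (elo - 500.0) / 2000.0


if __name__ == "__main__":
    # Illustrative ratings only; these are not published scores.
    for elo in (500, 1000, 1500):
        print(f"Elo {elo} -> {normalize_gdpval_elo(elo):.2f}")
```

Under this formula an Elo of 500 maps to 0 and an Elo of 2500 maps to 1, so ratings in that range land on a 0 to 1 scale.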

Size

Model Size: Total and Active Parameters

Comparison between total model parameters and parameters active during inference

Total Parameters: The total number of trainable weights and biases in the model, expressed in billions. These parameters are learned during training and determine the model's ability to process and generate responses.

Active Parameters at Inference Time: The number of parameters actually executed during each inference forward pass, expressed in billions. For Mixture of Experts (MoE) models, a routing mechanism selects a subset of experts per token, resulting in fewer active than total parameters. Dense models use all parameters, so active equals total.
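To make the total versus active distinction concrete, here is a small sketch that computes the share of parameters active per forward pass. The `ModelSize` class is hypothetical. The sizes are taken from the table under "Further details", and a missing active count is treated as a dense model, per the definition above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelSize:
    name: str
    total_b: float                    # total parameters, in billions
    active_b: Optional[float] = None  # active parameters in billions; None means dense

    @property
    def active(self) -> float:
        # Dense models use all parameters, so active equals total.
        return self.total_b if self.active_b is None else self.active_b

    @property
    def active_fraction(self) -> float:
        return self.active / self.total_b


if __name__ == "__main__":
    models = [
        ModelSize("Qwen3.6 27B", 27.8),               # dense
        ModelSize("Qwen3.6 35B A3B", 36.0, 3.0),      # MoE
        ModelSize("Gemma 4 26B A4B", 25.2, 3.8),      # MoE
    ]
    for m in models:
        print(f"{m.name}: {m.active_fraction:.1%} of parameters active")
```

The fraction only describes how many parameters run per token. A MoE model still needs all of its weights available, so the total count remains the figure that matters for storage.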

Intelligence vs. Active Parameters

Active parameters at inference time · Artificial Analysis Intelligence Index

Most attractive quadrant

Intelligence vs. Total Parameters

Artificial Analysis Intelligence Index · Size in parameters (billions)

Most attractive quadrant

Creators shown: Alibaba, Cohere, Google, LG AI Research, NVIDIA, OpenAI, ServiceNow

Context Window

Context window: token limit · Higher is better

Larger context windows are relevant to RAG (Retrieval Augmented Generation) LLM workflows, which typically involve reasoning over and retrieving information from large amounts of data.

Context window: Maximum number of combined input and output tokens. Output tokens commonly have a significantly lower limit (varies by model).
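Because the context window covers input and output tokens combined, a request has to budget for both. The sketch below checks whether a request fits. The function `fits_context` and its `output_limit` parameter are hypothetical names, and the token counts in the example are illustrative rather than limits published for any specific model.

```python
from typing import Optional


def fits_context(
    input_tokens: int,
    output_tokens: int,
    context_window: int,
    output_limit: Optional[int] = None,
) -> bool:
    """Return True if input plus output fits in the context window.

    output_limit models the separate, usually lower, cap on output tokens;
    pass None when no such cap is known.
    """
    if min(input_tokens, output_tokens, context_window) < 0:
        raise ValueError("token counts must be non-negative")
    if output_limit is not None and output_tokens > output_limit:
        return False
    return input_tokens + output_tokens <= context_window


if __name__ == "__main__":
    window = 128_000  # illustrative window size
    print(fits_context(100_000, 8_000, window))
    print(fits_context(125_000, 8_000, window))
    print(fits_context(1_000, 50_000, window, output_limit=32_000))
```

For RAG workloads, the retrieved passages count toward `input_tokens`, so the space reserved for the answer has to be subtracted before deciding how much retrieved text to include.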

Further details

Model	Creator	Intelligence Index	Total Parameters	Context Window	Price		Weights		Provider Benchmarks
Qwen3.6 27B (Reasoning)	Alibaba	37	27.8B	262k	$0.9	56	🤗	+2	View
Qwen3.6 35B A3B (Reasoning)	Alibaba	32	36B 3B active at inference time	262k	$0.4	169	🤗	+6	View
Gemma 4 31B (Reasoning)	Google	29	30.7B	256k	-	34	🤗	+8	View
Qwen3.6 27B (Non-reasoning)	Alibaba	29	27.8B	262k	$0.9	57	🤗		View
Gemma 4 26B A4B (Reasoning)	Google	26	25.2B 3.8B active at inference time	256k	$0.1	-	🤗	+4	View
Qwen3.5 9B (Reasoning)	Alibaba	25	9.65B	262k	$0.1	57	🤗		View
Gemma 4 31B (Non-reasoning)	Google	25	30.7B	256k	$0.2	36	🤗	+4	View
Qwen3.6 35B A3B (Non-reasoning)	Alibaba	24	36B 3B active at inference time	262k	$0.6	188	🤗	+5	View
Qwen3.5 35B A3B (Non-reasoning)	Alibaba	23	36B 3B active at inference time	262k	$0.4	179	🤗		View
EXAONE 4.5 33B	LG AI Research	23	34.4B	262k	-	-	🤗	-	View
Gemma 4 12B (Reasoning)	Google	22	12B	256k	$0.1	121	🤗		View
Nemotron Cascade 2 30B A3B	NVIDIA	21	31.6B 3B active at inference time	1.00M	-	-	🤗	-	View
North Mini Code	Cohere	21	30B 3B active at inference time	256k	-	183	Not available		View
Apriel-v1.6-15B-Thinker	ServiceNow	21	15B	128k	-	-	🤗		View
Qwen3.5 9B (Non-reasoning)	Alibaba	20	9.65B	262k	-	-	🤗	-	View
Gemma 4 26B A4B (Non-reasoning)	Google	20	25.2B 3.8B active at inference time	256k	$0.2	43	🤗	+4	View
Qwen3.5 4B (Reasoning)	Alibaba	20	4.66B	262k	$0.0	23	🤗		View
NVIDIA Nemotron 3 Nano 30B A3B (Reasoning)	NVIDIA	18	31.6B 3.6B active at inference time	1.00M	$0.1	47	🤗		View
HyperCLOVA X SEED Think (32B)	Naver	17	32B	128k	-	-	🤗	-	View
Qwen3.5 4B (Non-reasoning)	Alibaba	16	4.66B	262k	$0.0	22	🤗		View
Nemotron 3 Nano Omni 30B A3B Reasoning	NVIDIA	15	30B 3B active at inference time	256k	$0.1	289	🤗		View
gpt-oss-20B (high)	OpenAI	15	21B 3.6B active at inference time	131k	$0.1	212	🤗	+10	View
gpt-oss-20B (low)	OpenAI	14	21B 3.6B active at inference time	131k	$0.1	225	🤗	+9	View
Tri-21B-think Preview	Trillion Labs	14	21B	32.0k	-	-	🤗	-	View
Gemma 4 12B (Non-reasoning)	Google	13	12B	262k	-	-	🤗	-	View
Devstral Small 2	Mistral	13	24B	256k	-	50	🤗		View
Gemma 4 E4B (Reasoning)	Google	12	8B 4.5B active at inference time	128k	-	-	🤗	-	View
Tri-21B-Think	Trillion Labs	12	21B	32.0k	-	-	🤗	-	View
Magistral Small 1.2	Mistral	12	24B	128k	$0.6	107	🤗		View
EXAONE 4.0 32B (Reasoning)	LG AI Research	11	32B	131k	-	-	🤗	-	View
Ministral 3 14B	Mistral	10	14B	256k	$0.2	93	🤗		View
Falcon-H1R-7B	TII UAE	10	7B	256k	-	-	🤗	-	View
Qwen3 Omni 30B A3B (Reasoning)	Alibaba	10	35.3B 3B active at inference time	65.5k	$0.3	88	🤗		View
Step3 VL 10B	StepFun	9	10.2B	65.5k	-	-	🤗	-	View
Gemma 4 E2B (Reasoning)	Google	9	5.1B 2.3B active at inference time	128k	-	-	🤗	-	View
NVIDIA Nemotron Nano 12B v2 VL (Reasoning)	NVIDIA	9	13.2B	128k	$0.2	280	🤗		View
Ministral 3 8B	Mistral	9	8B	256k	$0.1	90	🤗		View
Gemma 4 E4B (Non-reasoning)	Google	9	8B 4.5B active at inference time	128k	-	-	🤗	-	View
Granite 4.1 30B	IBM	9	30B	131k	-	-	🤗	-	View
NVIDIA Nemotron Nano 9B V2 (Reasoning)	NVIDIA	9	9B	131k	$0.1	73	🤗		View
Olmo 3.1 32B Think	Allen Institute for AI	8	32.2B	65.5k	-	-	🤗		View
NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning)	NVIDIA	7	31.6B 3.6B active at inference time	1.00M	$0.1	61	🤗		View
NVIDIA Nemotron Nano 9B V2 (Non-reasoning)	NVIDIA	7	9B	131k	$0.1	104	🤗		View
Granite 4.1 8B	IBM	7	8B	131k	$0.1	120	🤗		View
Sarvam 30B (high)	Sarvam	7	32.2B 2.4B active at inference time	65.5k	$0.0	166	🤗		View
Olmo 3.1 32B Instruct	Allen Institute for AI	6	32.2B	65.5k	-	-	🤗	-	View
Gemma 4 E2B (Non-reasoning)	Google	6	5.1B 2.3B active at inference time	128k	-	-	🤗	-	View
EXAONE 4.0 32B (Non-reasoning)	LG AI Research	6	32B	131k	-	-	🤗	-	View
DeepHermes 3 - Mistral 24B Preview (Non-reasoning)	Nous Research	5	24B	32.0k	-	-	🤗	-	View
Granite 4.0 H Small	IBM	5	32B 9B active at inference time	128k	$0.1	400	🤗		View
Qwen3 Omni 30B A3B Instruct	Alibaba	5	35.3B 3B active at inference time	65.5k	$0.3	95	🤗		View
LFM2 24B A2B	Liquid AI	5	23.8B 2.3B active at inference time	32.8k	$0.0	117	🤗		View
Phi-4	Microsoft	5	14B	16.0k	$0.2	35	🤗		View
NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning)	NVIDIA	5	13.2B	128k	$0.2	215	🤗		View
Phi-4 Multimodal Instruct	Microsoft	5	5.6B	128k	-	14	🤗		View
Reka Flash 3	Reka AI	4	21B	128k	$0.3	-	🤗		View
Olmo 3 7B Think	Allen Institute for AI	4	7B	65.5k	-	-	🤗	-	View
Molmo 7B-D	Allen Institute for AI	4	8.02B	4.10k	-	-	🤗	-	View
Ling-mini-2.0	InclusionAI	4	16.3B 1.4B active at inference time	131k	-	-	🤗	-	View
Llama 3.2 Instruct 11B (Vision)	Meta	3	11B	128k	$0.2	49	🤗		View
Olmo 3 7B Instruct	Allen Institute for AI	3	7B	65.5k	$0.1	-	🤗		View
DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning)	Nous Research	2	8B	128k	-	-	🤗	-	View
Molmo2-8B	Allen Institute for AI	2	8.66B	36.9k	-	-	🤗	-	View
LFM2 8B A1B	Liquid AI	2	8.34B 1.5B active at inference time	32.8k	-	-	🤗		View
Apertus 8B Instruct	Swiss AI Initiative	1	8B	65.5k	$0.1	-	🤗		View
EXAONE 4.5 33B (Non-reasoning)	LG AI Research	-	34.4B	262k	-	-	🤗	-	View
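The parameter cells in the table above use two shapes: a single value such as `27.8B` for dense models, and a fused value such as `36B 3B active at inference time` for MoE models. A minimal parsing sketch follows, assuming those are the only two shapes. The helper `parse_size_cell` is hypothetical and written only against the cell text as it appears here.

```python
import re
from typing import Tuple

# Matches "27.8B" or "36B 3B active at inference time".
_SIZE_CELL = re.compile(
    r"^(?P<total>\d+(?:\.\d+)?)B"
    r"(?:\s+(?P<active>\d+(?:\.\d+)?)B active at inference time)?$"
)


def parse_size_cell(cell: str) -> Tuple[float, float]:
    """Return (total, active) parameters in billions for one table cell."""
    match = _SIZE_CELL.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized size cell: {cell!r}")
    total = float(match.group("total"))
    active_text = match.group("active")
    # A cell with no active count is a dense model: active equals total.
    active = float(active_text) if active_text is not None else total
    return total, active


if __name__ == "__main__":
    for cell in ("27.8B", "36B 3B active at inference time"):
        print(cell, "->", parse_size_cell(cell))
```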
