Comparisons of Small Open Source AI Models (4B-40B)
Open source AI models with between 4B and 40B parameters. Models are considered open source (also commonly referred to as open weights) when their weights are available to download. This allows self-hosting on your own infrastructure and customizing the model, for example through fine-tuning. For more details, including our methodology, see our FAQs.
Highlights
Openness
Artificial Analysis Openness Index: Score
Openness Index assesses model openness on a 0 to 100 normalized scale (higher is more open)
Reasoning models are indicated by a lightbulb icon
Intelligence
Artificial Analysis Intelligence Index
Artificial Analysis Intelligence Index v4.1 incorporates 9 evaluations: GDPval-AA v2, 𝜏³-Banking, Terminal-Bench v2.1, SciCode, Humanity's Last Exam, GPQA Diamond, CritPt, AA-Omniscience, AA-LCR
Estimate (independent evaluation forthcoming)
Intelligence Evaluations
Intelligence evaluations measured independently by Artificial Analysis · Higher is better
Panels: GDPval-AA v2 (updated; agentic real-world work tasks, reported as (Elo - 500)/2000) · Agentic coding & terminal use · 𝜏³-Banking (new; agentic tool use) · Long context reasoning · Knowledge · 1 - hallucination rate · Reasoning & knowledge · Scientific reasoning · Coding · Instruction following · Physics reasoning · Long-horizon agentic tasks · Kubernetes incident root-cause analysis · Visual reasoning
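Two of the evaluation panels report transformed scores rather than raw values: GDPval-AA v2 is plotted as (Elo - 500)/2000, and one panel shows 1 - hallucination rate. A minimal sketch of both mappings follows. The function names are hypothetical, the inputs are made-up numbers rather than results for any model, and it assumes the hallucination rate is given as a fraction between 0 and 1.

```python
def gdpval_chart_score(elo: float) -> float:
    """Map a GDPval-AA v2 Elo rating onto the panel's (Elo - 500) / 2000 scale."""
    return (elo - 500) / 2000


def non_hallucination_score(hallucination_rate: float) -> float:
    """Return 1 - hallucination rate, assuming the rate is a fraction in [0, 1]."""
    if not 0.0 <= hallucination_rate <= 1.0:
        raise ValueError("hallucination_rate must be between 0 and 1")
    return 1.0 - hallucination_rate


# Made-up inputs for illustration, not results for any model.
print(gdpval_chart_score(1500))       # 0.5
print(non_hallucination_score(0.25))  # 0.75
```

Under this mapping an Elo of 500 lands at 0 and an Elo of 2500 at 1, and a lower hallucination rate produces a higher plotted value, so "higher is better" holds for both panels.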
Size
Model Size: Total and Active Parameters
Comparison between total model parameters and parameters active during inference
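The parameter cells in the table below come in two shapes: a single figure for dense models (e.g. "27.8B") and a total followed by an active count (e.g. "36B 3B active at inference time"). A minimal sketch, assuming exactly those two formats, that splits a cell into total and active parameters and reports the share active at inference time. `parse_params` is a hypothetical helper, not part of any Artificial Analysis tooling.

```python
import re

# Matches "27.8B" (dense) or "36B 3B active at inference time" (total + active).
_PARAMS = re.compile(r"^([\d.]+)B(?:\s+([\d.]+)B active at inference time)?$")


def parse_params(cell: str) -> tuple[float, float]:
    """Return (total, active) parameters in billions; dense models use total for both."""
    match = _PARAMS.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognised parameter cell: {cell!r}")
    total = float(match.group(1))
    active = float(match.group(2)) if match.group(2) else total
    return total, active


for cell in ["27.8B", "36B 3B active at inference time", "32B 9B active at inference time"]:
    total, active = parse_params(cell)
    print(f"{cell!r}: {active / total:.1%} of parameters active at inference time")
```

For example, 3B of 36B is roughly 8.3%, while a dense model such as Qwen3.6 27B uses all of its parameters.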
Intelligence vs. Active Parameters
Active parameters at inference time · Artificial Analysis Intelligence Index
Most attractive quadrant: high intelligence with few active parameters
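The extracted chart gives no exact boundary for the quadrant. One common way to make "high intelligence, few active parameters" concrete is a Pareto frontier: keep each model for which no other model has fewer-or-equal active parameters and a higher index. This is a swapped-in technique, not necessarily how Artificial Analysis draws its quadrant; the sample points are a few rows copied from the table below, and the helper names are hypothetical.

```python
from typing import NamedTuple


class Point(NamedTuple):
    model: str
    active_b: float  # parameters active at inference time, in billions
    index: int       # Artificial Analysis Intelligence Index


# A few rows copied from the "Further details" table.
POINTS = [
    Point("Qwen3.6 27B (Reasoning)", 27.8, 37),
    Point("Qwen3.6 35B A3B (Reasoning)", 3.0, 32),
    Point("Gemma 4 31B (Reasoning)", 30.7, 29),
    Point("Gemma 4 26B A4B (Reasoning)", 3.8, 26),
    Point("Qwen3.5 9B (Reasoning)", 9.65, 25),
    Point("gpt-oss-20B (high)", 3.6, 15),
    Point("Sarvam 30B (high)", 2.4, 7),
]


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Keep models for which no other model has <= active parameters and a higher index."""
    frontier: list[Point] = []
    best_index = float("-inf")
    # Smallest first; for equal sizes the higher index comes first.
    for p in sorted(points, key=lambda p: (p.active_b, -p.index)):
        if p.index > best_index:  # strictly beats every model sorted before it
            frontier.append(p)
            best_index = p.index
    return frontier


for p in pareto_frontier(POINTS):
    print(f"{p.model}: {p.active_b}B active, index {p.index}")
```

On these seven points the frontier is Sarvam 30B (high), Qwen3.6 35B A3B (Reasoning) and Qwen3.6 27B (Reasoning); every other sample model is outscored by a model with fewer active parameters.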
Intelligence vs. Total Parameters
Artificial Analysis Intelligence Index · Size in parameters (billions)
Most attractive quadrant: high intelligence with few total parameters
Model creators: Alibaba, Cohere, Google, LG AI Research, NVIDIA, OpenAI, ServiceNow
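Total rather than active parameters set the memory footprint when self-hosting: in a typical mixture-of-experts deployment every expert's weights must be loaded even though only some are active for a given token. A rough, weights-only sketch, assuming nominal sizes of 2 bytes per parameter for bf16, 1 for int8 and 0.5 for int4. It ignores the KV cache, activations, quantization scales and runtime overhead, so real requirements are higher. `weight_memory_gb` is a hypothetical helper.

```python
# Nominal bytes per parameter (assumed; ignores quantization scales/metadata).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(total_params_b: float, fmt: str = "bf16") -> float:
    """Weights-only memory estimate in GB (1 GB = 1e9 bytes); no KV cache or overhead."""
    # Billions of parameters x bytes per parameter = gigabytes.
    return total_params_b * BYTES_PER_PARAM[fmt]


# Total parameters taken from the table; the MoE model still loads all 36B.
for model, total_b in [("Qwen3.6 27B", 27.8), ("Qwen3.6 35B A3B", 36.0)]:
    for fmt in BYTES_PER_PARAM:
        print(f"{model} @ {fmt}: ~{weight_memory_gb(total_b, fmt):.1f} GB")
```

So Qwen3.6 35B A3B needs roughly 72 GB for bf16 weights despite having only 3B active parameters, versus roughly 55.6 GB for the dense Qwen3.6 27B.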
Context Window
Context window: token limit · Higher is better
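Context windows in the table are shown as rounded labels such as "262k", "1.00M" or "4.10k". A minimal sketch that converts these labels to approximate token counts and checks whether a request fits, assuming the prompt and the generated output share the window (as is typical) and reading "k" as 1,000 and "M" as 1,000,000. Because the labels are rounded, treat borderline results as uncertain; `parse_context` and `fits` are hypothetical helpers.

```python
# Display suffixes, read as powers of ten (an assumption about the labels).
_MULTIPLIER = {"k": 1_000, "M": 1_000_000}


def parse_context(label: str) -> int:
    """Turn a label such as '262k' or '1.00M' into an approximate token count."""
    label = label.strip()
    number, suffix = label[:-1], label[-1:]
    if suffix not in _MULTIPLIER:
        raise ValueError(f"unrecognised context label: {label!r}")
    # round() absorbs floating-point noise in the multiplication.
    return round(float(number) * _MULTIPLIER[suffix])


def fits(prompt_tokens: int, max_output_tokens: int, label: str) -> bool:
    """Assumes prompt and generated tokens share one context window."""
    return prompt_tokens + max_output_tokens <= parse_context(label)


print(parse_context("1.00M"))        # 1000000
print(fits(200_000, 8_000, "262k"))  # True
print(fits(200_000, 8_000, "128k"))  # False
```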
Further details
Model | Intelligence Index | Parameters | Context window | Price | Output speed | API providers | Weights | Provider Benchmarks
|---|---|---|---|---|---|---|---|---|
Qwen3.6 27B (Reasoning) | 37 | 27.8B | 262k | $0.9 | 56 | +2 | |||
Qwen3.6 35B A3B (Reasoning) | 32 | 36B 3B active at inference time | 262k | $0.4 | 169 | +6 | |||
Gemma 4 31B (Reasoning) | 29 | 30.7B | 256k | - | 34 | +8 | |||
Qwen3.6 27B (Non-reasoning) | 29 | 27.8B | 262k | $0.9 | 57 | ||||
Gemma 4 26B A4B (Reasoning) | 26 | 25.2B 3.8B active at inference time | 256k | $0.1 | - | +4 | |||
Qwen3.5 9B (Reasoning) | 25 | 9.65B | 262k | $0.1 | 57 | ||||
Gemma 4 31B (Non-reasoning) | 25 | 30.7B | 256k | $0.2 | 36 | +4 | |||
Qwen3.6 35B A3B (Non-reasoning) | 24 | 36B 3B active at inference time | 262k | $0.6 | 188 | +5 | |||
Qwen3.5 35B A3B (Non-reasoning) | 23 | 36B 3B active at inference time | 262k | $0.4 | 179 | ||||
EXAONE 4.5 33B | 23 | 34.4B | 262k | - | - | - | |||
Gemma 4 12B (Reasoning) | 22 | 12B | 256k | $0.1 | 121 | ||||
Nemotron Cascade 2 30B A3B | 21 | 31.6B 3B active at inference time | 1.00M | - | - | - | |||
North Mini Code | 21 | 30B 3B active at inference time | 256k | - | 183 | Not available | |||
Apriel-v1.6-15B-Thinker | 21 | 15B | 128k | - | - | ||||
Qwen3.5 9B (Non-reasoning) | 20 | 9.65B | 262k | - | - | - | |||
Gemma 4 26B A4B (Non-reasoning) | 20 | 25.2B 3.8B active at inference time | 256k | $0.2 | 43 | +4 | |||
Qwen3.5 4B (Reasoning) | 20 | 4.66B | 262k | $0.0 | 23 | ||||
NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | 18 | 31.6B 3.6B active at inference time | 1.00M | $0.1 | 47 | ||||
HyperCLOVA X SEED Think (32B) | 17 | 32B | 128k | - | - | - | |||
Qwen3.5 4B (Non-reasoning) | 16 | 4.66B | 262k | $0.0 | 22 | ||||
Nemotron 3 Nano Omni 30B A3B Reasoning | 15 | 30B 3B active at inference time | 256k | $0.1 | 289 | ||||
gpt-oss-20B (high) | 15 | 21B 3.6B active at inference time | 131k | $0.1 | 212 | +10 | |||
gpt-oss-20B (low) | 14 | 21B 3.6B active at inference time | 131k | $0.1 | 225 | +9 | |||
Tri-21B-think Preview | 14 | 21B | 32.0k | - | - | - | |||
Gemma 4 12B (Non-reasoning) | 13 | 12B | 262k | - | - | - | |||
Devstral Small 2 | 13 | 24B | 256k | - | 50 | ||||
Gemma 4 E4B (Reasoning) | 12 | 8B 4.5B active at inference time | 128k | - | - | - | |||
Tri-21B-Think | 12 | 21B | 32.0k | - | - | - | |||
Magistral Small 1.2 | 12 | 24B | 128k | $0.6 | 107 | ||||
EXAONE 4.0 32B (Reasoning) | 11 | 32B | 131k | - | - | - | |||
Ministral 3 14B | 10 | 14B | 256k | $0.2 | 93 | ||||
Falcon-H1R-7B | 10 | 7B | 256k | - | - | - | |||
Qwen3 Omni 30B A3B (Reasoning) | 10 | 35.3B 3B active at inference time | 65.5k | $0.3 | 88 | ||||
Step3 VL 10B | 9 | 10.2B | 65.5k | - | - | - | |||
Gemma 4 E2B (Reasoning) | 9 | 5.1B 2.3B active at inference time | 128k | - | - | - | |||
NVIDIA Nemotron Nano 12B v2 VL (Reasoning) | 9 | 13.2B | 128k | $0.2 | 280 | ||||
Ministral 3 8B | 9 | 8B | 256k | $0.1 | 90 | ||||
Gemma 4 E4B (Non-reasoning) | 9 | 8B 4.5B active at inference time | 128k | - | - | - | |||
Granite 4.1 30B | 9 | 30B | 131k | - | - | - | |||
NVIDIA Nemotron Nano 9B V2 (Reasoning) | 9 | 9B | 131k | $0.1 | 73 | ||||
Olmo 3.1 32B Think | 8 | 32.2B | 65.5k | - | - | ||||
NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) | 7 | 31.6B 3.6B active at inference time | 1.00M | $0.1 | 61 | ||||
NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | 7 | 9B | 131k | $0.1 | 104 | ||||
Granite 4.1 8B | 7 | 8B | 131k | $0.1 | 120 | ||||
Sarvam 30B (high) | 7 | 32.2B 2.4B active at inference time | 65.5k | $0.0 | 166 | ||||
Olmo 3.1 32B Instruct | 6 | 32.2B | 65.5k | - | - | - | |||
Gemma 4 E2B (Non-reasoning) | 6 | 5.1B 2.3B active at inference time | 128k | - | - | - | |||
EXAONE 4.0 32B (Non-reasoning) | 6 | 32B | 131k | - | - | - | |||
DeepHermes 3 - Mistral 24B Preview (Non-reasoning) | 5 | 24B | 32.0k | - | - | - | |||
Granite 4.0 H Small | 5 | 32B 9B active at inference time | 128k | $0.1 | 400 | ||||
Qwen3 Omni 30B A3B Instruct | 5 | 35.3B 3B active at inference time | 65.5k | $0.3 | 95 | ||||
LFM2 24B A2B | 5 | 23.8B 2.3B active at inference time | 32.8k | $0.0 | 117 | ||||
Phi-4 | 5 | 14B | 16.0k | $0.2 | 35 | ||||
NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | 5 | 13.2B | 128k | $0.2 | 215 | ||||
Phi-4 Multimodal Instruct | 5 | 5.6B | 128k | - | 14 | ||||
Reka Flash 3 | 4 | 21B | 128k | $0.3 | - | ||||
Olmo 3 7B Think | 4 | 7B | 65.5k | - | - | - | |||
Molmo 7B-D | 4 | 8.02B | 4.10k | - | - | - | |||
Ling-mini-2.0 | 4 | 16.3B 1.4B active at inference time | 131k | - | - | - | |||
Llama 3.2 Instruct 11B (Vision) | 3 | 11B | 128k | $0.2 | 49 | ||||
Olmo 3 7B Instruct | 3 | 7B | 65.5k | $0.1 | - | ||||
DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning) | 2 | 8B | 128k | - | - | - | |||
Molmo2-8B | 2 | 8.66B | 36.9k | - | - | - | |||
LFM2 8B A1B | 2 | 8.34B 1.5B active at inference time | 32.8k | - | - | ||||
Apertus 8B Instruct | 1 | 8B | 65.5k | $0.1 | - | ||||
EXAONE 4.5 33B (Non-reasoning) | - | 34.4B | 262k | - | - | - |
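Because the table is plain pipe-delimited text, it can be filtered programmatically. A minimal sketch, assuming the first five cells are model, Intelligence Index, parameters, context window and price, and that "-" marks a missing value. It picks the highest-scoring model whose listed price is at or below a ceiling, in whatever unit the Price column uses. The helper names are hypothetical and the sample rows are copied from the table above.

```python
from typing import Optional, TypedDict


class Row(TypedDict):
    model: str
    index: Optional[int]        # Artificial Analysis Intelligence Index
    params: str                 # raw cell, e.g. "36B 3B active at inference time"
    context: str                # raw label, e.g. "262k"
    price_usd: Optional[float]  # None where the table shows "-"


def parse_row(line: str) -> Row:
    """Split one pipe-delimited row; only the first five cells are used."""
    cells = [cell.strip() for cell in line.split("|")]
    model, index, params, context, price = cells[:5]
    return Row(
        model=model,
        index=None if index == "-" else int(index),
        params=params,
        context=context,
        price_usd=float(price.removeprefix("$")) if price.startswith("$") else None,
    )


def best_under_budget(rows: list[Row], max_price: float) -> Optional[Row]:
    """Highest-index row with a listed price <= max_price; cheaper wins ties."""
    priced = [
        r for r in rows
        if r["index"] is not None
        and r["price_usd"] is not None
        and r["price_usd"] <= max_price
    ]
    return min(priced, key=lambda r: (-r["index"], r["price_usd"]), default=None)


# Sample rows copied verbatim from the table.
TABLE_ROWS = [
    "Qwen3.6 27B (Reasoning) | 37 | 27.8B | 262k | $0.9 | 56 | +2 | |||",
    "Qwen3.6 35B A3B (Reasoning) | 32 | 36B 3B active at inference time | 262k | $0.4 | 169 | +6 | |||",
    "Gemma 4 31B (Reasoning) | 29 | 30.7B | 256k | - | 34 | +8 | |||",
    "Gemma 4 26B A4B (Reasoning) | 26 | 25.2B 3.8B active at inference time | 256k | $0.1 | - | +4 | |||",
    "Qwen3.5 9B (Reasoning) | 25 | 9.65B | 262k | $0.1 | 57 | ||||",
    "Gemma 4 31B (Non-reasoning) | 25 | 30.7B | 256k | $0.2 | 36 | +4 | |||",
    "EXAONE 4.5 33B (Non-reasoning) | - | 34.4B | 262k | - | - | - |",
]

rows = [parse_row(line) for line in TABLE_ROWS]
pick = best_under_budget(rows, 0.2)
print(pick["model"] if pick else "no model within budget")
```

With a ceiling of $0.2 the sample returns Gemma 4 26B A4B (Reasoning), index 26. Rows without a listed price, such as Gemma 4 31B (Reasoning), are skipped rather than treated as free.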