Comparisons of Small Open Source AI Models (4B-40B)
Open source AI models with 4B to 40B parameters. Models are considered open source (also commonly referred to as open weights) when their weights are available to download. This allows self-hosting on your own infrastructure and enables customizing the model, such as through fine-tuning. Click on any model to see detailed metrics. For more details, including on our methodology, see our FAQs.
Qwen3.5 27B and Qwen3.5 35B A3B are the highest-intelligence small open source models, defined as those with 4B-40B parameters, followed by GLM-4.7-Flash and Apriel-v1.6-15B-Thinker.
Openness
Artificial Analysis Openness Index: Results
Intelligence
Artificial Analysis Intelligence Index
Artificial Analysis Intelligence Index v4.0 includes: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt. See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.
Intelligence Evaluations
While model intelligence generally translates across use cases, specific evaluations may be more relevant for certain use cases.
Size
Model Size: Total and Active Parameters
The total number of trainable weights and biases in the model, expressed in billions. These parameters are learned during training and determine the model's ability to process and generate responses.
The number of parameters actually executed during each inference forward pass, expressed in billions. For Mixture of Experts (MoE) models, a routing mechanism selects a subset of experts per token, resulting in fewer active than total parameters. Dense models use all parameters, so active equals total.
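The dense-vs-MoE distinction above can be sketched numerically. The toy top-k routing arithmetic below (all sizes are hypothetical and chosen only to loosely resemble a "30B total / 3B active" model, not taken from any model in the table) shows why an MoE model executes far fewer parameters per token than it stores:

```python
def moe_param_counts(n_experts: int, expert_params: int,
                     shared_params: int, top_k: int) -> tuple[int, int]:
    """Total vs. active parameter counts for a toy MoE model.

    shared_params covers everything outside the expert blocks (attention,
    embeddings, router); the router activates only top_k experts per token.
    """
    total = shared_params + n_experts * expert_params
    active = shared_params + top_k * expert_params
    return total, active

# Hypothetical configuration: 128 experts of 220M params each,
# 2.4B shared params, 4 experts routed per token.
total, active = moe_param_counts(n_experts=128, expert_params=220_000_000,
                                 shared_params=2_400_000_000, top_k=4)
print(f"total ≈ {total / 1e9:.1f}B, active ≈ {active / 1e9:.1f}B")
# → total ≈ 30.6B, active ≈ 3.3B
```

A dense model is the degenerate case where every parameter participates in every forward pass, so active equals total.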
Intelligence vs. Active Parameters
Intelligence vs. Total Parameters
Context Window
Larger context windows are relevant to RAG (Retrieval Augmented Generation) LLM workflows, which typically involve reasoning over and retrieving information from large amounts of data.
Maximum number of combined input and output tokens. Output tokens commonly have a significantly lower limit (varies by model).
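Because input and output tokens share the window while output has its own cap, the usable output budget for a request is the smaller of the two limits. A worked example (the 131k window, 32k output cap, and helper function are illustrative, not any provider's API):

```python
def remaining_output_budget(context_window: int, input_tokens: int,
                            max_output: int) -> int:
    """Output tokens available for a request: capped by both the shared
    context window and the model's separate (usually lower) output limit."""
    if input_tokens >= context_window:
        raise ValueError("prompt alone exceeds the context window")
    return min(max_output, context_window - input_tokens)

# Illustrative limits: 131k shared window, 32k output cap.
print(remaining_output_budget(131_072, 120_000, 32_768))  # window binds → 11072
print(remaining_output_budget(131_072, 10_000, 32_768))   # output cap binds → 32768
```

With a long prompt the window is the binding constraint; with a short prompt the per-model output limit binds well before the window is exhausted.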
| Model | Intelligence Index | Parameters (Total / Active) | Context Window | Price (USD/1M tokens) | Output Speed (tokens/s) | Weights | Provider Benchmarks | Details |
|---|---|---|---|---|---|---|---|---|
Qwen3.5 27B (Reasoning) Alibaba | 42 | 27.8B | 262k | $0.8 | 100 | 🤗 | View | |
Qwen3.5 35B A3B (Reasoning) Alibaba | 37 | 36B | 262k | $0.7 | 178 | 🤗 | View | |
GLM-4.7-Flash (Reasoning) Z AI | 30 | 31.2B (3B active at inference time) | 200k | $0.1 | 64 | 🤗 | View | |
Apriel-v1.6-15B-Thinker ServiceNow | 28 | 15B | 128k | - | 146 | 🤗 | View | |
Qwen3 VL 32B (Reasoning) Alibaba | 25 | 33.4B | 256k | $2.6 | 89 | 🤗 | View | |
Qwen3 30B A3B 2507 (Reasoning) Alibaba | 25 | 30.5B (3.3B active at inference time) | 262k | $0.8 | 157 | 🤗 | View | |
gpt-oss-20B (high) OpenAI | 24 | 21B (3.6B active at inference time) | 131k | $0.1 | 305 | 🤗 | +7 more | View |
NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) NVIDIA | 24 | 31.6B (3.6B active at inference time) | 1.00M | $0.1 | 156 | 🤗 | View | |
HyperCLOVA X SEED Think (32B) Naver | 24 | 32B | 128k | - | - | 🤗 | - | View |
Mi:dm K 2.5 Pro Korea Telecom | 23 | 32B | 128k | - | - | Not available | - | View |
Magistral Small 1.2 Mistral | 23 | 24B | 128k | $0.8 | 196 | 🤗 | View | |
EXAONE 4.0 32B (Reasoning) LG AI Research | 22 | 32B | 131k | $0.7 | 120 | 🤗 | View | |
GLM-4.7-Flash (Non-reasoning) Z AI | 22 | 31.2B (3B active at inference time) | 200k | $0.2 | 63 | 🤗 | View | |
Qwen3 VL 32B Instruct Alibaba | 21 | 33.4B | 256k | $1.2 | 68 | 🤗 | View | |
NVIDIA Nemotron Nano 12B v2 VL (Reasoning) NVIDIA | 21 | 13.2B | 128k | $0.3 | 133 | 🤗 | View | |
Qwen3 Omni 30B A3B (Reasoning) Alibaba | 21 | 35.3B (3B active at inference time) | 65.5k | $0.4 | 99 | 🤗 | View | |
gpt-oss-20B (low) OpenAI | 21 | 21B (3.6B active at inference time) | 131k | $0.1 | 296 | 🤗 | +8 more | View |
Qwen3 VL 30B A3B Instruct Alibaba | 20 | 30B (3B active at inference time) | 256k | $0.3 | 112 | 🤗 | +1 more | View |
Qwen3 VL 30B A3B (Reasoning) Alibaba | 20 | 30B (3B active at inference time) | 256k | $0.8 | 86 | 🤗 | View | |
Qwen3 30B A3B 2507 Instruct Alibaba | 19 | 30.5B (3.3B active at inference time) | 262k | $0.3 | 81 | 🤗 | View | |
NVIDIA Nemotron Nano 9B V2 (Non-reasoning) NVIDIA | 19 | 9B | 131k | $0.1 | 122 | 🤗 | View | |
Qwen3 Coder 30B A3B Instruct Alibaba | 17 | 30.5B (3.3B active at inference time) | 262k | $0.9 | 22 | 🤗 | +2 more | View |
Olmo 3 7B Think Allen Institute for AI | 17 | 7B | 65.5k | $0.1 | 78 | 🤗 | View | |
Devstral Small 2 Mistral | 17 | 24B | 256k | - | 213 | 🤗 | View | |
Qwen3 VL 8B (Reasoning) Alibaba | 17 | 8.77B | 256k | $0.7 | 133 | 🤗 | View | |
DeepSeek R1 0528 Qwen3 8B DeepSeek | 16 | 8.19B | 32.8k | - | - | 🤗 | - | View |
Ministral 3 14B Mistral | 16 | 14B | 256k | $0.2 | 151 | 🤗 | View | |
EXAONE 4.0 32B (Non-reasoning) LG AI Research | 16 | 32B | 131k | $0.7 | 115 | 🤗 | View | |
Qwen3 Omni 30B A3B Instruct Alibaba | 16 | 35.3B (3B active at inference time) | 65.5k | $0.4 | 101 | 🤗 | View | |
Ministral 3 8B Mistral | 15 | 8B | 256k | $0.1 | 200 | 🤗 | View | |
Ling-mini-2.0 InclusionAI | 15 | 16.3B (1.4B active at inference time) | 131k | $0.1 | 211 | 🤗 | View | |
Mistral Small 3.2 Mistral | 15 | 24B | 128k | $0.1 | 147 | 🤗 | View | |
Qwen3 VL 8B Instruct Alibaba | 15 | 8.77B | 256k | $0.3 | 136 | 🤗 | View | |
NVIDIA Nemotron Nano 9B V2 (Reasoning) NVIDIA | 15 | 9B | 131k | $0.1 | 107 | 🤗 | View | |
Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) NVIDIA | 14 | 4.51B | 128k | - | - | 🤗 | - | View |
Reka Flash 3 Reka AI | 14 | 21B | 128k | $0.3 | 52 | 🤗 | View | |
NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) NVIDIA | 14 | 31.6B (3.6B active at inference time) | 1.00M | $0.1 | 119 | 🤗 | View | |
NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) NVIDIA | 14 | 13.2B | 128k | $0.3 | 137 | 🤗 | View | |
Granite 4.0 H Small IBM | 13 | 32B (9B active at inference time) | 128k | $0.1 | 501 | 🤗 | View | |
Phi-4 Microsoft Azure | 13 | 14B | 16.0k | $0.2 | 9 | 🤗 | View | |
Olmo 3 7B Instruct Allen Institute for AI | 13 | 7B | 65.5k | $0.1 | 44 | 🤗 | View | |
Gemma 3 12B Instruct Google | 12 | 12.2B | 128k | - | 37 | 🤗 | +2 more | View |
LFM2 8B A1B Liquid AI | 11 | 8.34B (1.5B active at inference time) | 32.8k | - | - | 🤗 | ? | View |
DeepHermes 3 - Mistral 24B Preview (Non-reasoning) Nous Research | 11 | 24B | 32.0k | - | - | 🤗 | - | View |
Llama 3.2 Instruct 11B (Vision) Meta | 11 | 11B | 128k | $0.2 | 54 | 🤗 | View | |
Gemma 3n E4B Instruct Google | 11 | 8.39B (4B active at inference time) | 32.0k | $0.0 | 46 | 🤗 | View | |
LFM2 24B A2B Liquid AI | 10 | 23.8B (2.3B active at inference time) | 32.8k | $0.1 | 103 | 🤗 | View | |
Gemma 3 27B Instruct Google | 10 | 27.4B | 128k | - | 37 | 🤗 | +2 more | View |
Phi-4 Multimodal Instruct Microsoft Azure | 10 | 5.6B | 128k | - | 17 | 🤗 | View | |
Gemma 3n E2B Instruct Google | 10 | 5.98B (2B active at inference time) | 32.0k | - | 51 | 🤗 | View | |
Molmo 7B-D Allen Institute for AI | 9 | 8.02B | 4.10k | - | - | 🤗 | - | View |
DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning) Nous Research | 8 | 8B | 128k | - | - | 🤗 | - | View |
Falcon-H1R-7B TII UAE | - | 7B | 256k | - | - | Not available | - | View |
Step3 VL 10B StepFun | - | 10.2B | 65.5k | - | - | 🤗 | - | View |
Molmo2-8B Allen Institute for AI | - | 8.66B | 36.9k | - | 132 | 🤗 | View | |
Olmo 3.1 32B Instruct Allen Institute for AI | - | 32.2B | 65.5k | $0.3 | 48 | 🤗 | View | |
Olmo 3.1 32B Think Allen Institute for AI | - | 32.2B | 65.5k | - | 84 | 🤗 | View | |
Tri-21B-Think Trillion Labs | - | 21B | 32.0k | - | - | Not available | - | View |
Tri-21B-think Preview Trillion Labs | - | 21B | 32.0k | - | - | Not available | - | View |