Comparison of Open Source Models
Comparison and analysis of open source AI models across key metrics, including quality, inference speed, context window, parameter count, and licensing details. Models are considered open source (also commonly referred to as open weights) when their weights are available to download. This allows self-hosting on your own infrastructure and customization of the model, for example through fine-tuning. For more details on our methodology, see our FAQs.
Kimi K2.6 and Highlights
Openness
Artificial Analysis Openness Index: Results
Openness Index assesses model openness on a 0 to 100 normalized scale (higher is more open)
Reasoning models are indicated by a lightbulb icon
Open Source Progress
Progress in Open Weights vs. Proprietary Intelligence
Artificial Analysis Intelligence Index v4.0 incorporates 10 evaluations: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt
Open Source Language Models Intelligence By Lab Over Time
Open Source Models Intelligence By Size Over Time
Intelligence
Artificial Analysis Intelligence Index
Estimate (independent evaluation forthcoming)
Intelligence Evaluations
Intelligence evaluations measured independently by Artificial Analysis · Higher is better
GDPval-AA
Agentic real-world work tasks; score normalized as (Elo - 500) / 2000
Terminal-Bench Hard
Agentic coding & terminal use
𝜏²-Bench Telecom
Agentic tool use
AA-LCR
Long context reasoning
AA-Omniscience Accuracy
Knowledge
AA-Omniscience Non-Hallucination Rate
1 - hallucination rate
Humanity's Last Exam
Reasoning & knowledge
GPQA Diamond
Scientific reasoning
SciCode
Coding
IFBench
Instruction following
CritPt
Physics reasoning
APEX-Agents-AA
Long-horizon agentic tasks
ITBench-AA
Kubernetes incident root-cause analysis
MMMU-Pro
Visual reasoning
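Two of the entries above carry a formula: GDPval-AA is reported as (Elo-500)/2000, and the AA-Omniscience Non-Hallucination Rate is 1 - hallucination rate. The following is a minimal sketch of both transformations. The function names are hypothetical and the sample inputs are illustrative only, not measured results.

```python
def normalize_gdpval_elo(elo: float) -> float:
    """Map an Elo rating onto the (Elo - 500) / 2000 scale quoted for GDPval-AA."""
    return (elo - 500) / 2000


def non_hallucination_rate(hallucination_rate: float) -> float:
    """Return 1 - hallucination rate, as defined for AA-Omniscience."""
    if not 0.0 <= hallucination_rate <= 1.0:
        raise ValueError("hallucination_rate must be between 0 and 1")
    return 1.0 - hallucination_rate


# Illustrative inputs only; these are not results from the page.
print(normalize_gdpval_elo(1500))    # 0.5
print(non_hallucination_rate(0.25))  # 0.75
```

Under this normalization an Elo of 500 maps to 0 and an Elo of 2500 maps to 1, so higher remains better for both metrics.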
Size
Intelligence Index By Model Size
Large Models (>150B)
Medium Models (40B-150B)
Small Models (4B-40B)
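The three size classes above can be expressed as a small classifier. This is a sketch under stated assumptions: the page does not say which class a model of exactly 40B falls into, whether the classes use total or active parameters, or how models under 4B are labeled. Here total parameters are used, 40B is placed in Medium, and anything under 4B is returned as "Uncategorized". The function name is hypothetical.

```python
def size_bucket(total_params_b: float) -> str:
    """Classify a model by total parameter count, given in billions.

    Assumptions (not stated on the page): boundaries are inclusive on the
    lower edge, so 40B counts as Medium, and models under 4B get no class.
    """
    if total_params_b > 150:
        return "Large"
    if total_params_b >= 40:
        return "Medium"
    if total_params_b >= 4:
        return "Small"
    return "Uncategorized"


# Parameter counts taken from the table further down the page.
print(size_bucket(744))    # GLM-5.1 -> Large
print(size_bucket(128))    # Mistral Medium 3.5 -> Medium
print(size_bucket(27.8))   # Qwen3.6 27B -> Small
print(size_bucket(0.268))  # Gemma 3 270M -> Uncategorized
```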
Model Size: Total and Active Parameters
Comparison between total model parameters and parameters active during inference
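To make the total-versus-active distinction concrete, the sketch below computes the share of parameters active at inference time for a few rows of the table at the end of this page. The helper name is hypothetical, and treating a row with no active count as a dense model (all parameters active) is an assumption rather than something the page states.

```python
from typing import Optional


def active_fraction(total_b: float, active_b: Optional[float] = None) -> float:
    """Share of parameters active at inference time (both counts in billions).

    Assumption: when no active count is listed, the model is treated as
    dense, so every parameter is active.
    """
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    if active_b is None:
        return 1.0
    return active_b / total_b


# (model, total B, active B) as listed in the table on this page.
rows = [
    ("GLM-5.1 (Reasoning)", 744, 40),
    ("MiniMax-M2.7", 230, 10),
    ("Qwen3.6 27B (Reasoning)", 27.8, None),
]
for name, total, active in rows:
    print(f"{name}: {active_fraction(total, active):.1%} active")
# GLM-5.1 (Reasoning): 5.4% active
# MiniMax-M2.7: 4.3% active
# Qwen3.6 27B (Reasoning): 100.0% active
```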
Intelligence vs. Active Parameters
Active parameters at inference time · Artificial Analysis Intelligence Index
Legend: most attractive quadrant highlighted; labs shown: Alibaba, DeepSeek, Google, Kimi, MBZUAI Institute of Foundation Models, MiniMax, Mistral, NVIDIA, OpenAI, Xiaomi, Z AI
Intelligence vs. Total Parameters
Artificial Analysis Intelligence Index · Size in parameters (billions)
Context Window
Context window: token limit · Higher is better
Further details
Model | Intelligence Index | Weights | Context Window | Provider Benchmarks | | | | | |
|---|---|---|---|---|---|---|---|---|---|
Kimi K2.6 | 54 | 1.0T 32B active at inference time | 256k | $0.7 | 44 | +12 | |||
MiMo-V2.5-Pro | 54 | 1.0T 42B active at inference time | 1.00M | $0.2 | 45 | ||||
DeepSeek V4 Pro (Reasoning, Max Effort) | 52 | 1.6T 49B active at inference time | 1.00M | $0.2 | 58 | +8 | |||
GLM-5.1 (Reasoning) | 51 | 744B 40B active at inference time | 200k | $0.9 | 77 | +9 | |||
DeepSeek V4 Pro (Reasoning, High Effort) | 50 | 1.6T 49B active at inference time | 1.00M | $0.2 | 58 | +8 | |||
GLM-5 (Reasoning) | 50 | 744B 40B active at inference time | 200k | $0.7 | 72 | +9 | |||
MiniMax-M2.7 | 50 | 230B 10B active at inference time | 205k | $0.2 | 120 | +3 | |||
MiMo-V2.5 | 49 | 310B 15B active at inference time | 1.00M | $0.1 | 76 | +2 | |||
Nemotron 3 Ultra 550B A55B (Reasoning) | 48 | 550B 55B active at inference time | 262k | $0.5 | 142 | Not available | +4 | ||
Kimi K2.5 (Reasoning) | 47 | 1.0T 32B active at inference time | 256k | $0.6 | 48 | +12 | |||
DeepSeek V4 Flash (Reasoning, Max Effort) | 47 | 284B 13B active at inference time | 1.00M | $0.1 | 124 | +4 | |||
DeepSeek V4 Flash (Reasoning, High Effort) | 46 | 284B 13B active at inference time | 1.00M | $0.1 | - | +4 | |||
Qwen3.6 27B (Reasoning) | 46 | 27.8B | 262k | $0.9 | 63 | +2 | |||
Qwen3.5 397B A17B (Reasoning) | 45 | 397B 17B active at inference time | 262k | $0.9 | 53 | +9 | |||
GLM-5.1 (Non-reasoning) | 44 | 744B 40B active at inference time | 200k | $0.9 | 80 | +5 | |||
Qwen3.6 35B A3B (Reasoning) | 43 | 36B 3B active at inference time | 262k | $0.4 | 189 | +6 | |||
Kimi K2.6 (Non-reasoning) | 43 | 1.0T 32B active at inference time | 256k | $0.7 | 37 | +9 | |||
Step 3.7 Flash | 43 | 198B 11B active at inference time | 256k | $0.2 | 174 | ||||
GLM-4.7 (Reasoning) | 42 | 357B 32B active at inference time | 200k | $0.7 | 84 | +7 | |||
Qwen3.5 27B (Reasoning) | 42 | 27.8B | 262k | $0.5 | 82 | +3 | |||
MiniMax-M2.5 | 42 | 230B 10B active at inference time | 205k | $0.3 | 215 | +13 | |||
Hy3-preview (Reasoning) | 42 | 295B 21B active at inference time | 256k | $0.1 | 101 | ||||
DeepSeek V3.2 (Reasoning) | 42 | 685B 37B active at inference time | 128k | $0.2 | - | +12 | |||
Qwen3.5 122B A10B (Reasoning) | 42 | 125B 10B active at inference time | 262k | $0.7 | 140 | +2 | |||
MiMo-V2-Flash (Feb 2026) | 41 | 309B 15B active at inference time | 256k | $0.1 | 125 | ||||
Kimi K2 Thinking | 41 | 1.0T 32B active at inference time | 256k | $0.8 | 123 | +3 | |||
GLM-5 (Non-reasoning) | 41 | 744B 40B active at inference time | 200k | $0.7 | 67 | +3 | |||
Qwen3.5 397B A17B (Non-reasoning) | 40 | 397B 17B active at inference time | 262k | $0.9 | 54 | +6 | |||
MiniMax-M2.1 | 39 | 230B 10B active at inference time | 205k | $0.4 | 212 | ||||
DeepSeek V4 Pro (Non-reasoning) | 39 | 1.6T 49B active at inference time | 1.00M | $0.2 | 61 | +2 | |||
MiMo-V2-Flash (Reasoning) | 39 | 309B 15B active at inference time | 256k | $0.1 | 131 | ||||
Mistral Medium 3.5 | 39 | 128B | 256k | $2.1 | 122 | ||||
Gemma 4 31B (Reasoning) | 39 | 30.7B | 256k | - | 36 | +8 | |||
Ring-2.6-1T | 38 | 1.0T 63B active at inference time | 262k | $0.5 | 127 | ||||
Step 3.5 Flash | 38 | 196B 11B active at inference time | 256k | $0.1 | 179 | ||||
Kimi K2.5 (Non-reasoning) | 37 | 1.0T 32B active at inference time | 256k | $0.8 | 42 | +6 | |||
Qwen3.5 27B (Non-reasoning) | 37 | 27.8B | 262k | $0.5 | 91 | ||||
Command A+ | 37 | 218B 25B active at inference time | 192k | - | 196 | ||||
Qwen3.6 27B (Non-reasoning) | 37 | 27.8B | 262k | $0.9 | 64 | ||||
Qwen3.5 35B A3B (Reasoning) | 37 | 36B 3B active at inference time | 262k | $0.4 | 153 | +2 | |||
DeepSeek V4 Flash (Non-reasoning) | 36 | 284B 13B active at inference time | 1.00M | $0.1 | 117 | ||||
MiniMax-M2 | 36 | 230B 10B active at inference time | 205k | $0.4 | 127 | ||||
NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | 36 | 120.6B 12.7B active at inference time | 1.00M | $0.3 | 159 | +2 | |||
Qwen3.5 122B A10B (Non-reasoning) | 36 | 125B 10B active at inference time | 262k | $0.7 | 163 | ||||
MiMo-V2.5-Pro (Non-reasoning) | 36 | 1.0T 41.7B active at inference time | 1.00M | $0.6 | 54 | ||||
GLM-4.7 (Non-reasoning) | 34 | 357B 32B active at inference time | 200k | $0.7 | 76 | +6 | |||
DeepSeek V3.1 Terminus (Reasoning) | 34 | 685B 37B active at inference time | 128k | $1.7 | - | ||||
Hy3-preview (Non-reasoning) | 34 | 295B 21B active at inference time | 256k | $0.1 | 94 | ||||
Ling-2.6-1T | 34 | 1.0T 63B active at inference time | 262k | $0.5 | - | ||||
gpt-oss-120b (high) | 33 | 117B 5.1B active at inference time | 131k | $0.2 | 341 | +23 | |||
DeepSeek V3.2 Exp (Reasoning) | 33 | 685B 37B active at inference time | 128k | $0.2 | - | ||||
GLM-4.6 (Reasoning) | 33 | 357B 32B active at inference time | 200k | $0.7 | 51 | ||||
Qwen3.5 9B (Reasoning) | 32 | 9.65B | 262k | $0.1 | 86 | ||||
Gemma 4 31B (Non-reasoning) | 32 | 30.7B | 256k | $0.2 | 45 | +4 | |||
K-EXAONE (Reasoning) | 32 | 236B 23B active at inference time | 256k | - | - | - | |||
DeepSeek V3.2 (Non-reasoning) | 32 | 685B 37B active at inference time | 128k | $0.5 | - | +12 | |||
Trinity Large Thinking | 32 | 399B 13B active at inference time | 512k | $0.2 | 169 | ||||
Qwen3.6 35B A3B (Non-reasoning) | 32 | 36B 3B active at inference time | 262k | $0.6 | 220 | +5 | |||
Gemma 4 26B A4B (Reasoning) | 31 | 25.2B 3.8B active at inference time | 256k | $0.1 | - | +4 | |||
Kimi K2 0905 | 31 | 1.0T 32B active at inference time | 256k | $0.8 | 25 | ||||
Qwen3.5 35B A3B (Non-reasoning) | 31 | 36B 3B active at inference time | 262k | $0.4 | 178 | ||||
MiMo-V2-Flash (Non-reasoning) | 30 | 309B 15B active at inference time | 256k | $0.1 | 122 | ||||
GLM-4.6 (Non-reasoning) | 30 | 357B 32B active at inference time | 200k | $0.8 | 55 | ||||
EXAONE 4.5 33B | 30 | 34.4B | 262k | - | - | - | |||
GLM-4.7-Flash (Reasoning) | 30 | 31.2B 3B active at inference time | 200k | $0.1 | 79 | ||||
Qwen3 235B A22B 2507 (Reasoning) | 30 | 235B 22B active at inference time | 256k | $0.6 | 52 | +3 | |||
DeepSeek V3.2 Speciale | 29 | 685B 37B active at inference time | 128k | - | - | - | |||
DeepSeek V3.1 Terminus (Non-reasoning) | 29 | 685B 37B active at inference time | 128k | $0.3 | - | ||||
DeepSeek V3.2 Exp (Non-reasoning) | 28 | 685B 37B active at inference time | 128k | $0.2 | - | ||||
Nemotron Cascade 2 30B A3B | 28 | 31.6B 3B active at inference time | 1.00M | - | - | - | |||
Apriel-v1.5-15B-Thinker | 28 | 15B | 128k | - | - | ||||
Qwen3 Coder Next | 28 | 79.7B 3B active at inference time | 256k | $0.4 | 103 | ||||
DeepSeek V3.1 (Non-reasoning) | 28 | 685B 37B active at inference time | 128k | $0.7 | - | +7 | |||
Mistral Small 4 (Reasoning) | 28 | 119B 6.5B active at inference time | 256k | $0.2 | 177 | ||||
DeepSeek V3.1 (Reasoning) | 28 | 685B 37B active at inference time | 128k | $0.7 | - | ||||
Qwen3 VL 235B A22B (Reasoning) | 28 | 235B 22B active at inference time | 262k | $1.4 | 36 | ||||
Apriel-v1.6-15B-Thinker | 28 | 15B | 128k | - | - | ||||
Qwen3.5 9B (Non-reasoning) | 27 | 9.65B | 262k | - | - | - | |||
Gemma 4 26B A4B (Non-reasoning) | 27 | 25.2B 3.8B active at inference time | 256k | $0.2 | 72 | +4 | |||
Qwen3.5 4B (Reasoning) | 27 | 4.66B | 262k | $0.0 | 205 | ||||
DeepSeek R1 0528 (May '25) | 27 | 685B 37B active at inference time | 128k | $1.6 | - | +3 | |||
Qwen3 Next 80B A3B (Reasoning) | 27 | 80B 3B active at inference time | 262k | $1.1 | 158 | +5 | |||
GLM-4.5 (Reasoning) | 26 | 355B 32B active at inference time | 128k | $0.8 | 51 | ||||
Kimi K2 | 26 | 1.0T 32B active at inference time | 128k | $0.6 | 24 | ||||
Ling 2.6 Flash | 26 | 107B 7.4B active at inference time | 262k | $0.1 | - | ||||
Seed-OSS-36B-Instruct | 25 | 36.2B | 512k | $0.2 | 36 | ||||
Qwen3 235B A22B 2507 Instruct | 25 | 235B 22B active at inference time | 256k | $0.3 | 50 | +9 | |||
Qwen3 Coder 480B A35B Instruct | 25 | 480B 35B active at inference time | 262k | $0.5 | 57 | +6 | |||
Qwen3 VL 32B (Reasoning) | 25 | 33.4B | 256k | $1.5 | 88 | ||||
gpt-oss-20B (high) | 24 | 21B 3.6B active at inference time | 131k | $0.1 | 239 | +10 | |||
gpt-oss-120b (low) | 24 | 117B 5.1B active at inference time | 131k | $0.2 | 343 | +19 | |||
MiniMax M1 80k | 24 | 456B 45.9B active at inference time | 1.00M | $0.7 | - | ||||
NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | 24 | 31.6B 3.6B active at inference time | 1.00M | $0.1 | 181 | ||||
K2 Think V2 | 24 | 70B | 262k | - | - | - | |||
LongCat Flash Lite | 24 | 68.5B 3B active at inference time | 256k | - | - | ||||
HyperCLOVA X SEED Think (32B) | 24 | 32B | 128k | - | - | - | |||
GLM-4.6V (Reasoning) | 23 | 108B 12B active at inference time | 128k | $0.4 | 85 | ||||
K-EXAONE (Non-reasoning) | 23 | 236B 23B active at inference time | 256k | - | - | - | |||
GLM-4.5-Air | 23 | 106B 12B active at inference time | 128k | $0.3 | 79 | ||||
Mistral Large 3 | 23 | 675B 41B active at inference time | 256k | $0.6 | 52 | ||||
Ring-1T | 23 | 1.0T 50B active at inference time | 128k | - | - | - | |||
Qwen3.5 4B (Non-reasoning) | 23 | 4.66B | 262k | $0.0 | 215 | ||||
Qwen3 30B A3B 2507 (Reasoning) | 22 | 30.5B 3.3B active at inference time | 262k | $0.4 | 123 | ||||
DeepSeek V3 0324 | 22 | 671B 37B active at inference time | 128k | $1.2 | - | +3 | |||
INTELLECT-3 | 22 | 107B 12B active at inference time | 131k | - | - | - | |||
GLM-4.7-Flash (Non-reasoning) | 22 | 31.2B 3B active at inference time | 200k | $0.1 | 107 | ||||
Devstral 2 | 22 | 125B | 256k | - | 65 | ||||
Solar Open 100B (Reasoning) | 22 | 102B 12B active at inference time | 128k | - | - | - | |||
Nemotron 3 Nano Omni 30B A3B Reasoning | 21 | 30B 3B active at inference time | 256k | $0.1 | 301 | ||||
MiniMax M1 40k | 21 | 456B 45.9B active at inference time | 1.00M | - | - | - | |||
gpt-oss-20B (low) | 21 | 21B 3.6B active at inference time | 131k | $0.1 | 242 | +9 | |||
Qwen3 VL 235B A22B Instruct | 21 | 235B 22B active at inference time | 262k | $0.5 | 49 | +2 | |||
K2-V2 (high) | 21 | 70B | 512k | - | - | - | |||
Qwen3 Next 80B A3B Instruct | 20 | 80B 3B active at inference time | 262k | $0.7 | 150 | +4 | |||
Tri-21B-think Preview | 20 | 21B | 32.0k | - | - | - | |||
Qwen3 Coder 30B A3B Instruct | 20 | 30.5B 3.3B active at inference time | 262k | $0.3 | 94 | ||||
Qwen3 235B A22B (Reasoning) | 20 | 235B 22B active at inference time | 32.8k | $1.5 | 51 | ||||
QwQ 32B | 20 | 32.8B | 131k | $0.7 | 29 | ||||
Qwen3 VL 30B A3B (Reasoning) | 20 | 30B 3B active at inference time | 256k | $0.3 | 100 | ||||
Devstral Small 2 | 19 | 24B | 256k | - | 57 | ||||
Ling-1T | 19 | 1.0T 50B active at inference time | 128k | - | - | - | |||
DeepSeek R1 (Jan '25) | 19 | 685B 37B active at inference time | 128k | $2.0 | - | +3 | |||
Gemma 4 E4B (Reasoning) | 19 | 8B 4.5B active at inference time | 128k | - | - | - | |||
K2-V2 (medium) | 19 | 70B | 512k | - | - | - | |||
Llama Nemotron Super 49B v1.5 (Reasoning) | 19 | 49B | 128k | $0.1 | 45 | ||||
Mistral Small 4 (Non-reasoning) | 19 | 119B 6.5B active at inference time | 256k | $0.2 | 169 | ||||
Tri-21B-Think | 19 | 21B | 32.0k | - | - | - | |||
Hermes 4 - Llama-3.1 405B (Reasoning) | 19 | 406B | 128k | $1.2 | 38 | ||||
Llama 3.3 Nemotron Super 49B v1 (Reasoning) | 18 | 49B | 128k | - | - | - | |||
Llama 4 Maverick | 18 | 402B 17B active at inference time | 1.00M | $0.3 | 110 | +6 | |||
Qwen3 4B 2507 (Reasoning) | 18 | 4.02B | 262k | - | - | - | |||
MiniCPM5-1B (Reasoning) | 18 | 1B | 128k | - | - | - | |||
Magistral Small 1.2 | 18 | 24B | 128k | $0.6 | 110 | ||||
Sarvam 105B (high) | 18 | 106B 10.3B active at inference time | 128k | $0.0 | 114 | ||||
Devstral Small (May '25) | 18 | 23.6B | 256k | - | - | - | |||
MiniCPM5-1B (Non-reasoning) | 18 | 1B | 128k | - | - | - | |||
Hermes 4 - Llama-3.1 405B (Non-reasoning) | 18 | 406B | 128k | $1.2 | 40 | ||||
Llama 3.1 Instruct 405B | 17 | 405B | 128k | $3.1 | 47 | ||||
Qwen3 VL 32B Instruct | 17 | 33.4B | 256k | $0.9 | 68 | ||||
DeepSeek R1 Distill Qwen 32B | 17 | 32B | 128k | - | - | - | |||
GLM-4.6V (Non-reasoning) | 17 | 108B 12B active at inference time | 128k | $0.4 | 94 | ||||
Qwen3 235B A22B (Non-reasoning) | 17 | 235B 22B active at inference time | 32.8k | $0.6 | 46 | ||||
Magistral Small 1 | 17 | 23.6B | 40.0k | - | - | - | |||
EXAONE 4.0 32B (Reasoning) | 17 | 32B | 131k | - | - | - | |||
Qwen3 VL 8B (Reasoning) | 17 | 8.77B | 256k | $0.4 | 108 | ||||
Qwen3 32B (Reasoning) | 17 | 32.8B | 32.8k | $0.2 | 81 | +3 | |||
DeepSeek V3 (Dec '24) | 16 | 671B 37B active at inference time | 128k | $0.4 | - | +2 | |||
DeepSeek R1 0528 Qwen3 8B | 16 | 8.19B | 32.8k | - | - | - | |||
Qwen3.5 2B (Reasoning) | 16 | 2.27B | 262k | $0.0 | - | ||||
Qwen3 14B (Reasoning) | 16 | 14.8B | 32.8k | $0.4 | 60 | ||||
Nanbeige4.1-3B | 16 | 3.93B | 256k | - | - | - | |||
Qwen3 VL 30B A3B Instruct | 16 | 30B 3B active at inference time | 256k | $0.2 | 107 | ||||
Hermes 4 - Llama-3.1 70B (Reasoning) | 16 | 70.6B | 128k | $0.2 | 82 | ||||
Ministral 3 14B | 16 | 14B | 256k | $0.2 | 87 | ||||
DeepSeek R1 Distill Llama 70B | 16 | 70B | 128k | $0.7 | 42 | ||||
DeepSeek R1 Distill Qwen 14B | 16 | 14B | 128k | - | - | - | |||
Falcon-H1R-7B | 16 | 7B | 256k | - | - | - | |||
Ling-flash-2.0 | 16 | 103B 6.1B active at inference time | 128k | $0.2 | 78 | ||||
Qwen3 Omni 30B A3B (Reasoning) | 16 | 35.3B 3B active at inference time | 65.5k | $0.3 | 75 | ||||
Qwen2.5 Instruct 72B | 16 | 72B | 131k | $0.2 | - | ||||
Step3 VL 10B | 15 | 10.2B | 65.5k | - | - | - | |||
Qwen3 30B A3B (Reasoning) | 15 | 30.5B 3.3B active at inference time | 32.8k | $0.1 | 65 | +2 | |||
Devstral Small (Jul '25) | 15 | 24B | 256k | $0.1 | 56 | ||||
Gemma 4 E2B (Reasoning) | 15 | 5.1B 2.3B active at inference time | 128k | - | - | - | |||
QwQ 32B-Preview | 15 | 32.8B | 32.8k | - | - | - | |||
GLM-4.5V (Reasoning) | 15 | 108B 12B active at inference time | 64.0k | $0.7 | 26 | ||||
Mistral Large 2 (Nov '24) | 15 | 123B | 128k | $2.4 | 50 | ||||
Mistral Small 3.2 | 15 | 24B | 128k | $0.1 | 141 | ||||
Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | 15 | 253B | 128k | $0.7 | 52 | ||||
Qwen3 30B A3B 2507 Instruct | 15 | 30.5B 3.3B active at inference time | 262k | $0.2 | 97 | ||||
ERNIE 4.5 300B A47B | 15 | 300B 47B active at inference time | 131k | $0.4 | - | ||||
NVIDIA Nemotron Nano 12B v2 VL (Reasoning) | 15 | 13.2B | 128k | $0.2 | 296 | ||||
Ministral 3 8B | 15 | 8B | 256k | $0.1 | 107 | ||||
Gemma 4 E4B (Non-reasoning) | 15 | 8B 4.5B active at inference time | 128k | - | - | - | |||
NVIDIA Nemotron Nano 9B V2 (Reasoning) | 15 | 9B | 131k | $0.1 | 119 | ||||
Granite 4.1 30B | 15 | 30B | 131k | - | - | - | |||
NVIDIA Nemotron 3 Nano 4B | 15 | 3.97B | 262k | - | - | - | |||
Qwen3.5 2B (Non-reasoning) | 15 | 2.27B | 262k | $0.0 | 366 | ||||
Llama Nemotron Super 49B v1.5 (Non-reasoning) | 15 | 49B | 128k | $0.1 | 46 | ||||
Qwen3 32B (Non-reasoning) | 15 | 32.8B | 32.8k | $0.2 | 85 | +4 | |||
Llama 3.3 Instruct 70B | 14 | 70B | 128k | $0.6 | 79 | +18 | |||
Mistral Small 3.1 | 14 | 24B | 128k | $0.1 | 170 | ||||
K2-V2 (low) | 14 | 70B | 512k | - | - | - | |||
Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | 14 | 4.51B | 128k | - | - | - | |||
Kimi Linear 48B A3B Instruct | 14 | 49.1B 3B active at inference time | 1.00M | - | - | - | |||
Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) | 14 | 49B | 128k | - | - | - | |||
Qwen3 VL 8B Instruct | 14 | 8.77B | 256k | $0.2 | 119 | ||||
Qwen3 4B (Reasoning) | 14 | 4.02B | 32.0k | $0.2 | - | ||||
Llama 3.1 Tulu3 405B | 14 | 405B | 128k | - | - | - | |||
Ring-flash-2.0 | 14 | 103B 6.1B active at inference time | 128k | $0.2 | - | ||||
Pixtral Large | 14 | 124B | 128k | $2.4 | 53 | ||||
Olmo 3.1 32B Think | 14 | 32.2B | 65.5k | - | - | ||||
Grok 2 (Dec '24) | 14 | 270B | 131k | - | - | - | |||
Qwen3 VL 4B (Reasoning) | 14 | 4.44B | 256k | - | - | - | |||
Llama 4 Scout | 14 | 109B 17B active at inference time | 10.0M | $0.2 | 106 | +6 | |||
Command A | 13 | 111B | 256k | $3.3 | 69 | ||||
Llama 3.1 Nemotron Instruct 70B | 13 | 70B | 128k | $1.2 | 303 | ||||
Qwen2.5 Instruct 32B | 13 | 32B | 128k | - | - | - | |||
Qwen3 8B (Reasoning) | 13 | 8.19B | 131k | $0.2 | 36 | ||||
NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) | 13 | 31.6B 3.6B active at inference time | 1.00M | $0.1 | 91 | ||||
NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | 13 | 9B | 131k | $0.1 | 140 | ||||
Mistral Large 2 (Jul '24) | 13 | 123B | 128k | $2.4 | - | ||||
Qwen3 4B 2507 Instruct | 13 | 4.02B | 262k | - | - | - | |||
Qwen2.5 Coder Instruct 32B | 13 | 32B | 131k | - | - | - | |||
Qwen3 14B (Non-reasoning) | 13 | 14.8B | 32.8k | $0.3 | 59 | ||||
GLM-4.5V (Non-reasoning) | 13 | 108B 12B active at inference time | 64.0k | $0.7 | 34 | ||||
Mistral Small 3 | 13 | 24B | 32.0k | $0.1 | 164 | ||||
MiniCPM-V 4.6 1.3B | 13 | 1.3B | 262k | - | - | - | |||
Hermes 4 - Llama-3.1 70B (Non-reasoning) | 13 | 70.6B | 128k | $0.2 | 89 | ||||
Qwen3 30B A3B (Non-reasoning) | 13 | 30.5B 3.3B active at inference time | 32.8k | $0.1 | 66 | ||||
DeepSeek-V2.5 (Dec '24) | 13 | 236B 21B active at inference time | 128k | - | - | - | |||
Qwen3 4B (Non-reasoning) | 12 | 4.02B | 32.0k | $0.1 | - | ||||
Llama 3.1 Instruct 70B | 12 | 70B | 128k | $0.6 | 36 | ||||
Granite 4.1 8B | 12 | 8B | 131k | $0.1 | 120 | ||||
Sarvam 30B (high) | 12 | 32.2B 2.4B active at inference time | 65.5k | $0.0 | 171 | ||||
DeepSeek-V2.5 | 12 | 236B 21B active at inference time | 128k | - | - | - | |||
Olmo 3.1 32B Instruct | 12 | 32.2B | 65.5k | - | - | - | |||
DeepSeek R1 Distill Llama 8B | 12 | 8B | 128k | - | - | - | |||
Gemma 4 E2B (Non-reasoning) | 12 | 5.1B 2.3B active at inference time | 128k | - | - | - | |||
Olmo 3 32B Think | 12 | 32.2B | 65.5k | - | - | - | |||
R1 1776 | 12 | 671B 37B active at inference time | 128k | - | - | - | |||
Llama 3.2 Instruct 90B (Vision) | 12 | 90B | 128k | $1.4 | 58 | ||||
Solar Mini | 12 | 10.7B | 4.10k | $0.1 | - | ||||
Llama 3.1 Instruct 8B | 12 | 8B | 128k | $0.1 | 154 | +12 | |||
Grok-1 | 12 | 314B 78B active at inference time | 8.19k | - | - | - | |||
Qwen2 Instruct 72B | 12 | 72B | 131k | - | - | - | |||
EXAONE 4.0 32B (Non-reasoning) | 12 | 32B | 131k | - | - | - | |||
Ministral 3 3B | 11 | 3B | 256k | $0.1 | 181 | ||||
DeepHermes 3 - Mistral 24B Preview (Non-reasoning) | 11 | 24B | 32.0k | - | - | - | |||
Jamba 1.7 Large | 11 | 398B 94B active at inference time | 256k | $2.6 | 60 | ||||
Granite 4.0 H Small | 11 | 32B 9B active at inference time | 128k | $0.1 | 417 | ||||
Jamba 1.5 Large | 11 | 398B 94B active at inference time | 256k | $2.6 | - | ||||
Qwen3 Omni 30B A3B Instruct | 11 | 35.3B 3B active at inference time | 65.5k | $0.3 | 92 | ||||
Hermes 3 - Llama-3.1 70B | 11 | 70.6B | 128k | $0.3 | 26 | ||||
Qwen3 8B (Non-reasoning) | 11 | 8.19B | 32.8k | $0.2 | 38 | ||||
DeepSeek-Coder-V2 | 11 | 236B 21B active at inference time | 128k | - | - | - | |||
OLMo 2 32B | 11 | 32.2B | 4.10k | - | - | - | |||
Jamba 1.6 Large | 11 | 398B 94B active at inference time | 256k | $2.6 | 61 | ||||
Qwen3.5 0.8B (Reasoning) | 11 | 0.873B | 262k | $0.0 | - | ||||
LFM2 24B A2B | 10 | 23.8B 2.3B active at inference time | 32.8k | $0.0 | 118 | ||||
Phi-4 | 10 | 14B | 16.0k | $0.2 | 31 | ||||
Gemma 3 27B Instruct | 10 | 27.4B | 128k | $0.1 | - | +3 | |||
Mistral Small (Sep '24) | 10 | 22B | 32.8k | $0.2 | 170 | ||||
Phi-3 Mini Instruct 3.8B | 10 | 3.8B | 4.10k | - | - | - | |||
NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | 10 | 13.2B | 128k | $0.2 | 230 | ||||
Gemma 3n E4B Instruct Preview (May '25) | 10 | 8.39B 4B active at inference time | 32.0k | - | - | - | |||
Phi-4 Multimodal Instruct | 10 | 5.6B | 128k | - | 17 | ||||
Qwen2.5 Coder Instruct 7B | 10 | 7.62B | 131k | - | - | - | |||
Qwen3.5 0.8B (Non-reasoning) | 10 | 0.873B | 262k | $0.0 | 85 | ||||
Mixtral 8x22B Instruct | 10 | 141B 39B active at inference time | 65.4k | - | - | - | |||
Llama 2 Chat 7B | 10 | 7B | 4.10k | $0.1 | - | ||||
Llama 3.2 Instruct 3B | 10 | 3B | 128k | $0.1 | 51 | ||||
Jamba Reasoning 3B | 10 | 3B | 262k | - | - | - | |||
Qwen3 VL 4B Instruct | 10 | 4.44B | 256k | - | - | - | |||
Qwen1.5 Chat 110B | 10 | 110B | 32.0k | - | - | - | |||
Reka Flash 3 | 10 | 21B | 128k | $0.3 | - | ||||
Olmo 3 7B Think | 9 | 7B | 65.5k | - | - | - | |||
OLMo 2 7B | 9 | 7.3B | 4.10k | - | - | - | |||
Molmo 7B-D | 9 | 8.02B | 4.10k | - | - | - | |||
Ling-mini-2.0 | 9 | 16.3B 1.4B active at inference time | 131k | - | - | - | |||
DeepSeek R1 Distill Qwen 1.5B | 9 | 1.5B | 128k | - | - | - | |||
DeepSeek-V2-Chat | 9 | 236B 21B active at inference time | 128k | - | - | - | |||
Llama 3 Instruct 70B | 9 | 70B | 8.19k | $0.9 | - | ||||
Arctic Instruct | 9 | 480B 17B active at inference time | 4.00k | - | - | - | |||
Qwen Chat 72B | 9 | 72B | 33.8k | - | - | - | |||
Gemma 3 12B Instruct | 9 | 12.2B | 128k | $0.1 | - | +2 | |||
Llama 3.2 Instruct 11B (Vision) | 9 | 11B | 128k | $0.2 | 51 | ||||
Granite 4.1 3B | 9 | 3B | 131k | - | - | - | |||
DeepSeek Coder V2 Lite Instruct | 8 | 16B 2.4B active at inference time | 128k | - | - | - | |||
Sarvam M (Reasoning) | 8 | 23.6B | 32.8k | - | - | ||||
Phi-4 Mini Instruct | 8 | 3.84B | 128k | - | 21 | ||||
Llama 2 Chat 70B | 8 | 70B | 4.10k | - | - | - | |||
DeepSeek LLM 67B Chat (V1) | 8 | 67B | 4.10k | - | - | - | |||
Llama 2 Chat 13B | 8 | 13B | 4.10k | - | - | - | |||
Command-R+ (Apr '24) | 8 | 104B | 128k | $4.2 | - | ||||
OpenChat 3.5 (1210) | 8 | 7B | 8.19k | - | - | - | |||
DBRX Instruct | 8 | 132B 36B active at inference time | 32.8k | - | - | - | |||
Exaone 4.0 1.2B (Reasoning) | 8 | 1.28B | 64.0k | - | - | - | |||
Olmo 3 7B Instruct | 8 | 7B | 65.5k | $0.1 | - | ||||
Exaone 4.0 1.2B (Non-reasoning) | 8 | 1.28B | 64.0k | - | - | - | |||
LFM2.5-1.2B-Thinking | 8 | 1.17B | 32.0k | - | - | - | |||
Jamba 1.7 Mini | 8 | 52B 12B active at inference time | 258k | - | - | - | |||
LFM2 2.6B | 8 | 2.57B | 32.8k | - | - | - | |||
LFM2.5-1.2B-Instruct | 8 | 1.17B | 32.0k | - | - | - | |||
Jamba 1.5 Mini | 8 | 52B 12B active at inference time | 256k | $0.2 | - | ||||
Granite 4.0 H 1B | 8 | 1.5B | 128k | - | - | - | |||
Qwen3 1.7B (Reasoning) | 8 | 2.03B | 32.0k | $0.2 | - | ||||
Jamba 1.6 Mini | 8 | 52B 12B active at inference time | 256k | $0.2 | 187 | ||||
Mixtral 8x7B Instruct | 8 | 46.7B 12.9B active at inference time | 32.8k | $0.5 | - | ||||
Gemma 3 270M | 8 | 0.268B | 32.0k | - | - | - | |||
Apertus 70B Instruct | 8 | 70B | 65.5k | $1.0 | - | ||||
Granite 4.0 Micro | 8 | 3B | 128k | - | - | - | |||
DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning) | 8 | 8B | 128k | - | - | - | |||
Llama 65B | 7 | 65B | 2.05k | - | - | - | |||
Qwen Chat 14B | 7 | 14B | 8.19k | - | - | - | |||
Mistral 7B Instruct | 7 | 7B | 8.19k | $0.2 | 104 | ||||
Command-R (Mar '24) | 7 | 35B | 128k | $0.6 | - | ||||
Granite 4.0 1B | 7 | 1.6B | 128k | - | - | - | |||
Molmo2-8B | 7 | 8.66B | 36.9k | - | - | - | |||
LFM2 8B A1B | 7 | 8.34B 1.5B active at inference time | 32.8k | - | - | - | |||
Granite 3.3 8B (Non-reasoning) | 7 | 8.17B | 128k | $0.1 | 338 | ||||
Qwen3 1.7B (Non-reasoning) | 7 | 2.03B | 32.0k | $0.1 | - | ||||
Qwen3 0.6B (Reasoning) | 6 | 0.752B | 32.0k | $0.2 | - | ||||
Llama 3 Instruct 8B | 6 | 8B | 8.19k | $0.1 | - | ||||
Gemma 3n E4B Instruct | 6 | 8.39B 4B active at inference time | 32.0k | $0.0 | 53 | ||||
LFM2 1.2B | 6 | 1.17B | 32.8k | - | - | - | |||
Gemma 3 4B Instruct | 6 | 4.3B | 128k | $0.0 | - | ||||
Llama 3.2 Instruct 1B | 6 | 1B | 128k | $0.1 | 84 | ||||
LFM2.5-VL-1.6B | 6 | 1.6B | 32.0k | - | - | - | |||
Granite 4.0 350M | 6 | 0.35B | 32.8k | - | - | - | |||
Apertus 8B Instruct | 6 | 8B | 65.5k | $0.1 | - | ||||
Qwen3 0.6B (Non-reasoning) | 6 | 0.752B | 32.0k | $0.1 | - | ||||
Gemma 3 1B Instruct | 6 | 1B | 32.0k | - | - | ||||
Granite 4.0 H 350M | 5 | 0.34B | 32.8k | - | - | - | |||
Gemma 3n E2B Instruct | 5 | 5.98B 2B active at inference time | 32.0k | - | - | ||||
Tiny Aya Global | 5 | 3.35B | 8.19k | - | - | ||||
EXAONE 4.5 33B (Non-reasoning) | - | 34.4B | 262k | - | - | - | |||
Cogito v2.1 (Reasoning) | - | 671B 37B active at inference time | 128k | $1.3 | 67 |
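For readers who want to work with the table programmatically, the sketch below parses one row into typed fields. It is a rough illustration, not an official export format. The field names are hypothetical, the second column is assumed to be the Intelligence Index, and only the first four columns are read because the remaining cells are not consistently aligned across rows. Sizes are expected in plain billions (for example `744B`); other notations raise an error instead of being guessed.

```python
import re
from typing import Optional

_SUFFIX = {"k": 1_000, "M": 1_000_000}
_WEIGHTS = re.compile(r"([\d.]+)B(?: ([\d.]+)B active at inference time)?")


def parse_context_window(text: str) -> int:
    """Convert a value such as '256k' or '1.00M' into a token count."""
    match = re.fullmatch(r"([\d.]+)([kM])", text.strip())
    if match is None:
        raise ValueError(f"unrecognized context window: {text!r}")
    return round(float(match.group(1)) * _SUFFIX[match.group(2)])


def parse_row(line: str) -> dict:
    """Parse the first four pipe-separated cells of a table row."""
    name, index, weights, context = [c.strip() for c in line.split("|")][:4]
    match = _WEIGHTS.fullmatch(weights)
    if match is None:
        raise ValueError(f"unrecognized weights cell: {weights!r}")
    total = float(match.group(1))
    # Assumption: rows without an active count are dense models.
    active = float(match.group(2)) if match.group(2) else total
    score: Optional[int] = int(index) if index.isdigit() else None
    return {
        "model": name,
        "intelligence_index": score,
        "total_params_b": total,
        "active_params_b": active,
        "context_tokens": parse_context_window(context),
    }


row = "GLM-5.1 (Reasoning) | 51 | 744B 40B active at inference time | 200k | $0.9 | 77 | +9 | |||"
print(parse_row(row))
```

Rows with a missing index (shown as `-`) yield `None` for `intelligence_index` so they can be filtered out before sorting or plotting.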