Comparisons of Medium Open Source AI Models (40B-150B)
Open source AI models with between 40B and 150B parameters. Models are considered open source (also commonly referred to as open weights) when their weights are available to download. This allows self-hosting on your own infrastructure and enables customization, for example through fine-tuning. Click on any model to see detailed metrics. For more details, including our methodology, see our FAQs.
Mistral Medium 3.5 is among the highest-intelligence medium open source models, defined as those with 40B-150B parameters.
Highlights
Openness
Artificial Analysis Openness Index: Score
Openness Index assesses model openness on a 0 to 100 normalized scale (higher is more open)
Reasoning models are indicated by a lightbulb icon
Intelligence
Artificial Analysis Intelligence Index
Artificial Analysis Intelligence Index v4.1 incorporates 9 evaluations: GDPval-AA v2, 𝜏³-Banking, Terminal-Bench v2.1, SciCode, Humanity's Last Exam, GPQA Diamond, CritPt, AA-Omniscience, AA-LCR
Estimate (independent evaluation forthcoming)
Intelligence Evaluations
Intelligence evaluations measured independently by Artificial Analysis · Higher is better
GDPval-AA v2 (updated): agentic real-world work tasks, scored as (Elo - 500) / 2000
𝜏³-Banking (new): agentic tool use
Agentic coding & terminal use
Coding
Reasoning & knowledge
Scientific reasoning
Physics reasoning
Knowledge
1 - hallucination rate
Long context reasoning
AA-Briefcase (new): agentic knowledge work, scored as (Elo - 500) / 2000
Instruction following
Long-horizon agentic tasks
Kubernetes incident root-cause analysis
Visual reasoning
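The Elo-based evaluations above are reported on a rescaled axis, (Elo - 500) / 2000. A minimal sketch of that mapping follows; the function name and the sample ratings are hypothetical, and whether the page applies any further scaling or clamping for display is not stated here.

```python
def normalize_elo(elo: float) -> float:
    """Map an Elo rating onto the chart scale using (Elo - 500) / 2000."""
    return (elo - 500) / 2000


# Hypothetical ratings, chosen only to illustrate the mapping.
for elo in (500, 1500, 2500):
    print(elo, normalize_elo(elo))
```

Under this formula an Elo of 500 maps to 0 and an Elo of 2500 maps to 1, so ratings in that range land on a 0 to 1 scale.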
Size
Model Size: Total and Active Parameters
Comparison between total model parameters and parameters active during inference
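To make the total versus active comparison concrete, the sketch below computes the share of parameters active at inference for a few models, using counts copied from the table under "Further details". The helper name is hypothetical, and rows that list no active count are treated as "not reported" rather than assumed dense.

```python
# (total, active) parameter counts in billions, copied from the table
# under "Further details". None means no active count is listed.
MODELS = {
    "Qwen3.5 122B A10B": (125.0, 10.0),
    "gpt-oss-120b": (117.0, 5.1),
    "Llama 4 Scout": (109.0, 17.0),
    "Mistral Medium 3.5": (128.0, None),
}


def active_fraction(total_b, active_b):
    """Return the share of parameters active at inference, or None if not listed."""
    if active_b is None:
        return None
    return active_b / total_b


for name, (total_b, active_b) in MODELS.items():
    frac = active_fraction(total_b, active_b)
    label = "n/a" if frac is None else f"{frac:.1%}"
    print(f"{name}: {label}")
```

For Qwen3.5 122B A10B this gives 10 / 125 = 8% of parameters active per token.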
Intelligence vs. Active Parameters
Active parameters at inference time · Artificial Analysis Intelligence Index
Most attractive quadrant
Intelligence vs. Total Parameters
Artificial Analysis Intelligence Index · Size in parameters (billions)
Most attractive quadrant
Legend (model creators): Alibaba, InclusionAI, LongCat, MBZUAI Institute of Foundation Models, Meta, Mistral, Multiverse Computing, NVIDIA, OpenAI
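The "most attractive quadrant" in the charts above is the region of high intelligence at low parameter count. The page does not state where the quadrant boundaries are drawn, so the sketch below uses a median split purely as a hypothetical stand-in. Scores and sizes are copied from a few rows of the table under "Further details"; treating that table's first numeric column as the Intelligence Index is also an assumption.

```python
from statistics import median

# (name, index score, total parameters in billions), copied from a few table rows.
ROWS = [
    ("Qwen3.5 122B A10B (Reasoning)", 32, 125.0),
    ("Mistral Medium 3.5", 30, 128.0),
    ("HyperNova 60B 2605", 22, 58.7),
    ("Qwen3 Coder Next", 21, 79.7),
    ("Llama 4 Scout", 10, 109.0),
    ("Llama 3.3 Instruct 70B", 9, 70.0),
]


def attractive(rows):
    """Return names scoring at or above the median score with at most the median size.

    The median cut-offs are an illustrative assumption, not the page's method.
    """
    score_cut = median(score for _, score, _ in rows)
    size_cut = median(size for _, _, size in rows)
    return [name for name, score, size in rows
            if score >= score_cut and size <= size_cut]


print(attractive(ROWS))
```

With this sample, the cut-offs are a score of 21.5 and a size of 94.35B, and only HyperNova 60B 2605 falls inside the quadrant. A different choice of rows or thresholds would change the result.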
Context Window
Context window: token limit · Higher is better
Further details
| Model | Intelligence Index | Parameters (total; active at inference time) | Context window (tokens) | Price (USD) | Output speed | Provider Benchmarks |
|---|---|---|---|---|---|---|
| Qwen3.5 122B A10B (Reasoning) | 32 | 125B (10B active) | 262k | $0.7 | 141 | +2 |
| Mistral Medium 3.5 | 30 | 128B | 256k | $1.2 | 133 | |
| Qwen3.5 122B A10B (Non-reasoning) | 28 | 125B (10B active) | 262k | $0.7 | 148 | |
| NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | 25 | 120.6B (12.7B active) | 1.00M | $0.3 | 243 | +2 |
| gpt-oss-120b (high) | 24 | 117B (5.1B active) | 131k | $0.2 | 305 | +23 |
| HyperNova 60B 2605 | 22 | 58.7B (4.8B active) | 131k | $0.1 | 395 | |
| Qwen3 Coder Next | 21 | 79.7B (3B active) | 256k | $0.4 | 127 | |
| Mistral Small 4 (Reasoning) | 21 | 119B (6.5B active) | 256k | $0.2 | 168 | |
| Qwen3 Next 80B A3B (Reasoning) | 20 | 80B (3B active) | 262k | $1.1 | 178 | +5 |
| Ling 2.6 Flash | 19 | 107B (7.4B active) | 262k | $0.1 | 195 | |
| Devstral 2 | 19 | 125B | 256k | - | 30 | |
| gpt-oss-120b (low) | 18 | 117B (5.1B active) | 131k | $0.2 | 329 | +19 |
| K2 Think V2 | 17 | 70B | 262k | - | - | - |
| LongCat Flash Lite | 17 | 68.5B (3B active) | 256k | - | - | |
| INTELLECT-3 | 16 | 107B (12B active) | 131k | - | - | - |
| Solar Open 100B (Reasoning) | 15 | 102B (12B active) | 128k | - | - | - |
| K2-V2 (high) | 14 | 70B | 512k | - | - | - |
| Qwen3 Next 80B A3B Instruct | 14 | 80B (3B active) | 262k | $0.7 | 178 | +4 |
| K2-V2 (medium) | 12 | 70B | 512k | - | - | - |
| Llama Nemotron Super 49B v1.5 (Reasoning) | 12 | 49B | 128k | $0.1 | 50 | |
| Mistral Small 4 (Non-reasoning) | 12 | 119B (6.5B active) | 256k | $0.2 | 153 | |
| Sarvam 105B (high) | 12 | 106B (10.3B active) | 128k | $0.0 | 118 | |
| Llama 4 Scout | 10 | 109B (17B active) | 10.0M | $0.2 | 104 | +6 |
| Hermes 4 - Llama-3.1 70B (Reasoning) | 10 | 70.6B | 128k | $0.2 | 86 | |
| Llama Nemotron Super 49B v1.5 (Non-reasoning) | 9 | 49B | 128k | $0.1 | 50 | |
| Llama 3.3 Instruct 70B | 9 | 70B | 128k | $0.6 | 89 | +18 |
| K2-V2 (low) | 9 | 70B | 512k | - | - | - |
| Kimi Linear 48B A3B Instruct | 9 | 49.1B (3B active) | 1.00M | - | - | - |
| Ring-flash-2.0 | 8 | 103B (6.1B active) | 128k | $0.2 | - | |
| Command A | 8 | 111B | 256k | $3.3 | 67 | |
| Llama 3.1 Nemotron Instruct 70B | 8 | 70B | 128k | $1.2 | 299 | |
| Hermes 4 - Llama-3.1 70B (Non-reasoning) | 7 | 70.6B | 128k | $0.2 | 84 | |
| Llama 3.2 Instruct 90B (Vision) | 6 | 90B | 128k | $1.4 | 58 | |
| Jamba 1.7 Mini | 3 | 52B (12B active) | 258k | - | - | - |
| Apertus 70B Instruct | 2 | 70B | 65.5k | $1.0 | - | |