Comparisons of Medium Open Source AI Models (40B-150B)

This page compares open source AI models with between 40B and 150B parameters. Models are considered open source (also commonly referred to as open weights) when their weights are available to download. This allows self-hosting on your own infrastructure and customizing the model, for example through fine-tuning. Click on any model to see detailed metrics. For more details, including our methodology, see our FAQs.
Qwen3.5 122B A10B (Reasoning) from Alibaba and Mistral Medium 3.5 from Mistral are the highest-intelligence Medium open source models, defined as those with 40B-150B parameters, followed by Qwen3.5 122B A10B (Non-reasoning) from Alibaba and NVIDIA Nemotron 3 Super from NVIDIA.

Highlights

Artificial Analysis Openness Index · Higher is better
Artificial Analysis Intelligence Index · Higher is better
Trainable parameters in billions

Openness

Artificial Analysis Openness Index: Score

Openness Index assesses model openness on a 0 to 100 normalized scale (higher is more open)
Reasoning models are indicated by a lightbulb icon

Intelligence

Artificial Analysis Intelligence Index

Artificial Analysis Intelligence Index v4.1 incorporates 9 evaluations: GDPval-AA v2, 𝜏³-Banking, Terminal-Bench v2.1, SciCode, Humanity's Last Exam, GPQA Diamond, CritPt, AA-Omniscience, AA-LCR
Estimate (independent evaluation forthcoming)

Artificial Analysis Intelligence Index v4.1 includes: GDPval-AA v2, 𝜏³-Banking, Terminal-Bench v2.1, SciCode, Humanity's Last Exam, GPQA Diamond, CritPt, AA-Omniscience, AA-LCR. See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.

Intelligence Evaluations

Intelligence evaluations measured independently by Artificial Analysis · Higher is better

Evaluation charts: Agentic real-world work tasks, (Elo-500)/2000; Agentic tool use; Agentic coding & terminal use; Coding; Reasoning & knowledge; Scientific reasoning; Physics reasoning; Long context reasoning; Agentic knowledge work, (Elo-500)/2000; Instruction following; Long-horizon agentic tasks; Kubernetes incident root-cause analysis; Visual reasoning

While model intelligence generally translates across use cases, specific evaluations may be more relevant for certain use cases.
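
The "(Elo-500)/2000" label on the two agentic work charts reads as a linear rescaling of Elo ratings. A minimal Python sketch, assuming the label means exactly that formula with no clamping or rounding (the function names are illustrative and not taken from Artificial Analysis):

```python
def normalize_elo(elo: float) -> float:
    """Rescale an Elo rating with the chart's formula: (Elo - 500) / 2000."""
    return (elo - 500.0) / 2000.0


def denormalize_elo(score: float) -> float:
    """Invert the rescaling to recover the Elo rating."""
    return score * 2000.0 + 500.0


if __name__ == "__main__":
    # Hypothetical ratings chosen only to show the mapping.
    for elo in (500, 1500, 2500):
        print(elo, "->", normalize_elo(elo))
```

Under this reading, an Elo of 500 maps to 0.0 and an Elo of 2500 maps to 1.0, which would put the Elo-based evaluations on a scale comparable to the other scores.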

Size

Model Size: Total and Active Parameters

Comparison between total model parameters and parameters active during inference

The total number of trainable weights and biases in the model, expressed in billions. These parameters are learned during training and determine the model's ability to process and generate responses.

The number of parameters actually executed during each inference forward pass, expressed in billions. For Mixture of Experts (MoE) models, a routing mechanism selects a subset of experts per token, resulting in fewer active than total parameters. Dense models use all parameters, so active equals total.
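
To make the total versus active distinction concrete, the sketch below computes the share of parameters active per forward pass for three models listed on this page. The parameter counts come from the table under "Further details"; the `ModelSize` class is a hypothetical helper, and treating Llama 3.3 Instruct 70B as dense (active equals total) is an assumption based on the page listing no separate active count for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSize:
    name: str
    total_b: float   # total parameters, in billions
    active_b: float  # parameters active per forward pass, in billions

    @property
    def active_fraction(self) -> float:
        """Share of the total parameters used for each token."""
        return self.active_b / self.total_b

    @property
    def is_dense(self) -> bool:
        """Dense models use every parameter, so active equals total."""
        return self.active_b == self.total_b


MODELS = [
    ModelSize("Qwen3.5 122B A10B", total_b=125.0, active_b=10.0),
    ModelSize("gpt-oss-120b", total_b=117.0, active_b=5.1),
    ModelSize("Llama 3.3 Instruct 70B", total_b=70.0, active_b=70.0),  # assumed dense
]

if __name__ == "__main__":
    for m in MODELS:
        kind = "dense" if m.is_dense else "MoE"
        print(f"{m.name}: {m.active_fraction:.1%} of parameters active ({kind})")
```

The MoE models here activate well under a tenth of their weights per token, which is why active parameters are charted separately from total size.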

Intelligence vs. Active Parameters

Active parameters at inference time · Artificial Analysis Intelligence Index
Most attractive quadrant
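
The page does not state how the boundaries of the "most attractive quadrant" are drawn. As a rough sketch, the code below assumes median splits on both axes and treats high intelligence with low active parameters as attractive. The scores and active parameter counts are taken from the table under "Further details"; the median rule and the function name are assumptions for illustration only.

```python
from statistics import median

# (model, Intelligence Index, active parameters in billions)
MODELS = [
    ("Qwen3.5 122B A10B (Reasoning)", 32, 10.0),
    ("NVIDIA Nemotron 3 Super 120B A12B (Reasoning)", 25, 12.7),
    ("gpt-oss-120b (high)", 24, 5.1),
    ("HyperNova 60B 2605", 22, 4.8),
    ("Llama 4 Scout", 10, 17.0),
    ("Jamba 1.7 Mini", 3, 12.0),
]


def attractive_quadrant(models):
    """Return models at or above the median score and at or below the median active size."""
    score_cut = median(score for _, score, _ in models)
    active_cut = median(active for _, _, active in models)
    return [
        name
        for name, score, active in models
        if score >= score_cut and active <= active_cut
    ]


if __name__ == "__main__":
    for name in attractive_quadrant(MODELS):
        print(name)
```

With this sample, the cut-offs depend entirely on which models are included, so the result is only meaningful relative to the chosen set.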

Intelligence vs. Total Parameters

Artificial Analysis Intelligence Index · Size in parameters (billions)
Most attractive quadrant
Legend (model creators): Alibaba, InclusionAI, LongCat, MBZUAI Institute of Foundation Models, Meta, Mistral, Multiverse Computing, NVIDIA, OpenAI

Context Window

Context window: token limit · Higher is better

Larger context windows matter for RAG (Retrieval Augmented Generation) workflows, which typically involve retrieving and reasoning over large amounts of data.

Maximum number of combined input and output tokens. Output tokens commonly have a significantly lower limit, which varies by model.
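
Because the context window covers input and output together, any tokens reserved for the response reduce the room left for the prompt. A minimal sketch of that budgeting follows; the function name and the optional `output_limit` parameter are hypothetical, and the "131k" figure is treated as 131,000 tokens because the page only gives the rounded value.

```python
from typing import Optional


def max_input_tokens(
    context_window: int,
    reserved_output: int,
    output_limit: Optional[int] = None,
) -> int:
    """Return how many input tokens fit once output tokens are reserved."""
    if reserved_output < 0:
        raise ValueError("reserved_output must be non-negative")
    # Some models cap output separately, below the full context window.
    if output_limit is not None and reserved_output > output_limit:
        raise ValueError("reserved_output exceeds the model's output limit")
    remaining = context_window - reserved_output
    if remaining < 0:
        raise ValueError("reserved_output exceeds the context window")
    return remaining


if __name__ == "__main__":
    # Approximate 131k window with 8,000 tokens held back for the answer.
    print(max_input_tokens(131_000, 8_000))
```

Reserving the output budget up front keeps a long prompt from silently truncating the response.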

Further details

Model · Creator · Intelligence Index · Parameters (active at inference) · Context window · Price ($) · (unlabeled) · Provider benchmarks
Qwen3.5 122B A10B (Reasoning) · Alibaba · 32 · 125B (10B active) · 262k · $0.7 · 141 · SiliconFlow, Alibaba Cloud, Novita, +2
Mistral Medium 3.5 · Mistral · 30 · 128B · 256k · $1.2 · 133 · Mistral
Qwen3.5 122B A10B (Non-reasoning) · Alibaba · 28 · 125B (10B active) · 262k · $0.7 · 148 · Alibaba Cloud, DeepInfra
NVIDIA Nemotron 3 Super 120B A12B (Reasoning) · NVIDIA · 25 · 120.6B (12.7B active) · 1.00M · $0.3 · 243 · Baseten, DeepInfra, Lightning AI, +2
gpt-oss-120b (high) · OpenAI · 24 · 117B (5.1B active) · 131k · $0.2 · 305 · Novita, Nebius, Cerebras, +23
HyperNova 60B 2605 · Multiverse Computing · 22 · 58.7B (4.8B active) · 131k · $0.1 · 395 · CompactifAI
Qwen3 Coder Next · Alibaba · 21 · 79.7B (3B active) · 256k · $0.4 · 127 · Parasail, Amazon Bedrock, Together AI, Novita
Mistral Small 4 (Reasoning) · Mistral · 21 · 119B (6.5B active) · 256k · $0.2 · 168 · Mistral
Qwen3 Next 80B A3B (Reasoning) · Alibaba · 20 · 80B (3B active) · 262k · $1.1 · 178 · Eigen AI, Google, GMI, +5
Ling 2.6 Flash · InclusionAI · 19 · 107B (7.4B active) · 262k · $0.1 · 195 · Novita
Devstral 2 · Mistral · 19 · 125B · 256k · - · 30 · Mistral
gpt-oss-120b (low) · OpenAI · 18 · 117B (5.1B active) · 131k · $0.2 · 329 · SambaNova, Databricks, CoreWeave, +19
K2 Think V2 · MBZUAI Institute of Foundation Models · 17 · 70B · 262k · - · - · -
LongCat Flash Lite · LongCat · 17 · 68.5B (3B active) · 256k · - · - · LongCat
INTELLECT-3 · Prime Intellect · 16 · 107B (12B active) · 131k · - · - · -
Solar Open 100B (Reasoning) · Upstage · 15 · 102B (12B active) · 128k · - · - · -
K2-V2 (high) · MBZUAI Institute of Foundation Models · 14 · 70B · 512k · - · - · -
Qwen3 Next 80B A3B Instruct · Alibaba · 14 · 80B (3B active) · 262k · $0.7 · 178 · GMI, Google, Alibaba Cloud, +4
K2-V2 (medium) · MBZUAI Institute of Foundation Models · 12 · 70B · 512k · - · - · -
Llama Nemotron Super 49B v1.5 (Reasoning) · NVIDIA · 12 · 49B · 128k · $0.1 · 50 · DeepInfra
Mistral Small 4 (Non-reasoning) · Mistral · 12 · 119B (6.5B active) · 256k · $0.2 · 153 · Mistral
Sarvam 105B (high) · Sarvam · 12 · 106B (10.3B active) · 128k · $0.0 · 118 · Sarvam
Llama 4 Scout · Meta · 10 · 109B (17B active) · 10.0M · $0.2 · 104 · CompactifAI, Cloudflare, Novita, +6
Hermes 4 - Llama-3.1 70B (Reasoning) · Nous Research · 10 · 70.6B · 128k · $0.2 · 86 · Nebius
Llama Nemotron Super 49B v1.5 (Non-reasoning) · NVIDIA · 9 · 49B · 128k · $0.1 · 50 · DeepInfra
Llama 3.3 Instruct 70B · Meta · 9 · 70B · 128k · $0.6 · 89 · Microsoft Azure, CoreWeave, Parasail, +18
K2-V2 (low) · MBZUAI Institute of Foundation Models · 9 · 70B · 512k · - · - · -
Kimi Linear 48B A3B Instruct · Kimi · 9 · 49.1B (3B active) · 1.00M · - · - · -
Ring-flash-2.0 · InclusionAI · 8 · 103B (6.1B active) · 128k · $0.2 · - · SiliconFlow
Command A · Cohere · 8 · 111B · 256k · $3.3 · 67 · Cohere, Microsoft Azure
Llama 3.1 Nemotron Instruct 70B · NVIDIA · 8 · 70B · 128k · $1.2 · 299 · DeepInfra
Hermes 4 - Llama-3.1 70B (Non-reasoning) · Nous Research · 7 · 70.6B · 128k · $0.2 · 84 · Nebius
Llama 3.2 Instruct 90B (Vision) · Meta · 6 · 90B · 128k · $1.4 · 58 · Microsoft Azure, Amazon Bedrock
Jamba 1.7 Mini · AI21 Labs · 3 · 52B (12B active) · 258k · - · - · -
Apertus 70B Instruct · Swiss AI Initiative · 2 · 70B · 65.5k · $1.0 · - · Public AI