
Comparisons of Medium Open Source AI Models (40B-150B)

Open source AI models with between 40B and 150B parameters. Models are considered open source (also commonly referred to as open weights) when their weights are available to download. This allows self-hosting on your own infrastructure and enables customizing the model, such as through fine-tuning. Click on any model to see detailed metrics. For more detail, including on our methodology, see our FAQs.

Qwen3.5 122B A10B (Reasoning) is the highest intelligence Medium open source model, defined as those with 40B-150B parameters, followed by NVIDIA Nemotron 3 Super, Qwen3.5 122B A10B (Non-reasoning), and OpenAI's gpt-oss-120B (high).

Intelligence (Artificial Analysis Intelligence Index; higher is better) vs. Total Parameters (trainable parameters in billions)

Openness

Artificial Analysis Openness Index: Results

Openness Index assesses model openness on a 0 to 100 normalized scale (higher is more open)

Intelligence

Artificial Analysis Intelligence Index

Artificial Analysis Intelligence Index v4.0 incorporates 10 evaluations: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt
Reasoning models are indicated by a lightbulb icon.

See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.
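The Intelligence Index aggregates scores from the evaluations listed above. As a hedged illustration only (the actual normalization and weighting are defined in the Artificial Analysis methodology, not reproduced here), a composite index of this kind can be sketched as an equal-weighted mean of per-evaluation scores already normalized to a 0-100 scale:

```python
# Hypothetical sketch of a composite intelligence index: an equal-weighted
# mean of per-evaluation scores, each assumed to be normalized to 0-100.
# This is NOT the actual Artificial Analysis formula.

def composite_index(scores: dict[str, float]) -> float:
    """Average of evaluation scores, each already on a 0-100 scale."""
    return sum(scores.values()) / len(scores)

# Illustrative (made-up) per-evaluation scores for one model:
example = {
    "GPQA Diamond": 60.0,
    "IFBench": 50.0,
    "SciCode": 40.0,
}
print(round(composite_index(example), 1))  # 50.0
```

Real index construction typically also rescales each evaluation so that hard benchmarks (where raw scores cluster near zero) and easy ones contribute comparably, which is one reason the published methodology matters.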

Intelligence Evaluations

Intelligence evaluations measured independently by Artificial Analysis; Higher is better
Some results are claimed by the AI lab and have not yet been independently verified.
GDPval-AA (Agentic Real-World Work Tasks; score = (ELO − 500)/2000)
Terminal-Bench Hard (Agentic Coding & Terminal Use)
𝜏²-Bench Telecom (Agentic Tool Use)
AA-LCR (Long Context Reasoning)
AA-Omniscience Accuracy (Knowledge)
AA-Omniscience Non-Hallucination Rate (1 - Hallucination Rate)
Humanity's Last Exam (Reasoning & Knowledge)
GPQA Diamond (Scientific Reasoning)
SciCode (Coding)
IFBench (Instruction Following)
CritPt (Physics Reasoning)
MMMU Pro (Visual Reasoning)

While model intelligence generally translates across use cases, specific evaluations may be more relevant for certain use cases.

Artificial Analysis Intelligence Index v4.0 includes: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt. See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.

Size

Model Size: Total and Active Parameters

Comparison between total model parameters and parameters active during inference

The total number of trainable weights and biases in the model, expressed in billions. These parameters are learned during training and determine the model's ability to process and generate responses.

The number of parameters actually executed during each inference forward pass, expressed in billions. For Mixture of Experts (MoE) models, a routing mechanism selects a subset of experts per token, resulting in fewer active than total parameters. Dense models use all parameters, so active equals total.
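The total/active distinction above can be made concrete with a small sketch. The configuration below is hypothetical (not any real model's architecture): shared parameters (attention, embeddings, router) are always used, while only a few routed experts fire per token.

```python
# Illustrative sketch of why MoE models have fewer active than total
# parameters. All numbers below are hypothetical, not a real model config.

def param_counts(shared_b: float, expert_b: float,
                 n_experts: int, experts_per_token: int) -> tuple[float, float]:
    """Return (total, active) parameters in billions.

    shared_b: parameters always executed (attention, embeddings, router), in B
    expert_b: parameters in a single expert, in B
    """
    total = shared_b + n_experts * expert_b
    active = shared_b + experts_per_token * expert_b
    return total, active

# A hypothetical MoE: 5B shared, 64 experts of 1.875B each, 4 routed per token
total, active = param_counts(5.0, 1.875, 64, 4)
print(f"total = {total:.0f}B, active = {active:.1f}B")  # total = 125B, active = 12.5B
```

A dense model is the degenerate case where every "expert" runs on every token, so active equals total.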

Intelligence vs. Active Parameters

Active Parameters at Inference Time; Artificial Analysis Intelligence Index



Intelligence vs. Total Parameters

Artificial Analysis Intelligence Index; Size in Parameters (Billions)



Context Window

Context window: token limit; higher is better

Larger context windows are relevant to RAG (Retrieval Augmented Generation) LLM workflows, which typically involve reasoning and information retrieval over large amounts of data.

The context window is the maximum number of combined input and output tokens. Output tokens commonly have a significantly lower limit (this varies by model).

Further details

| Model | Creator | Intelligence | Total Params | Active Params | Context | Price (USD/M tokens) | Output Speed (tokens/s) | Weights | API Providers |
|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5 122B A10B (Reasoning) | Alibaba | 42 | 125B | 10B | 262k | $1.1 | 132 | 🤗 | Novita, GMI, Alibaba Cloud |
| NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | NVIDIA | 36 | 120.6B | 12.7B | 1.00M | $0.4 | 458 | Not available | Lightning AI, Weights & Biases, DeepInfra, +2 more |
| Qwen3.5 122B A10B (Non-reasoning) | Alibaba | 36 | 125B | 10B | 262k | $1.1 | 126 | 🤗 | Alibaba Cloud |
| gpt-oss-120B (high) | OpenAI | 33 | 117B | 5.1B | 131k | $0.3 | 280 | 🤗 | Snowflake, Clarifai, Parasail, +22 more |
| Qwen3 Coder Next | Alibaba | 28 | 79.7B | 3B | 256k | $0.6 | 137 | 🤗 | Parasail, Together.ai, Novita, +1 more |
| Mistral Small 4 (Reasoning) | Mistral | 27 | 119B | 6.5B | 256k | $0.3 | 153 | 🤗 | Mistral |
| Qwen3 Next 80B A3B (Reasoning) | Alibaba | 27 | 80B | 3B | 262k | $1.9 | 155 | 🤗 | Eigen AI, Hyperbolic, Nebius, +4 more |
| gpt-oss-120B (low) | OpenAI | 24 | 117B | 5.1B | 131k | $0.3 | 288 | 🤗 | Amazon Bedrock, DeepInfra, Novita, +18 more |
| K2 Think V2 | MBZUAI Institute of Foundation Models | 24 | 70B | - | 262k | - | - | Not available | - |
| LongCat Flash Lite | LongCat | 24 | 68.5B | 3B | 256k | - | 101 | 🤗 | LongCat |
| GLM-4.6V (Reasoning) | Z AI | 23 | 108B | - | 128k | $0.5 | 31 | 🤗 | SiliconFlow, DeepInfra, Novita |
| INTELLECT-3 | Prime Intellect | 22 | 107B | - | 131k | - | - | 🤗 | - |
| Devstral 2 | Mistral | 22 | 125B | - | 256k | - | 82 | 🤗 | Mistral |
| K2-V2 (high) | MBZUAI Institute of Foundation Models | 21 | 70B | - | 512k | - | - | 🤗 | - |
| Qwen3 Next 80B A3B Instruct | Alibaba | 20 | 80B | 3B | 262k | $0.9 | 149 | 🤗 | GMI, Google, Parasail, +4 more |
| Llama Nemotron Super 49B v1.5 (Reasoning) | NVIDIA | 19 | 49B | - | 128k | $0.2 | 81 | 🤗 | DeepInfra |
| K2-V2 (medium) | MBZUAI Institute of Foundation Models | 19 | 70B | - | 512k | - | - | 🤗 | - |
| Mistral Small 4 (Non-reasoning) | Mistral | 19 | 119B | 6.5B | 256k | $0.3 | 130 | 🤗 | Mistral |
| Llama 3.3 Nemotron Super 49B v1 (Reasoning) | NVIDIA | 18 | 49B | - | 128k | - | - | 🤗 | - |
| Sarvam 105B (Reasoning) | Sarvam | 18 | 106B | 10.3B | 65.5k | - | 78 | 🤗 | Sarvam |
| GLM-4.6V (Non-reasoning) | Z AI | 17 | 108B | - | 128k | $0.5 | 21 | 🤗 | Novita, SiliconFlow |
| Hermes 4 - Llama-3.1 70B (Reasoning) | Nous Research | 16 | 70.6B | - | 128k | $0.2 | 77 | 🤗 | Nebius |
| DeepSeek R1 Distill Llama 70B | DeepSeek | 16 | 70B | - | 128k | $0.9 | 53 | 🤗 | SambaNova, Scaleway, DeepInfra |
| Ling-flash-2.0 | InclusionAI | 16 | 103B | 6.1B | 128k | $0.2 | 58 | 🤗 | SiliconFlow |
| Llama Nemotron Super 49B v1.5 (Non-reasoning) | NVIDIA | 15 | 49B | - | 128k | $0.2 | 81 | 🤗 | DeepInfra |
| Llama 3.3 Instruct 70B | Meta | 14 | 70B | - | 128k | $0.7 | 81 | 🤗 | Lightning AI, FriendliAI, Snowflake, +19 more |
| K2-V2 (low) | MBZUAI Institute of Foundation Models | 14 | 70B | - | 512k | - | - | 🤗 | - |
| Kimi Linear 48B A3B Instruct | Kimi | 14 | 49.1B | 3B | 1.00M | - | - | 🤗 | - |
| Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) | NVIDIA | 14 | 49B | - | 128k | - | - | 🤗 | - |
| Ring-flash-2.0 | InclusionAI | 14 | 103B | 6.1B | 128k | $0.2 | 78 | 🤗 | SiliconFlow |
| Llama 4 Scout | Meta | 14 | 109B | 17B | 10.0M | $0.3 | 127 | 🤗 | Eigen AI, Cloudflare, Google, +7 more |
| Command A | Cohere | 13 | 111B | - | 256k | $4.4 | 46 | 🤗 | Microsoft Azure, Cohere |
| Llama 3.1 Nemotron Instruct 70B | NVIDIA | 13 | 70B | - | 128k | $1.2 | 36 | 🤗 | DeepInfra |
| Hermes 4 - Llama-3.1 70B (Non-reasoning) | Nous Research | 13 | 70.6B | - | 128k | $0.2 | 79 | 🤗 | Nebius |
| Llama 3.2 Instruct 90B (Vision) | Meta | 12 | 90B | - | 128k | $0.7 | 56 | 🤗 | Google, DeepInfra, Amazon Bedrock, +1 more |
| Jamba 1.7 Mini | AI21 Labs | 8 | 52B | 12B | 258k | - | - | 🤗 | - |

Intelligence is the Artificial Analysis Intelligence Index. Active Params shows parameters active at inference time where reported. 🤗 indicates weights are downloadable on Hugging Face; "-" indicates no data listed.