Comparisons of Large Open Source AI Models (>150B)
Open source AI models with over 150B parameters. Models are considered open source (also commonly referred to as open weights) when their weights are available to download. This allows self-hosting on your own infrastructure and enables customizing the model, such as through fine-tuning. For more details, including our methodology, see our FAQs.
Kimi K2.5 and GLM-4.7 are the highest-intelligence large open source models (defined as those with over 150B parameters), followed by DeepSeek V3.2 and Kimi K2 Thinking.
Highlights
Openness
Artificial Analysis Openness Index: Results
Intelligence
Artificial Analysis Intelligence Index
Artificial Analysis Intelligence Index v4.0 includes: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt. See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.
Intelligence Evaluations
While model intelligence generally translates across use cases, specific evaluations may be more relevant for certain use cases.
Model Size: Total and Active Parameters
The total number of trainable weights and biases in the model, expressed in billions. These parameters are learned during training and determine the model's ability to process and generate responses.
The number of parameters actually executed during each inference forward pass, expressed in billions. For Mixture of Experts (MoE) models, a routing mechanism selects a subset of experts per token, resulting in fewer active than total parameters. Dense models use all parameters, so active equals total.
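The gap between total and active parameters in an MoE model can be sketched with a toy FFN parameter count. The layer shapes and expert counts below are illustrative assumptions for a hypothetical configuration, not any published architecture:

```python
# Sketch: why MoE "active" parameters are far smaller than total parameters.
# All figures are hypothetical; attention and embedding parameters are omitted.

def moe_param_counts(n_layers, d_model, d_ff, n_experts, experts_per_token):
    """Rough FFN parameter counts for an MoE transformer.

    Each expert is a 2-matrix FFN (d_model x d_ff up, d_ff x d_model down).
    """
    per_expert = 2 * d_model * d_ff
    total = n_layers * n_experts * per_expert           # all experts stored
    active = n_layers * experts_per_token * per_expert  # experts used per token
    return total, active

# Hypothetical config: 61 layers, 256 experts, 8 routed experts per token.
total, active = moe_param_counts(
    n_layers=61, d_model=7168, d_ff=2048, n_experts=256, experts_per_token=8
)
print(f"total FFN params:  {total / 1e9:.0f}B")
print(f"active FFN params: {active / 1e9:.1f}B")
```

With 8 of 256 experts routed per token, the active FFN parameter count is 1/32 of the total, which is why a model can store hundreds of billions of parameters while executing only tens of billions per forward pass.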
Intelligence vs. Active Parameters
Intelligence vs. Total Parameters
Context Window
Larger context windows are relevant to RAG (Retrieval-Augmented Generation) workflows, which typically involve reasoning over and retrieving information from large amounts of data.
The context window is the maximum number of combined input and output tokens. Output tokens commonly have a significantly lower limit, which varies by model.
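Because the context window bounds input and output tokens combined, reserving output space shrinks the usable prompt budget. A minimal budget check might look like the following sketch; the 4-characters-per-token estimate is a rough heuristic, not a real tokenizer:

```python
# Sketch: budgeting a request against a model's context window.
# The context window limits input + output tokens combined.

def fits_context(prompt: str, max_output_tokens: int,
                 context_window: int = 128_000) -> bool:
    # Crude estimate: ~4 characters per token (replace with a real
    # tokenizer for anything beyond a sanity check).
    est_prompt_tokens = len(prompt) // 4 + 1
    return est_prompt_tokens + max_output_tokens <= context_window

# A 200k-character document (~50k estimated tokens) plus an 8k-token
# output reservation fits in a 128k window...
print(fits_context("x" * 200_000, 8_000))          # True
# ...but not in a 32k window.
print(fits_context("x" * 200_000, 8_000, 32_000))  # False
```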
| Model | Creator | Intelligence Index | Parameters (Total / Active) | Context Window | Price (USD/1M tokens) | Speed (tokens/s) | Weights | Provider Benchmarks | Details |
|---|---|---|---|---|---|---|---|---|---|
| Kimi K2.5 (Reasoning) | Kimi | 47 | 1T (32B active) | 256k | $1.2 | 112 | 🤗 | +5 more | View |
| GLM-4.7 (Reasoning) | Z AI | 42 | 357B (32B active) | 200k | $0.9 | 127 | 🤗 | +8 more | View |
| DeepSeek V3.2 (Reasoning) | DeepSeek | 42 | 685B (37B active) | 128k | $0.3 | 30 | 🤗 | +5 more | View |
| Kimi K2 Thinking | Kimi | 41 | 1T (32B active) | 256k | $1.1 | 89 | 🤗 | +10 more | View |
| MiniMax-M2.1 | MiniMax | 40 | 230B (10B active) | 205k | $0.5 | 56 | 🤗 | +4 more | View |
| MiMo-V2-Flash (Reasoning) | Xiaomi | 39 | 309B (15B active) | 256k | $0.1 | 178 | 🤗 | - | View |
| Kimi K2.5 (Non-reasoning) | Kimi | 37 | 1T (32B active) | 256k | $1.2 | 111 | 🤗 | +2 more | View |
| GLM-4.7 (Non-reasoning) | Z AI | 34 | 357B (32B active) | 200k | $0.9 | 159 | 🤗 | +7 more | View |
| DeepSeek V3.2 Speciale | DeepSeek | 34 | 685B (37B active) | 128k | $0.4 | - | 🤗 | - | View |
| K-EXAONE (Reasoning) | LG AI Research | 32 | 236B (23B active) | 256k | - | 126 | 🤗 | - | View |
| DeepSeek V3.2 (Non-reasoning) | DeepSeek | 32 | 685B (37B active) | 128k | $0.3 | 31 | 🤗 | +7 more | View |
| Kimi K2 0905 | Kimi | 31 | 1T (32B active) | 256k | $1.2 | 55 | 🤗 | +4 more | View |
| MiMo-V2-Flash (Non-reasoning) | Xiaomi | 31 | 309B (15B active) | 256k | $0.1 | 162 | 🤗 | - | View |
| Qwen3 235B A22B 2507 (Reasoning) | Alibaba | 29 | 235B (22B active) | 256k | $2.6 | 51 | 🤗 | +5 more | View |
| Qwen3 VL 235B A22B (Reasoning) | Alibaba | 28 | 235B (22B active) | 262k | $2.6 | 55 | 🤗 | - | View |
| DeepSeek R1 0528 (May '25) | DeepSeek | 27 | 685B (37B active) | 128k | $2.4 | - | 🤗 | +7 more | View |
| Qwen3 235B A22B 2507 Instruct | Alibaba | 25 | 235B (22B active) | 256k | $1.2 | 64 | 🤗 | +9 more | View |
| Qwen3 Coder 480B A35B Instruct | Alibaba | 25 | 480B (35B active) | 262k | $3.0 | 62 | 🤗 | +7 more | View |
| K-EXAONE (Non-reasoning) | LG AI Research | 23 | 236B (23B active) | 256k | - | 96 | 🤗 | - | View |
| Mistral Large 3 | Mistral | 23 | 675B (41B active) | 256k | $0.8 | 66 | 🤗 | - | View |
| Ring-1T | InclusionAI | 23 | 1T (50B active) | 128k | $1.0 | 52 | 🤗 | - | View |
| Hermes 4 - Llama-3.1 405B (Reasoning) | Nous Research | 22 | 406B | 128k | $1.5 | 37 | 🤗 | - | View |
| Qwen3 VL 235B A22B Instruct | Alibaba | 21 | 235B (22B active) | 262k | $1.2 | 52 | 🤗 | +4 more | View |
| Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | NVIDIA | 20 | 253B | 128k | $0.9 | 37 | 🤗 | - | View |
| Ling-1T | InclusionAI | 19 | 1T (50B active) | 128k | - | - | 🤗 | - | View |
| Llama 4 Maverick | Meta | 18 | 402B (17B active) | 1M | $0.5 | 128 | 🤗 | +9 more | View |
| ERNIE 4.5 300B A47B | Baidu | 17 | 300B (47B active) | 131k | $0.5 | 32 | 🤗 | - | View |
| Hermes 4 - Llama-3.1 405B (Non-reasoning) | Nous Research | 17 | 406B | 128k | $1.5 | 33 | 🤗 | - | View |
| Llama 3.1 Instruct 405B | Meta | 14 | 405B | 128k | $4.2 | 27 | 🤗 | +5 more | View |
| R1 1776 | Perplexity | 12 | 671B (37B active) | 128k | - | - | 🤗 | - | View |
| Jamba 1.7 Large | AI21 Labs | 9 | 398B (94B active) | 256k | $3.5 | 40 | 🤗 | - | View |
| Cogito v2.1 (Reasoning) | Deep Cogito | - | 671B (37B active) | 128k | $1.3 | 77 | 🤗 | - | View |
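As a rough way to read the Intelligence vs. Active Parameters comparison, the sketch below ranks a few rows from the table by Intelligence Index points per billion active parameters. This ratio is a crude efficiency proxy for illustration only, not an Artificial Analysis metric:

```python
# Sketch: intelligence per active billion parameters, using a few
# (Intelligence Index, active params in B) pairs from the table above.
models = {
    "MiniMax-M2.1":               (40, 10),
    "MiMo-V2-Flash (Reasoning)":  (39, 15),
    "Kimi K2.5 (Reasoning)":      (47, 32),
    "DeepSeek V3.2 (Reasoning)":  (42, 37),
}

# Rank by index points per billion active parameters, highest first.
ranked = sorted(models.items(),
                key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
for name, (index, active_b) in ranked:
    print(f"{name:28s} {index / active_b:.2f} index points per active B")
```

By this proxy the sparsest model (MiniMax-M2.1, 10B active) leads despite a lower absolute index, which is exactly the trade-off the Intelligence vs. Active Parameters chart is meant to surface.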