Artificial Analysis Openness Index

A composite measure providing an industry standard to communicate model openness for users and developers.

Background

The Artificial Analysis Openness Index assesses how 'open' models are on the basis of their availability and transparency across different components (e.g. model weights, training data, and model architecture).
Availability represents the ability to use a model via API, to self-host it through open weights, and to use it freely under permissive licensing. Transparency captures the degree to which a model's methodology and data have been disclosed, shared, and permissively licensed, so that the community can understand a model's inputs and replicate or build on its approach.

Methodology

All evaluations are conducted independently by Artificial Analysis. More information can be found on our Intelligence Benchmarking Methodology page.

Highlights

  • Twelve models from Allen Institute for AI and MBZUAI Institute of Foundation Models tie for the highest Openness Index score of 89 (88.89 before rounding), including Olmo 3.1 32B Instruct, Olmo 3.1 32B Think, and Olmo 3 7B Instruct
  • Twenty-one models from OpenAI, Google, and xAI tie for the lowest Openness Index score of 6 (5.56 before rounding), including GPT-5 nano (high), o3, and GPT-5 mini (high)

Artificial Analysis Openness Index: Results

Openness Index assesses model openness on a 0 to 100 normalized scale (higher is more open)

Artificial Analysis Openness Index: Components

Openness Index underlying score contribution by components, up to a maximum of 18 (higher is more open)
Components shown: Model Availability; Transparency - Methodology; Transparency - Post-training Data; Transparency - Pre-training Data

Artificial Analysis Openness Index: Model Availability vs. Model Transparency

Model Availability reflects the availability of a model for usage and associated license (maximum 6 points); Model Transparency reflects methodology and data disclosures, data sharing, and code and licensing associated with a model's training process (maximum 12 points)
Chart annotation: Most attractive quadrant. Legend (model creators): Alibaba, Allen Institute for AI, Anthropic, Google, Kimi, LG AI Research, MBZUAI Institute of Foundation Models, Meta, MiniMax, Mistral, NVIDIA, OpenAI, TII UAE, xAI, Xiaomi, Z AI.

Artificial Analysis Openness Index: Score vs. Release Date

Axes: Artificial Analysis Openness Index; Release Date. Chart annotation: Most attractive region. Legend: same model creators as in the Model Availability vs. Model Transparency chart.

Artificial Analysis Openness Index vs. Artificial Analysis Intelligence Index

Axes: Artificial Analysis Openness Index; Artificial Analysis Intelligence Index. Chart annotation: Most attractive quadrant. Legend: same model creators as in the Model Availability vs. Model Transparency chart.

Openness Index Composition

Detailed methodology
1. Model availability

Weights

  Access
    0  Closed weights, no API
    1  Closed weights, API limits token visibility
    2  Closed weights, API available
    3  Open weights

  License
    0  Closed weights or no commercial use
    1  Commercial use, attribution required
    2  Commercial use, no attribution required
    3  Commercial use, no attribution required, no meaningful limitations

2. Model transparency

Data: Pre & Post Training (score represents average across each)

  Access
    0  No or limited disclosure
    1  Partial data source detail and categorization disclosed
    2  Full data mix disclosure, substantial data shared¹
    3  Full data shared

  License (most restrictive)
    0  No commercial use/no substantial data shared
    1  Commercial use, attribution required
    2  Commercial use, no attribution required
    3  Commercial use, no attribution required, no meaningful limitations

Methodology

  Disclosure
    0  No or limited disclosure
    1  Model architecture disclosure
    2  Limited general technical disclosure
    3  Full technical details disclosed

  License (most restrictive)
    0  No code disclosed/released
    1  Frameworks disclosed, openly available for commercial use
    2  End-to-end training pipeline code or guide released
    3  End-to-end training pipeline code or guide released, and commercial use allowed

Scoring methodology

Each component is scored on a 0-3 qualitative scale based on the best-fitting openness 'archetype', with each model assessed based on the full set of public first-party information available.

We synthesize these underlying factors into a unified metric, the Artificial Analysis Openness Index, as follows:

  • Data elements are averaged between pre- and post-training (to give a total of 6 possible points across data)
  • All component scores are added (up to a maximum of 18/18 points)
  • This score is normalized to a 0-100 scale
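
The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than Artificial Analysis' own implementation: the class name, field names, and the rounding to two decimals are assumptions made for the example, and the component values used in the usage note are hypothetical.

```python
from dataclasses import dataclass, fields

MAX_POINTS = 18  # 6 (availability) + 6 (data, averaged) + 6 (methodology)


@dataclass
class ComponentScores:
    """Hypothetical container: one 0-3 archetype score per rubric column."""
    weights_access: int
    weights_license: int
    pretrain_data_access: int
    pretrain_data_license: int
    posttrain_data_access: int
    posttrain_data_license: int
    methodology_disclosure: int
    methodology_license: int


def openness_index(scores: ComponentScores) -> float:
    """Combine component scores into a 0-100 value (higher is more open)."""
    for f in fields(scores):
        value = getattr(scores, f.name)
        if not 0 <= value <= 3:
            raise ValueError(f"{f.name} must be between 0 and 3, got {value}")

    availability = scores.weights_access + scores.weights_license

    # Data elements are averaged between pre- and post-training,
    # so data contributes at most 6 points rather than 12.
    pretrain = scores.pretrain_data_access + scores.pretrain_data_license
    posttrain = scores.posttrain_data_access + scores.posttrain_data_license
    data = (pretrain + posttrain) / 2

    methodology = scores.methodology_disclosure + scores.methodology_license

    total = availability + data + methodology  # out of 18
    return round(100 * total / MAX_POINTS, 2)  # rounding is an assumption
```

For example, a hypothetical model with open, unrestricted weights (3 + 3), data scores of 3 and 1 for both pre- and post-training (average 4), and full methodology marks (3 + 3) totals 16 of 18 points, which normalizes to 88.89.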

Where models are derived from a third-party base model, they may be constrained by the licensing or limited disclosure of the upstream model. For incremental/update releases, we only consider disclosures explicitly about the new release (including allowing model creators to declare which components remain consistent with an earlier release).

Openness Index Leaderboard

1 | Allen Institute for AI | Olmo 3.1 32B Instruct | 88.89 | 12.01 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00
2 | Allen Institute for AI | Olmo 3.1 32B Think | 88.89 | 14.24 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00
3 | Allen Institute for AI | Olmo 3 7B Instruct | 88.89 | 8.14 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00
4 | Allen Institute for AI | Olmo 3 7B Think | 88.89 | 16.80 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00
5 | Allen Institute for AI | Molmo 7B-D | 88.89 | 9.25 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00
6 | MBZUAI Institute of Foundation Models | K2-V2 (low) | 88.89 | 14.44 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00
7 | MBZUAI Institute of Foundation Models | K2-V2 (high) | 88.89 | 20.67 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00
8 | MBZUAI Institute of Foundation Models | K2 Think V2 | 88.89 | 24.52 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00
9 | MBZUAI Institute of Foundation Models | K2-V2 (medium) | 88.89 | 18.70 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00
10 | Allen Institute for AI | Olmo 3 32B Think | 88.89 | 18.89 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00
11 | Allen Institute for AI | OLMo 2 7B | 88.89 | 9.30 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00
12 | Allen Institute for AI | OLMo 2 32B | 88.89 | 10.57 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00
13 | NVIDIA | NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | 72.22 | 10.11 | 6.00 | 7.00 | 2.00 | 1.00 | 2.00 | 1.00
14 | NVIDIA | NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | 72.22 | 24.26 | 6.00 | 7.00 | 2.00 | 1.00 | 2.00 | 1.00
15 | NVIDIA | NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | 72.22 | 13.10 | 6.00 | 7.00 | 2.00 | 1.00 | 2.00 | 1.00
16 | NVIDIA | NVIDIA Nemotron Nano 12B v2 VL (Reasoning) | 72.22 | 14.78 | 6.00 | 7.00 | 2.00 | 1.00 | 2.00 | 1.00
17 | NVIDIA | NVIDIA Nemotron Nano 9B V2 (Reasoning) | 72.22 | 14.76 | 6.00 | 7.00 | 2.00 | 1.00 | 2.00 | 1.00
18 | Allen Institute for AI | Molmo2-8B | 72.22 | - | 6.00 | 7.00 | 3.00 | 1.00 | 3.00 | 1.00
19 | Kimi | Kimi Linear 48B A3B Instruct | 61.11 | 14.41 | 6.00 | 5.00 | 1.00 | 0.00 | 1.00 | 0.00
20 | Baidu | ERNIE 4.5 300B A47B | 55.56 | 17.26 | 6.00 | 4.00 | 0.00 | 0.00 | 0.00 | 0.00
21 | Z AI | GLM-4.5-Air | 55.56 | 23.16 | 6.00 | 4.00 | 0.00 | 0.00 | 0.00 | 0.00
22 | Z AI | GLM-4.5 (Reasoning) | 55.56 | 26.21 | 6.00 | 4.00 | 0.00 | 0.00 | 0.00 | 0.00
23 | NVIDIA | Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | 52.78 | 14.43 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00
24 | NVIDIA | Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | 52.78 | 20.02 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00
25 | NVIDIA | Llama Nemotron Super 49B v1.5 (Non-reasoning) | 52.78 | 14.51 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00
26 | NVIDIA | Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) | 52.78 | 14.35 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00
27 | NVIDIA | Llama 3.3 Nemotron Super 49B v1 (Reasoning) | 52.78 | 18.49 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00
28 | NVIDIA | Llama Nemotron Super 49B v1.5 (Reasoning) | 52.78 | 18.62 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00
29 | Xiaomi | MiMo-V2-Flash (Reasoning) | 52.78 | 39.24 | 6.00 | 3.50 | 0.00 | 0.00 | 1.00 | 0.00
30 | Z AI | GLM-4.5V (Reasoning) | 52.78 | 19.27 | 6.00 | 3.50 | 1.00 | 0.00 | 0.00 | 0.00
31 | Z AI | GLM-4.5V (Non-reasoning) | 52.78 | 12.53 | 6.00 | 3.50 | 1.00 | 0.00 | 0.00 | 0.00
32 | Google | Gemma 3 12B Instruct | 50.00 | 8.79 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00
33 | Google | Gemma 3n E2B Instruct | 50.00 | 9.73 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00
34 | Google | Gemma 3 4B Instruct | 50.00 | 6.31 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00
35 | Google | Gemma 3 27B Instruct | 50.00 | 10.19 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00
36 | Google | Gemma 3 1B Instruct | 50.00 | 8.65 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00
37 | Google | Gemma 3n E4B Instruct | 50.00 | 6.30 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00
38 | Mistral | Magistral Small 1.2 | 50.00 | 22.55 | 6.00 | 3.00 | 0.00 | 0.00 | 1.00 | 1.00
39 | DeepSeek | DeepSeek R1 0528 (May '25) | 50.00 | 27.01 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00
40 | Microsoft Azure | Phi-4 | 50.00 | 13.18 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00
41 | Microsoft Azure | Phi-4 Multimodal Instruct | 50.00 | 10.04 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00
42 | Microsoft Azure | Phi-4 Mini Instruct | 50.00 | 10.94 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00
43 | Alibaba | Qwen3 VL 8B Instruct | 50.00 | 14.25 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00
44 | Alibaba | Qwen3 VL 32B (Reasoning) | 50.00 | 24.52 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00
45 | Alibaba | Qwen3 VL 32B Instruct | 50.00 | 17.17 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00
46 | Alibaba | Qwen3 VL 30B A3B (Reasoning) | 50.00 | 19.62 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00
47 | Alibaba | Qwen3 VL 235B A22B (Reasoning) | 50.00 | 27.51 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00
48 | Alibaba | Qwen3 VL 235B A22B Instruct | 50.00 | 20.58 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00
49 | Alibaba | Qwen3 VL 30B A3B Instruct | 50.00 | 16.03 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00
50 | Alibaba | Qwen3 VL 4B (Reasoning) | 50.00 | 14.90 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00
51 | Alibaba | Qwen3 VL 8B (Reasoning) | 50.00 | 16.61 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00
52 | Alibaba | Qwen3 VL 4B Instruct | 50.00 | 14.08 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00
53 | DeepSeek | DeepSeek R1 0528 Qwen3 8B | 47.22 | 16.43 | 6.00 | 2.50 | 0.00 | 0.00 | 1.00 | 0.00
54 | Nous Research | Hermes 4 - Llama-3.1 405B (Reasoning) | 47.22 | 21.72 | 4.00 | 4.50 | 1.00 | 0.00 | 2.00 | 0.00
55 | Nous Research | Hermes 4 - Llama-3.1 405B (Non-reasoning) | 47.22 | 17.12 | 4.00 | 4.50 | 1.00 | 0.00 | 2.00 | 0.00
56 | Nous Research | Hermes 4 - Llama-3.1 70B (Reasoning) | 47.22 | 20.39 | 4.00 | 4.50 | 1.00 | 0.00 | 2.00 | 0.00
57 | Nous Research | Hermes 4 - Llama-3.1 70B (Non-reasoning) | 47.22 | 13.55 | 4.00 | 4.50 | 1.00 | 0.00 | 2.00 | 0.00
58 | ServiceNow | Apriel-v1.5-15B-Thinker | 47.22 | 28.33 | 6.00 | 2.50 | 0.00 | 0.00 | 1.00 | 0.00
59 | Google | Gemma 3 270M | 44.44 | 8.37 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
60 | TII UAE | Falcon-H1R-7B | 44.44 | 15.84 | 4.00 | 4.00 | 1.00 | 0.00 | 1.00 | 0.00
61 | NVIDIA | Llama 3.1 Nemotron Instruct 70B | 44.44 | 13.42 | 4.00 | 4.00 | 0.00 | 0.00 | 1.00 | 1.00
62 | Z AI | GLM-4.7 (Reasoning) | 44.44 | 42.05 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
63 | Z AI | GLM-4.7 (Non-reasoning) | 44.44 | 34.10 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
64 | Z AI | GLM-4.7-Flash (Non-reasoning) | 44.44 | 21.47 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
65 | Z AI | GLM-4.7-Flash (Reasoning) | 44.44 | 30.12 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
66 | Alibaba | Qwen3 4B 2507 Instruct | 44.44 | 13.19 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
67 | Alibaba | Qwen3 235B A22B 2507 Instruct | 44.44 | 24.66 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
68 | Alibaba | Qwen3 Coder 30B A3B Instruct | 44.44 | 19.96 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
69 | Alibaba | Qwen3 Next 80B A3B (Reasoning) | 44.44 | 26.49 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
70 | Alibaba | Qwen3 Coder 480B A35B Instruct | 44.44 | 24.65 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
71 | Alibaba | Qwen3 Next 80B A3B Instruct | 44.44 | 20.08 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
72 | Alibaba | Qwen3 30B A3B 2507 (Reasoning) | 44.44 | 22.43 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
73 | Alibaba | Qwen3 30B A3B 2507 Instruct | 44.44 | 15.00 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
74 | Alibaba | Qwen3 Omni 30B A3B (Reasoning) | 44.44 | 15.60 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
75 | Alibaba | Qwen3 Omni 30B A3B Instruct | 44.44 | 10.68 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
76 | Alibaba | Qwen3 235B A22B 2507 (Reasoning) | 44.44 | 29.46 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
77 | Alibaba | Qwen3 4B 2507 (Reasoning) | 44.44 | 18.60 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
78 | InclusionAI | Ling-mini-2.0 | 44.44 | 15.09 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
79 | InclusionAI | Ling-flash-2.0 | 44.44 | 15.47 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
80 | InclusionAI | Ling-1T | 44.44 | 19.01 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
81 | Mistral | Devstral Small (Jul '25) | 44.44 | 15.20 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
82 | DeepSeek | DeepSeek V3.2 Exp (Non-reasoning) | 44.44 | 28.33 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
83 | DeepSeek | DeepSeek V3.2 Exp (Reasoning) | 44.44 | 32.90 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
84 | Kimi | Kimi K2 | 44.44 | 26.19 | 4.00 | 4.00 | 1.00 | 0.00 | 1.00 | 0.00
85 | Z AI | GLM-4.6 (Reasoning) | 44.44 | 32.52 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
86 | Z AI | GLM-4.6 (Non-reasoning) | 44.44 | 30.15 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
87 | ByteDance Seed | Seed-OSS-36B-Instruct | 44.44 | 24.99 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
88 | IBM | Granite 4.0 Micro | 41.67 | 7.66 | 6.00 | 1.50 | 1.00 | 0.00 | 0.00 | 0.00
89 | IBM | Granite 4.0 H 350M | 41.67 | 5.31 | 6.00 | 1.50 | 1.00 | 0.00 | 0.00 | 0.00
90 | IBM | Granite 4.0 H Small | 41.67 | 10.79 | 6.00 | 1.50 | 1.00 | 0.00 | 0.00 | 0.00
91 | IBM | Granite 4.0 H 1B | 41.67 | 7.96 | 6.00 | 1.50 | 1.00 | 0.00 | 0.00 | 0.00
92 | IBM | Granite 4.0 350M | 41.67 | 6.62 | 6.00 | 1.50 | 1.00 | 0.00 | 0.00 | 0.00
93 | IBM | Granite 4.0 1B | 41.67 | 7.27 | 6.00 | 1.50 | 1.00 | 0.00 | 0.00 | 0.00
94 | OpenAI | gpt-oss-20B (high) | 38.89 | 24.47 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
95 | OpenAI | gpt-oss-120B (high) | 38.89 | 33.25 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
96 | Meta | Llama 3.3 Instruct 70B | 38.89 | 14.23 | 4.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00
97 | Meta | Llama 3.1 Instruct 405B | 38.89 | 14.20 | 4.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00
98 | Meta | Llama 3.2 Instruct 90B (Vision) | 38.89 | 11.90 | 4.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00
99 | Meta | Llama 3.2 Instruct 11B (Vision) | 38.89 | 10.89 | 4.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00
100 | Mistral | Mistral Small 3.2 | 38.89 | 15.03 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
101 | Mistral | Mistral Large 3 | 38.89 | 22.72 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
102 | Perplexity | R1 1776 | 38.89 | 11.99 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
103 | Reka AI | Reka Flash 3 | 38.89 | 14.35 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
104 | Nous Research | DeepHermes 3 - Mistral 24B Preview (Non-reasoning) | 38.89 | 10.89 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
105 | Deep Cogito | Cogito v2.1 (Reasoning) | 38.89 | - | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
106 | AI21 Labs | Jamba Reasoning 3B | 38.89 | 10.33 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
107 | InclusionAI | Ring-flash-2.0 | 38.89 | 20.58 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
108 | InclusionAI | Ring-1T | 38.89 | 22.54 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
109 | DeepSeek | DeepSeek V3.1 Terminus (Reasoning) | 38.89 | 33.79 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
110 | DeepSeek | DeepSeek V3.1 Terminus (Non-reasoning) | 38.89 | 28.37 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
111 | DeepSeek | DeepSeek R1 Distill Llama 70B | 36.11 | 15.95 | 4.00 | 2.50 | 0.00 | 0.00 | 1.00 | 0.00
112 | Liquid AI | LFM2 2.6B | 33.33 | 7.86 | 4.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
113 | Liquid AI | LFM2 8B A1B | 33.33 | 6.85 | 4.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
114 | Nous Research | DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning) | 33.33 | 7.58 | 5.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
115 | Cohere | Command A | 33.33 | 13.44 | 3.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00
116 | Liquid AI | LFM2 1.2B | 33.33 | 6.36 | 4.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
117 | Naver | HyperCLOVA X SEED Think (32B) | 30.56 | 23.72 | 4.00 | 1.50 | 1.00 | 0.00 | 0.00 | 0.00
118 | Meta | Llama 4 Scout | 27.78 | 13.48 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
119 | Meta | Llama 4 Maverick | 27.78 | 18.30 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
120 | Mistral | Magistral Medium 1.2 | 27.78 | 27.04 | 2.00 | 3.00 | 0.00 | 0.00 | 1.00 | 1.00
121 | Liquid AI | LFM2.5-1.2B-Thinking | 27.78 | 8.12 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
122 | Liquid AI | LFM2.5-1.2B-Instruct | 27.78 | 7.95 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
123 | Liquid AI | LFM2.5-VL-1.6B | 27.78 | 6.06 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
124 | MiniMax | MiniMax-M2.1 | 27.78 | 39.55 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
125 | Kimi | Kimi K2 0905 | 27.78 | 30.81 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
126 | Kimi | Kimi K2 Thinking | 27.78 | 40.70 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
127 | LG AI Research | K-EXAONE (Reasoning) | 27.78 | 32.13 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
128 | LG AI Research | Exaone 4.0 1.2B (Non-reasoning) | 27.78 | 8.07 | 3.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
129 | LG AI Research | Exaone 4.0 1.2B (Reasoning) | 27.78 | 8.26 | 3.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
130 | LG AI Research | EXAONE 4.0 32B (Non-reasoning) | 27.78 | 11.54 | 3.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
131 | LG AI Research | EXAONE 4.0 32B (Reasoning) | 27.78 | 16.65 | 3.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00
132 | MiniMax | MiniMax-M2 | 27.78 | 35.98 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
133 | AI21 Labs | Jamba 1.7 Mini | 22.22 | 7.33 | 4.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
134 | AI21 Labs | Jamba 1.7 Large | 22.22 | 9.27 | 4.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
135 | Alibaba | Qwen3 Max | 16.67 | 31.33 | 2.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
136 | Alibaba | Qwen3 Max Thinking (Preview) | 16.67 | 32.45 | 2.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00
137 | Google | Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning) | 11.11 | 19.40 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
138 | Anthropic | Claude 4.5 Sonnet (Non-reasoning) | 11.11 | 37.06 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
139 | Anthropic | Claude 4.5 Sonnet (Reasoning) | 11.11 | 42.92 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
140 | Anthropic | Claude 4.5 Haiku (Non-reasoning) | 11.11 | 31.03 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
141 | Anthropic | Claude Opus 4.5 (Non-reasoning) | 11.11 | 43.05 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
142 | Anthropic | Claude Opus 4.5 (Reasoning) | 11.11 | 49.69 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
143 | Anthropic | Claude 4.5 Haiku (Reasoning) | 11.11 | 37.02 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
144 | Mistral | Mistral Medium 3.1 | 11.11 | 21.13 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
145 | xAI | Grok 4.1 Fast (Non-reasoning) | 11.11 | 23.54 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
146 | xAI | Grok 3 mini Reasoning (high) | 11.11 | 32.02 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
147 | Amazon | Nova Micro | 11.11 | 10.25 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
148 | Amazon | Nova Premier | 11.11 | 18.87 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
149 | Upstage | Solar Pro 2 (Reasoning) | 11.11 | 14.93 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
150 | Upstage | Solar Pro 2 (Non-reasoning) | 11.11 | 13.53 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
151 | ByteDance Seed | Doubao Seed Code | 11.11 | 33.50 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
152 | OpenAI | GPT-5 (ChatGPT) | 11.11 | 21.83 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
153 | OpenAI | GPT-5.1 (Non-reasoning) | 11.11 | 27.41 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
154 | Google | Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning) | 11.11 | 25.51 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
155 | Mistral | Devstral Medium | 11.11 | 18.62 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
156 | xAI | Grok 4 Fast (Non-reasoning) | 11.11 | 22.64 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
157 | Amazon | Nova Pro | 11.11 | 13.46 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
158 | Amazon | Nova Lite | 11.11 | 12.45 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
159 | OpenAI | GPT-5 nano (high) | 5.56 | 26.69 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
160 | OpenAI | o3 | 5.56 | 40.91 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
161 | OpenAI | GPT-5 mini (high) | 5.56 | 41.03 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
162 | Google | Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) | 5.56 | 21.60 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
163 | Google | Gemini 3 Pro Preview (high) | 5.56 | 48.44 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
164 | Google | Gemini 2.5 Pro | 5.56 | 34.45 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
165 | xAI | Grok 4 | 5.56 | 41.43 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
166 | xAI | Grok Code Fast 1 | 5.56 | 28.67 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
167 | xAI | Grok 4.1 Fast (Reasoning) | 5.56 | 38.54 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
168 | OpenAI | GPT-5 Codex (high) | 5.56 | 44.52 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
169 | OpenAI | GPT-5 (minimal) | 5.56 | 23.74 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
170 | OpenAI | GPT-5 mini (minimal) | 5.56 | 20.66 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
171 | OpenAI | GPT-5 nano (medium) | 5.56 | 25.68 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
172 | OpenAI | GPT-5 nano (minimal) | 5.56 | 13.65 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
173 | OpenAI | GPT-5.1 (high) | 5.56 | 47.56 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
174 | OpenAI | GPT-5 (low) | 5.56 | 39.03 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
175 | OpenAI | GPT-5 (high) | 5.56 | 44.57 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
176 | OpenAI | GPT-5 mini (medium) | 5.56 | 38.81 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
177 | OpenAI | GPT-5 (medium) | 5.56 | 41.84 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
178 | Google | Gemini 2.5 Flash Preview (Sep '25) (Reasoning) | 5.56 | 31.09 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
179 | xAI | Grok 4 Fast (Reasoning) | 5.56 | 34.93 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00

Explore Evaluations

Artificial Analysis Intelligence Index

A composite benchmark aggregating ten challenging evaluations to provide a holistic measure of AI capabilities across mathematics, science, coding, and reasoning.

GDPval-AA Leaderboard

GDPval-AA is Artificial Analysis' evaluation framework for OpenAI's GDPval dataset. It tests AI models on real-world tasks across 44 occupations and 9 major industries. Models are given shell access and web browsing capabilities in an agentic loop to solve tasks, with ELO ratings derived from blind pairwise comparisons.

AA-Omniscience: Knowledge and Hallucination Benchmark

A benchmark measuring factual recall and hallucination across various economically relevant domains.

Artificial Analysis Openness Index

A composite measure providing an industry standard to communicate model openness for users and developers.

MMLU-Pro Benchmark Leaderboard

An enhanced version of MMLU with 12,000 graduate-level questions across 14 subject areas, featuring ten answer options and deeper reasoning requirements.

Global-MMLU-Lite Benchmark Leaderboard

A lightweight, multilingual version of MMLU, designed to evaluate knowledge and reasoning skills across a diverse range of languages and cultural contexts.

GPQA Diamond Benchmark Leaderboard

The most challenging 198 questions from GPQA, where PhD experts achieve 65% accuracy but skilled non-experts only reach 34% despite web access.

Humanity's Last Exam Benchmark Leaderboard

A frontier-level benchmark with 2,500 expert-vetted questions across mathematics, sciences, and humanities, designed to be the final closed-ended academic evaluation.

LiveCodeBench Benchmark Leaderboard

A contamination-free coding benchmark that continuously harvests fresh competitive programming problems from LeetCode, AtCoder, and CodeForces, evaluating code generation, self-repair, and execution.

SciCode Benchmark Leaderboard

A scientist-curated coding benchmark featuring 338 sub-tasks derived from 80 genuine laboratory problems across 16 scientific disciplines.

MATH-500 Benchmark Leaderboard

A 500-problem subset from the MATH dataset, featuring competition-level mathematics across six domains including algebra, geometry, and number theory.

IFBench Benchmark Leaderboard

A benchmark evaluating precise instruction-following generalization on 58 diverse, verifiable out-of-domain constraints that test models' ability to follow specific output requirements.

AIME 2025 Benchmark Leaderboard

All 30 problems from the 2025 American Invitational Mathematics Examination, testing olympiad-level mathematical reasoning with integer answers from 000-999.

CritPt Benchmark Leaderboard

A benchmark designed to test LLMs on research-level physics reasoning tasks, featuring 71 composite research challenges.

Terminal-Bench Hard Benchmark Leaderboard

An agentic benchmark evaluating AI capabilities in terminal environments through software engineering, system administration, and data processing tasks.

𝜏²-Bench Telecom Benchmark Leaderboard

A dual-control conversational AI benchmark simulating technical support scenarios where both agent and user must coordinate actions to resolve telecom service issues.

Artificial Analysis Long Context Reasoning Benchmark Leaderboard

A challenging benchmark measuring language models' ability to extract, reason about, and synthesize information from long-form documents ranging from 10k to 100k tokens (measured using the cl100k_base tokenizer).

MMMU-Pro Benchmark Leaderboard

An enhanced MMMU benchmark that eliminates shortcuts and guessing strategies to more rigorously test multimodal models across 30 academic disciplines.