Artificial Analysis Openness Index
Highlights
- Olmo 3.1 32B Instruct scores highest on the Openness Index with a score of 89, followed by Olmo 3.1 32B Think and Olmo 3 7B Instruct, both also at 89
- GPT-5 nano (high) scores lowest on the Openness Index with a score of 6, followed by o3 and GPT-5 mini (high), both also at 6
Artificial Analysis Openness Index: Results
Artificial Analysis Openness Index: Components
Artificial Analysis Openness Index: Model Availability vs. Model Transparency
Artificial Analysis Openness Index: Score vs. Release Date
Artificial Analysis Openness Index vs. Artificial Analysis Intelligence Index
Openness Index Composition
Detailed methodology
Scoring methodology
Each component is scored on a 0-3 qualitative scale according to the best-fitting openness 'archetype', with each model assessed against the full set of public first-party information available.
We synthesize these underlying factors into a unified metric, the Artificial Analysis Openness Index, as follows:
- Data elements are averaged between pre- and post-training (to give a total of 6 possible points across data)
- All component scores are added together (to a maximum of 18 points)
- This score is normalized to a 0-100 scale
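To make the aggregation concrete, the following is a minimal Python sketch of the scoring arithmetic described above. The split into two data elements plus four further components is an illustrative assumption inferred from the 6-point data cap and 18-point maximum, and the component names in the example are hypothetical rather than a published component list.

```python
def openness_index(
    pre_training_data: dict[str, float],   # data elements scored 0-3 for pre-training
    post_training_data: dict[str, float],  # the same elements scored 0-3 for post-training
    other_components: dict[str, float],    # all remaining components, each scored 0-3
) -> float:
    """Combine 0-3 component scores into a 0-100 Openness Index.

    The 18-point maximum (6 possible data points plus four further 0-3
    components) follows the methodology above; the component names used
    in the example below are hypothetical.
    """
    # 1. Average each data element between pre- and post-training.
    data_points = sum(
        (pre_training_data[k] + post_training_data[k]) / 2
        for k in pre_training_data
    )
    # 2. Add all component scores together.
    total = data_points + sum(other_components.values())
    # 3. Normalize the raw point total (maximum 18) to a 0-100 scale.
    return 100 * total / 18


if __name__ == "__main__":
    # Hypothetical model scoring 16 of 18 points; component names are made up.
    score = openness_index(
        pre_training_data={"data_availability": 3, "data_documentation": 3},
        post_training_data={"data_availability": 3, "data_documentation": 3},
        other_components={"weights": 3, "license": 3, "code": 3, "methodology": 1},
    )
    print(round(score, 2))  # 88.89 -- the same rounding seen at the top of the leaderboard
```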
Where a model is derived from a third-party base model, it may be constrained by the licensing or limited disclosure of the upstream model. For incremental or update releases, we consider only disclosures that explicitly concern the new release (and allow model creators to declare which components remain consistent with an earlier release).
Openness Index Leaderboard
| 1 | Olmo 3.1 32B Instruct | 88.89 | 12.01 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 2 | Olmo 3.1 32B Think | 88.89 | 14.24 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 3 | Olmo 3 7B Instruct | 88.89 | 8.14 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 4 | Olmo 3 7B Think | 88.89 | 16.80 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 5 | Molmo 7B-D | 88.89 | 9.25 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 6 | K2-V2 (low) | 88.89 | 14.44 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 7 | K2-V2 (high) | 88.89 | 20.67 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 8 | K2 Think V2 | 88.89 | 24.52 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 9 | K2-V2 (medium) | 88.89 | 18.70 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 10 | Olmo 3 32B Think | 88.89 | 18.89 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 11 | OLMo 2 7B | 88.89 | 9.30 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 12 | OLMo 2 32B | 88.89 | 10.57 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 13 | NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | 72.22 | 10.11 | 6.00 | 7.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 14 | NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | 72.22 | 24.26 | 6.00 | 7.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 15 | NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | 72.22 | 13.10 | 6.00 | 7.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 16 | NVIDIA Nemotron Nano 12B v2 VL (Reasoning) | 72.22 | 14.78 | 6.00 | 7.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 17 | NVIDIA Nemotron Nano 9B V2 (Reasoning) | 72.22 | 14.76 | 6.00 | 7.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 18 | Molmo2-8B | 72.22 | - | 6.00 | 7.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 19 | Kimi Linear 48B A3B Instruct | 61.11 | 14.41 | 6.00 | 5.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 20 | ERNIE 4.5 300B A47B | 55.56 | 17.26 | 6.00 | 4.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 21 | GLM-4.5-Air | 55.56 | 23.16 | 6.00 | 4.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 22 | GLM-4.5 (Reasoning) | 55.56 | 26.21 | 6.00 | 4.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 23 | Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | 52.78 | 14.43 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00 | |
| 24 | Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | 52.78 | 20.02 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00 | |
| 25 | Llama Nemotron Super 49B v1.5 (Non-reasoning) | 52.78 | 14.51 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00 | |
| 26 | Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) | 52.78 | 14.35 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00 | |
| 27 | Llama 3.3 Nemotron Super 49B v1 (Reasoning) | 52.78 | 18.49 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00 | |
| 28 | Llama Nemotron Super 49B v1.5 (Reasoning) | 52.78 | 18.62 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00 | |
| 29 | MiMo-V2-Flash (Reasoning) | 52.78 | 39.24 | 6.00 | 3.50 | 0.00 | 0.00 | 1.00 | 0.00 | |
| 30 | GLM-4.5V (Reasoning) | 52.78 | 19.27 | 6.00 | 3.50 | 1.00 | 0.00 | 0.00 | 0.00 | |
| 31 | GLM-4.5V (Non-reasoning) | 52.78 | 12.53 | 6.00 | 3.50 | 1.00 | 0.00 | 0.00 | 0.00 | |
| 32 | Gemma 3 12B Instruct | 50.00 | 8.79 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 33 | Gemma 3n E2B Instruct | 50.00 | 9.73 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 34 | Gemma 3 4B Instruct | 50.00 | 6.31 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 35 | Gemma 3 27B Instruct | 50.00 | 10.19 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 36 | Gemma 3 1B Instruct | 50.00 | 8.65 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 37 | Gemma 3n E4B Instruct | 50.00 | 6.30 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 38 | Magistral Small 1.2 | 50.00 | 22.55 | 6.00 | 3.00 | 0.00 | 0.00 | 1.00 | 1.00 | |
| 39 | DeepSeek R1 0528 (May '25) | 50.00 | 27.01 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 40 | Phi-4 | 50.00 | 13.18 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 41 | Phi-4 Multimodal Instruct | 50.00 | 10.04 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 42 | Phi-4 Mini Instruct | 50.00 | 10.94 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 43 | Qwen3 VL 8B Instruct | 50.00 | 14.25 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 44 | Qwen3 VL 32B (Reasoning) | 50.00 | 24.52 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 45 | Qwen3 VL 32B Instruct | 50.00 | 17.17 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 46 | Qwen3 VL 30B A3B (Reasoning) | 50.00 | 19.62 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 47 | Qwen3 VL 235B A22B (Reasoning) | 50.00 | 27.51 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 48 | Qwen3 VL 235B A22B Instruct | 50.00 | 20.58 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 49 | Qwen3 VL 30B A3B Instruct | 50.00 | 16.03 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 50 | Qwen3 VL 4B (Reasoning) | 50.00 | 14.90 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 51 | Qwen3 VL 8B (Reasoning) | 50.00 | 16.61 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 52 | Qwen3 VL 4B Instruct | 50.00 | 14.08 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 53 | DeepSeek R1 0528 Qwen3 8B | 47.22 | 16.43 | 6.00 | 2.50 | 0.00 | 0.00 | 1.00 | 0.00 | |
| 54 | Hermes 4 - Llama-3.1 405B (Reasoning) | 47.22 | 21.72 | 4.00 | 4.50 | 1.00 | 0.00 | 2.00 | 0.00 | |
| 55 | Hermes 4 - Llama-3.1 405B (Non-reasoning) | 47.22 | 17.12 | 4.00 | 4.50 | 1.00 | 0.00 | 2.00 | 0.00 | |
| 56 | Hermes 4 - Llama-3.1 70B (Reasoning) | 47.22 | 20.39 | 4.00 | 4.50 | 1.00 | 0.00 | 2.00 | 0.00 | |
| 57 | Hermes 4 - Llama-3.1 70B (Non-reasoning) | 47.22 | 13.55 | 4.00 | 4.50 | 1.00 | 0.00 | 2.00 | 0.00 | |
| 58 | Apriel-v1.5-15B-Thinker | 47.22 | 28.33 | 6.00 | 2.50 | 0.00 | 0.00 | 1.00 | 0.00 | |
| 59 | Gemma 3 270M | 44.44 | 8.37 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 60 | Falcon-H1R-7B | 44.44 | 15.84 | 4.00 | 4.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 61 | Llama 3.1 Nemotron Instruct 70B | 44.44 | 13.42 | 4.00 | 4.00 | 0.00 | 0.00 | 1.00 | 1.00 | |
| 62 | GLM-4.7 (Reasoning) | 44.44 | 42.05 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 63 | GLM-4.7 (Non-reasoning) | 44.44 | 34.10 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 64 | GLM-4.7-Flash (Non-reasoning) | 44.44 | 21.47 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 65 | GLM-4.7-Flash (Reasoning) | 44.44 | 30.12 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 66 | Qwen3 4B 2507 Instruct | 44.44 | 13.19 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 67 | Qwen3 235B A22B 2507 Instruct | 44.44 | 24.66 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 68 | Qwen3 Coder 30B A3B Instruct | 44.44 | 19.96 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 69 | Qwen3 Next 80B A3B (Reasoning) | 44.44 | 26.49 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 70 | Qwen3 Coder 480B A35B Instruct | 44.44 | 24.65 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 71 | Qwen3 Next 80B A3B Instruct | 44.44 | 20.08 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 72 | Qwen3 30B A3B 2507 (Reasoning) | 44.44 | 22.43 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 73 | Qwen3 30B A3B 2507 Instruct | 44.44 | 15.00 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 74 | Qwen3 Omni 30B A3B (Reasoning) | 44.44 | 15.60 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 75 | Qwen3 Omni 30B A3B Instruct | 44.44 | 10.68 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 76 | Qwen3 235B A22B 2507 (Reasoning) | 44.44 | 29.46 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 77 | Qwen3 4B 2507 (Reasoning) | 44.44 | 18.60 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 78 | Ling-mini-2.0 | 44.44 | 15.09 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 79 | Ling-flash-2.0 | 44.44 | 15.47 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 80 | Ling-1T | 44.44 | 19.01 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 81 | Devstral Small (Jul '25) | 44.44 | 15.20 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 82 | DeepSeek V3.2 Exp (Non-reasoning) | 44.44 | 28.33 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 83 | DeepSeek V3.2 Exp (Reasoning) | 44.44 | 32.90 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 84 | Kimi K2 | 44.44 | 26.19 | 4.00 | 4.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 85 | GLM-4.6 (Reasoning) | 44.44 | 32.52 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 86 | GLM-4.6 (Non-reasoning) | 44.44 | 30.15 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 87 | Seed-OSS-36B-Instruct | 44.44 | 24.99 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 88 | Granite 4.0 Micro | 41.67 | 7.66 | 6.00 | 1.50 | 1.00 | 0.00 | 0.00 | 0.00 | |
| 89 | Granite 4.0 H 350M | 41.67 | 5.31 | 6.00 | 1.50 | 1.00 | 0.00 | 0.00 | 0.00 | |
| 90 | Granite 4.0 H Small | 41.67 | 10.79 | 6.00 | 1.50 | 1.00 | 0.00 | 0.00 | 0.00 | |
| 91 | Granite 4.0 H 1B | 41.67 | 7.96 | 6.00 | 1.50 | 1.00 | 0.00 | 0.00 | 0.00 | |
| 92 | Granite 4.0 350M | 41.67 | 6.62 | 6.00 | 1.50 | 1.00 | 0.00 | 0.00 | 0.00 | |
| 93 | Granite 4.0 1B | 41.67 | 7.27 | 6.00 | 1.50 | 1.00 | 0.00 | 0.00 | 0.00 | |
| 94 | gpt-oss-20B (high) | 38.89 | 24.47 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 95 | gpt-oss-120B (high) | 38.89 | 33.25 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 96 | Llama 3.3 Instruct 70B | 38.89 | 14.23 | 4.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 97 | Llama 3.1 Instruct 405B | 38.89 | 14.20 | 4.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 98 | Llama 3.2 Instruct 90B (Vision) | 38.89 | 11.90 | 4.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 99 | Llama 3.2 Instruct 11B (Vision) | 38.89 | 10.89 | 4.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 100 | Mistral Small 3.2 | 38.89 | 15.03 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 101 | Mistral Large 3 | 38.89 | 22.72 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 102 | R1 1776 | 38.89 | 11.99 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 103 | Reka Flash 3 | 38.89 | 14.35 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 104 | DeepHermes 3 - Mistral 24B Preview (Non-reasoning) | 38.89 | 10.89 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 105 | Cogito v2.1 (Reasoning) | 38.89 | - | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 106 | Jamba Reasoning 3B | 38.89 | 10.33 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 107 | Ring-flash-2.0 | 38.89 | 20.58 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 108 | Ring-1T | 38.89 | 22.54 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 109 | DeepSeek V3.1 Terminus (Reasoning) | 38.89 | 33.79 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 110 | DeepSeek V3.1 Terminus (Non-reasoning) | 38.89 | 28.37 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 111 | DeepSeek R1 Distill Llama 70B | 36.11 | 15.95 | 4.00 | 2.50 | 0.00 | 0.00 | 1.00 | 0.00 | |
| 112 | LFM2 2.6B | 33.33 | 7.86 | 4.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 113 | LFM2 8B A1B | 33.33 | 6.85 | 4.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 114 | DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning) | 33.33 | 7.58 | 5.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 115 | Command A | 33.33 | 13.44 | 3.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 116 | LFM2 1.2B | 33.33 | 6.36 | 4.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 117 | HyperCLOVA X SEED Think (32B) | 30.56 | 23.72 | 4.00 | 1.50 | 1.00 | 0.00 | 0.00 | 0.00 | |
| 118 | Llama 4 Scout | 27.78 | 13.48 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 119 | Llama 4 Maverick | 27.78 | 18.30 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 120 | Magistral Medium 1.2 | 27.78 | 27.04 | 2.00 | 3.00 | 0.00 | 0.00 | 1.00 | 1.00 | |
| 121 | LFM2.5-1.2B-Thinking | 27.78 | 8.12 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 122 | LFM2.5-1.2B-Instruct | 27.78 | 7.95 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 123 | LFM2.5-VL-1.6B | 27.78 | 6.06 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 124 | MiniMax-M2.1 | 27.78 | 39.55 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 125 | Kimi K2 0905 | 27.78 | 30.81 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 126 | Kimi K2 Thinking | 27.78 | 40.70 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 127 | K-EXAONE (Reasoning) | 27.78 | 32.13 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 128 | Exaone 4.0 1.2B (Non-reasoning) | 27.78 | 8.07 | 3.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 129 | Exaone 4.0 1.2B (Reasoning) | 27.78 | 8.26 | 3.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 130 | EXAONE 4.0 32B (Non-reasoning) | 27.78 | 11.54 | 3.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 131 | EXAONE 4.0 32B (Reasoning) | 27.78 | 16.65 | 3.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 132 | MiniMax-M2 | 27.78 | 35.98 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 133 | Jamba 1.7 Mini | 22.22 | 7.33 | 4.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 134 | Jamba 1.7 Large | 22.22 | 9.27 | 4.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 135 | Qwen3 Max | 16.67 | 31.33 | 2.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 136 | Qwen3 Max Thinking (Preview) | 16.67 | 32.45 | 2.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 137 | Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning) | 11.11 | 19.40 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 138 | Claude 4.5 Sonnet (Non-reasoning) | 11.11 | 37.06 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 139 | Claude 4.5 Sonnet (Reasoning) | 11.11 | 42.92 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 140 | Claude 4.5 Haiku (Non-reasoning) | 11.11 | 31.03 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 141 | Claude Opus 4.5 (Non-reasoning) | 11.11 | 43.05 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 142 | Claude Opus 4.5 (Reasoning) | 11.11 | 49.69 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 143 | Claude 4.5 Haiku (Reasoning) | 11.11 | 37.02 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 144 | Mistral Medium 3.1 | 11.11 | 21.13 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 145 | Grok 4.1 Fast (Non-reasoning) | 11.11 | 23.54 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 146 | Grok 3 mini Reasoning (high) | 11.11 | 32.02 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 147 | Nova Micro | 11.11 | 10.25 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 148 | Nova Premier | 11.11 | 18.87 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 149 | Solar Pro 2 (Reasoning) | 11.11 | 14.93 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 150 | Solar Pro 2 (Non-reasoning) | 11.11 | 13.53 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 151 | Doubao Seed Code | 11.11 | 33.50 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 152 | GPT-5 (ChatGPT) | 11.11 | 21.83 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 153 | GPT-5.1 (Non-reasoning) | 11.11 | 27.41 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 154 | Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning) | 11.11 | 25.51 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 155 | Devstral Medium | 11.11 | 18.62 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 156 | Grok 4 Fast (Non-reasoning) | 11.11 | 22.64 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 157 | Nova Pro | 11.11 | 13.46 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 158 | Nova Lite | 11.11 | 12.45 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 159 | GPT-5 nano (high) | 5.56 | 26.69 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 160 | o3 | 5.56 | 40.91 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 161 | GPT-5 mini (high) | 5.56 | 41.03 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 162 | Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) | 5.56 | 21.60 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 163 | Gemini 3 Pro Preview (high) | 5.56 | 48.44 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 164 | Gemini 2.5 Pro | 5.56 | 34.45 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 165 | Grok 4 | 5.56 | 41.43 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 166 | Grok Code Fast 1 | 5.56 | 28.67 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 167 | Grok 4.1 Fast (Reasoning) | 5.56 | 38.54 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 168 | GPT-5 Codex (high) | 5.56 | 44.52 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 169 | GPT-5 (minimal) | 5.56 | 23.74 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 170 | GPT-5 mini (minimal) | 5.56 | 20.66 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 171 | GPT-5 nano (medium) | 5.56 | 25.68 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 172 | GPT-5 nano (minimal) | 5.56 | 13.65 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 173 | GPT-5.1 (high) | 5.56 | 47.56 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 174 | GPT-5 (low) | 5.56 | 39.03 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 175 | GPT-5 (high) | 5.56 | 44.57 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 176 | GPT-5 mini (medium) | 5.56 | 38.81 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 177 | GPT-5 (medium) | 5.56 | 41.84 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 178 | Gemini 2.5 Flash Preview (Sep '25) (Reasoning) | 5.56 | 31.09 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 179 | Grok 4 Fast (Reasoning) | 5.56 | 34.93 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |