Stay connected with us on X, Discord, and LinkedIn to stay up to date with future analysis

Artificial Analysis Openness Index

A composite measure providing an industry standard to communicate model openness for users and developers.

Background

The Artificial Analysis Openness Index assesses how 'open' models are on the basis of their availability and transparency across different components (e.g. models weights, training data, and model architecture).

Availability represents the ability to use a model via API, self-hosting through open weights, and use freely with permissive licensing. Transparency captures the degree to which a model's methodology and data have been disclosed, shared, and permissively licensed for the community to use to understand a model's inputs and replicate or build on its approach.

Methodology

All evaluations are conducted independently by Artificial Analysis. More information can be found on our Intelligence Benchmarking Methodology page.

Highlights

Olmo 3.1 32B Instruct scores the highest on Openness Index with a score of 89, followed by Olmo 3 7B Think with a score of 89, and Olmo 3 7B Instruct with a score of 89
o3 scores the lowest on Openness Index with a score of 6, followed by Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) with a score of 6, and Gemini 2.5 Pro with a score of 6

Artificial Analysis Openness Index: Results

Openness Index assesses model openness on a 0 to 100 normalized scale (higher is more open)

+ Add model from specific provider

Artificial Analysis Openness Index: Components

Openness Index underlying score contribution by components, up to a maximum of 18 (higher is more open)

+ Add model from specific provider

Model Availability

Transparency - Methodology

Transparency - Post-training Data

Transparency - Pre-training Data

Artificial Analysis Openness Index: Model Availability vs. Model Transparency

Model Availability reflects the availability of a model for usage and associated license (maximum 6 points); Model Transparency reflects methodology and data disclosures, data sharing, and code and licensing associated with a model's training process (maximum 12 points)

+ Add model from specific provider

Most attractive quadrant

Alibaba

Allen Institute for AI

Anthropic

Kimi

LG AI Research

MBZUAI Institute of Foundation Models

Artificial Analysis Openness Index: Score vs. Release Date

Artificial Analysis Openness Index; Release Date

+ Add model from specific provider

Most attractive region

Alibaba

Allen Institute for AI

Anthropic

Kimi

LG AI Research

MBZUAI Institute of Foundation Models

Artificial Analysis Openness Index vs. Artificial Analysis Intelligence Index

Artificial Analysis Openness Index; Artificial Analysis Intelligence Index

+ Add model from specific provider

Most attractive quadrant

Alibaba

Allen Institute for AI

Anthropic

Kimi

LG AI Research

MBZUAI Institute of Foundation Models

Openness Index Composition

Detailed methodology

Dimension

Component

Scoring

1. Model availability

Weights

Access

0Closed weights, no API

1Closed weights, API limits token visibility

2Closed weights, API available

3Open weights

License

0Closed weights or no commercial use

1Commercial use, attribution required

2Commercial use, no attribution required

3Commercial use, no attribution required, no meaningful limitations

2. Model transparency

Data:Pre & Post Training(score represents average across each)

Access

0No or limited disclosure

1Partial data source detail and categorization disclosed

2Full data mix disclosure, substantial data shared¹

3Full data shared

License (most restrictive)

0No commercial use/no substantial data shared

1Commercial use, attribution required

2Commercial use, no attribution required

3Commercial use, no attribution required, no meaningful limitations

Methodology

Disclosure

0No or limited disclosure

1Model architecture disclosure

2Limited general technical disclosure

3Full technical details disclosed

License (most restrictive)

0No code disclosed/released

1Frameworks disclosed, openly available for commercial use

2End-to-end training pipeline code or guide released

3End-to-end training pipeline code or guide released, and commercial use allowed

Scoring methodology

Each component is scored on a 0-3 qualitative scale based on the best-fitting openness 'archetype', with each model assessed based on the full set of public first-party information available.

We synthesize these underlying factors into a unified metric, the Artificial Analysis Openness Index, as follows:

Data elements are averaged between pre- and post-training (to give a total of 6 possible points across data)
All component scores are added (up to a maximum of 18/18 points)
This score is normalized to a 0-100 scale

Where models are derived from a third-party base model, they may be constrained by the licensing or limited disclosure of the upstream model. For incremental/update releases, we only consider disclosures explicitly about the new release (including allowing model creators to declare which components remain consistent with an earlier release).

Openness Index Leaderboard


1	Allen Institute for AI	Olmo 3.1 32B Instruct	88.89	12.16	6.00	10.00	3.00	1.00	3.00	1.00
2	Allen Institute for AI	Olmo 3 7B Think	88.89	9.43	6.00	10.00	3.00	1.00	3.00	1.00
3	Allen Institute for AI	Olmo 3 7B Instruct	88.89	8.15	6.00	10.00	3.00	1.00	3.00	1.00
4	Allen Institute for AI	Molmo 7B-D	88.89	9.25	6.00	10.00	3.00	1.00	3.00	1.00
5	Allen Institute for AI	Olmo 3.1 32B Think	88.89	13.94	6.00	10.00	3.00	1.00	3.00	1.00
6	MBZUAI Institute of Foundation Models	K2-V2 (high)	88.89	20.61	6.00	10.00	3.00	1.00	3.00	1.00
7	MBZUAI Institute of Foundation Models	K2-V2 (low)	88.89	14.44	6.00	10.00	3.00	1.00	3.00	1.00
8	MBZUAI Institute of Foundation Models	K2 Think V2	88.89	24.12	6.00	10.00	3.00	1.00	3.00	1.00
9	MBZUAI Institute of Foundation Models	K2-V2 (medium)	88.89	18.68	6.00	10.00	3.00	1.00	3.00	1.00
10	Swiss AI Initiative	Apertus 70B Instruct	88.89	7.70	6.00	10.00	3.00	1.00	3.00	1.00
11	Swiss AI Initiative	Apertus 8B Instruct	88.89	5.88	6.00	10.00	3.00	1.00	3.00	1.00
12	Allen Institute for AI	Olmo 3 32B Think	88.89	12.09	6.00	10.00	3.00	1.00	3.00	1.00
13	Allen Institute for AI	OLMo 2 7B	88.89	9.30	6.00	10.00	3.00	1.00	3.00	1.00
14	Allen Institute for AI	OLMo 2 32B	88.89	10.57	6.00	10.00	3.00	1.00	3.00	1.00
15	NVIDIA	NVIDIA Nemotron 3 Super 120B A12B (Reasoning)	83.33	35.97	6.00	9.00	2.00	1.00	2.00	1.00
16	NVIDIA	NVIDIA Nemotron 3 Nano 30B A3B (Reasoning)	83.33	24.27	6.00	9.00	2.00	1.00	2.00	1.00
17	NVIDIA	NVIDIA Nemotron Nano 9B V2 (Non-reasoning)	72.22	13.16	6.00	7.00	2.00	1.00	2.00	1.00
18	NVIDIA	NVIDIA Nemotron Nano 12B v2 VL (Reasoning)	72.22	14.89	6.00	7.00	2.00	1.00	2.00	1.00
19	NVIDIA	NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning)	72.22	10.09	6.00	7.00	2.00	1.00	2.00	1.00
20	NVIDIA	NVIDIA Nemotron Nano 9B V2 (Reasoning)	72.22	14.76	6.00	7.00	2.00	1.00	2.00	1.00
21	Allen Institute for AI	Molmo2-8B	72.22	7.30	6.00	7.00	3.00	1.00	3.00	1.00
22	Kimi	Kimi Linear 48B A3B Instruct	61.11	14.41	6.00	5.00	1.00	0.00	1.00	0.00
23	IBM	Granite 4.0 H 1B	55.56	7.99	6.00	4.00	2.00	1.00	2.00	1.00
24	IBM	Granite 4.0 1B	55.56	7.34	6.00	4.00	2.00	1.00	2.00	1.00
25	IBM	Granite 4.0 Micro	55.56	7.67	6.00	4.00	2.00	1.00	2.00	1.00
26	IBM	Granite 4.0 H 350M	55.56	5.44	6.00	4.00	2.00	1.00	2.00	1.00
27	IBM	Granite 4.0 350M	55.56	6.10	6.00	4.00	2.00	1.00	2.00	1.00
28	IBM	Granite 4.0 H Small	55.56	10.81	6.00	4.00	2.00	1.00	2.00	1.00
29	Baidu	ERNIE 4.5 300B A47B	55.56	14.96	6.00	4.00	0.00	0.00	0.00	0.00
30	Z AI	GLM-4.5-Air	55.56	23.17	6.00	4.00	0.00	0.00	0.00	0.00
31	Z AI	GLM-4.5 (Reasoning)	55.56	26.42	6.00	4.00	0.00	0.00	0.00	0.00
32	NVIDIA	Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning)	52.78	14.43	4.00	5.50	1.00	0.00	1.00	1.00
33	NVIDIA	Llama Nemotron Super 49B v1.5 (Non-reasoning)	52.78	14.59	4.00	5.50	1.00	0.00	1.00	1.00
34	NVIDIA	Llama 3.1 Nemotron Ultra 253B v1 (Reasoning)	52.78	15.02	4.00	5.50	1.00	0.00	1.00	1.00
35	NVIDIA	Llama 3.3 Nemotron Super 49B v1 (Reasoning)	52.78	18.49	4.00	5.50	1.00	0.00	1.00	1.00
36	NVIDIA	Llama 3.3 Nemotron Super 49B v1 (Non-reasoning)	52.78	14.35	4.00	5.50	1.00	0.00	1.00	1.00
37	NVIDIA	Llama Nemotron Super 49B v1.5 (Reasoning)	52.78	18.68	4.00	5.50	1.00	0.00	1.00	1.00
38	Xiaomi	MiMo-V2-Flash (Reasoning)	52.78	39.24	6.00	3.50	0.00	0.00	1.00	0.00
39	Z AI	GLM-4.5V (Non-reasoning)	52.78	12.74	6.00	3.50	1.00	0.00	0.00	0.00
40	Z AI	GLM-4.5V (Reasoning)	52.78	15.09	6.00	3.50	1.00	0.00	0.00	0.00
41	Google	Gemma 3n E4B Instruct	50.00	6.38	6.00	3.00	0.00	0.00	0.00	0.00
42	Google	Gemma 3 12B Instruct	50.00	8.79	6.00	3.00	0.00	0.00	0.00	0.00
43	Google	Gemma 3 4B Instruct	50.00	6.30	6.00	3.00	0.00	0.00	0.00	0.00
44	Google	Gemma 3 27B Instruct	50.00	10.31	6.00	3.00	0.00	0.00	0.00	0.00
45	Google	Gemma 3 1B Instruct	50.00	5.55	6.00	3.00	0.00	0.00	0.00	0.00
46	Google	Gemma 3n E2B Instruct	50.00	4.76	6.00	3.00	0.00	0.00	0.00	0.00
47	Mistral	Magistral Small 1.2	50.00	18.16	6.00	3.00	0.00	0.00	1.00	1.00
48	DeepSeek	DeepSeek R1 0528 (May '25)	50.00	27.07	6.00	3.00	0.00	0.00	0.00	0.00
49	Microsoft Azure	Phi-4	50.00	10.41	6.00	3.00	1.00	0.00	1.00	0.00
50	Microsoft Azure	Phi-4 Mini Instruct	50.00	8.39	6.00	3.00	1.00	0.00	1.00	0.00
51	Microsoft Azure	Phi-4 Multimodal Instruct	50.00	10.04	6.00	3.00	1.00	0.00	1.00	0.00
52	StepFun	Step 3.5 Flash	50.00	37.80	6.00	3.00	0.00	0.00	0.00	0.00
53	Z AI	GLM-5 (Reasoning)	50.00	49.77	6.00	3.00	0.00	0.00	0.00	0.00
54	Alibaba	Qwen3 VL 30B A3B Instruct	50.00	16.05	6.00	3.00	1.00	0.00	1.00	0.00
55	Alibaba	Qwen3 VL 30B A3B (Reasoning)	50.00	19.68	6.00	3.00	1.00	0.00	1.00	0.00
56	Alibaba	Qwen3 VL 32B (Reasoning)	50.00	24.72	6.00	3.00	1.00	0.00	1.00	0.00
57	Alibaba	Qwen3 VL 32B Instruct	50.00	17.19	6.00	3.00	1.00	0.00	1.00	0.00
58	Alibaba	Qwen3 VL 235B A22B (Reasoning)	50.00	27.64	6.00	3.00	1.00	0.00	1.00	0.00
59	Alibaba	Qwen3 VL 8B (Reasoning)	50.00	16.66	6.00	3.00	1.00	0.00	1.00	0.00
60	Alibaba	Qwen3 VL 235B A22B Instruct	50.00	20.75	6.00	3.00	1.00	0.00	1.00	0.00
61	Alibaba	Qwen3 VL 4B Instruct	50.00	9.55	6.00	3.00	1.00	0.00	1.00	0.00
62	Alibaba	Qwen3 VL 4B (Reasoning)	50.00	13.73	6.00	3.00	1.00	0.00	1.00	0.00
63	Alibaba	Qwen3 VL 8B Instruct	50.00	14.30	6.00	3.00	1.00	0.00	1.00	0.00
64	DeepSeek	DeepSeek R1 0528 Qwen3 8B	47.22	16.43	6.00	2.50	0.00	0.00	1.00	0.00
65	Nous Research	Hermes 4 - Llama-3.1 70B (Reasoning)	47.22	15.99	4.00	4.50	1.00	0.00	2.00	0.00
66	Nous Research	Hermes 4 - Llama-3.1 405B (Reasoning)	47.22	18.56	4.00	4.50	1.00	0.00	2.00	0.00
67	Nous Research	Hermes 4 - Llama-3.1 70B (Non-reasoning)	47.22	12.63	4.00	4.50	1.00	0.00	2.00	0.00
68	Nous Research	Hermes 4 - Llama-3.1 405B (Non-reasoning)	47.22	17.63	4.00	4.50	1.00	0.00	2.00	0.00
69	ServiceNow	Apriel-v1.5-15B-Thinker	47.22	28.33	6.00	2.50	0.00	0.00	1.00	0.00
70	Google	Gemma 3 270M	44.44	7.71	6.00	2.00	0.00	0.00	0.00	0.00
71	TII UAE	Falcon-H1R-7B	44.44	15.80	4.00	4.00	1.00	0.00	1.00	0.00
72	NVIDIA	Llama 3.1 Nemotron Instruct 70B	44.44	13.44	4.00	4.00	0.00	0.00	1.00	1.00
73	LongCat	LongCat Flash Lite	44.44	23.93	6.00	2.00	0.00	0.00	0.00	0.00
74	Alibaba	Qwen3 Next 80B A3B (Reasoning)	44.44	26.72	6.00	2.00	0.00	0.00	0.00	0.00
75	Alibaba	Qwen3 Coder 480B A35B Instruct	44.44	24.77	6.00	2.00	0.00	0.00	0.00	0.00
76	Alibaba	Qwen3 Next 80B A3B Instruct	44.44	20.11	6.00	2.00	0.00	0.00	0.00	0.00
77	Alibaba	Qwen3 Omni 30B A3B Instruct	44.44	10.68	6.00	2.00	0.00	0.00	0.00	0.00
78	Alibaba	Qwen3 Omni 30B A3B (Reasoning)	44.44	15.62	6.00	2.00	0.00	0.00	0.00	0.00
79	InclusionAI	Ling-flash-2.0	44.44	15.74	6.00	2.00	0.00	0.00	0.00	0.00
80	InclusionAI	Ling-mini-2.0	44.44	9.19	6.00	2.00	0.00	0.00	0.00	0.00
81	InclusionAI	Ling-1T	44.44	19.04	6.00	2.00	0.00	0.00	0.00	0.00
82	Mistral	Devstral Small (Jul '25)	44.44	15.21	6.00	2.00	0.00	0.00	0.00	0.00
83	DeepSeek	DeepSeek V3.2 Exp (Reasoning)	44.44	32.94	6.00	2.00	0.00	0.00	0.00	0.00
84	DeepSeek	DeepSeek V3.2 Exp (Non-reasoning)	44.44	28.44	6.00	2.00	0.00	0.00	0.00	0.00
85	Kimi	Kimi K2	44.44	26.32	4.00	4.00	1.00	0.00	1.00	0.00
86	Z AI	GLM-4.7 (Reasoning)	44.44	42.11	6.00	2.00	0.00	0.00	0.00	0.00
87	Z AI	GLM-4.7 (Non-reasoning)	44.44	34.16	6.00	2.00	0.00	0.00	0.00	0.00
88	Z AI	GLM-4.6 (Reasoning)	44.44	32.51	6.00	2.00	0.00	0.00	0.00	0.00
89	Z AI	GLM-4.7-Flash (Reasoning)	44.44	30.15	6.00	2.00	0.00	0.00	0.00	0.00
90	Z AI	GLM-4.7-Flash (Non-reasoning)	44.44	22.07	6.00	2.00	0.00	0.00	0.00	0.00
91	Z AI	GLM-4.6 (Non-reasoning)	44.44	30.24	6.00	2.00	0.00	0.00	0.00	0.00
92	Alibaba	Qwen3 Coder 30B A3B Instruct	44.44	19.98	6.00	2.00	0.00	0.00	0.00	0.00
93	Alibaba	Qwen3 30B A3B 2507 Instruct	44.44	15.00	6.00	2.00	0.00	0.00	0.00	0.00
94	Alibaba	Qwen3 30B A3B 2507 (Reasoning)	44.44	22.41	6.00	2.00	0.00	0.00	0.00	0.00
95	Alibaba	Qwen3 235B A22B 2507 (Reasoning)	44.44	29.54	6.00	2.00	0.00	0.00	0.00	0.00
96	Alibaba	Qwen3 235B A22B 2507 Instruct	44.44	24.96	6.00	2.00	0.00	0.00	0.00	0.00
97	Alibaba	Qwen3 4B 2507 (Reasoning)	44.44	18.18	6.00	2.00	0.00	0.00	0.00	0.00
98	Alibaba	Qwen3 4B 2507 Instruct	44.44	12.88	6.00	2.00	0.00	0.00	0.00	0.00
99	ByteDance Seed	Seed-OSS-36B-Instruct	44.44	25.16	6.00	2.00	0.00	0.00	0.00	0.00
100	Alibaba	Qwen3 Coder Next	41.67	28.28	6.00	1.50	0.00	0.00	1.00	0.00
101	OpenAI	gpt-oss-120B (high)	38.89	33.27	6.00	1.00	0.00	0.00	0.00	0.00
102	OpenAI	gpt-oss-20B (high)	38.89	24.47	6.00	1.00	0.00	0.00	0.00	0.00
103	Meta	Llama 3.3 Instruct 70B	38.89	14.49	4.00	3.00	1.00	0.00	1.00	0.00
104	Meta	Llama 3.1 Instruct 405B	38.89	17.38	4.00	3.00	1.00	0.00	1.00	0.00
105	Meta	Llama 3.2 Instruct 90B (Vision)	38.89	11.90	4.00	3.00	1.00	0.00	1.00	0.00
106	Meta	Llama 3.2 Instruct 11B (Vision)	38.89	8.73	4.00	3.00	1.00	0.00	1.00	0.00
107	Mistral	Mistral Small 4 (Non-reasoning)	38.89	18.62	6.00	1.00	0.00	0.00	0.00	0.00
108	Mistral	Mistral Large 3	38.89	22.80	6.00	1.00	0.00	0.00	0.00	0.00
109	Mistral	Mistral Small 4 (Reasoning)	38.89	27.19	6.00	1.00	0.00	0.00	0.00	0.00
110	Perplexity	R1 1776	38.89	11.99	6.00	1.00	0.00	0.00	0.00	0.00
111	Reka AI	Reka Flash 3	38.89	9.52	6.00	1.00	0.00	0.00	0.00	0.00
112	Nous Research	DeepHermes 3 - Mistral 24B Preview (Non-reasoning)	38.89	10.89	6.00	1.00	0.00	0.00	0.00	0.00
113	Sarvam	Sarvam 30B (high)	38.89	12.34	6.00	1.00	0.00	0.00	0.00	0.00
114	Sarvam	Sarvam 105B (high)	38.89	18.16	6.00	1.00	0.00	0.00	0.00	0.00
115	Deep Cogito	Cogito v2.1 (Reasoning)	38.89	-	6.00	1.00	0.00	0.00	0.00	0.00
116	AI21 Labs	Jamba Reasoning 3B	38.89	9.60	6.00	1.00	0.00	0.00	0.00	0.00
117	Alibaba	Qwen3.5 4B (Non-reasoning)	38.89	22.60	6.00	1.00	0.00	0.00	0.00	0.00
118	Alibaba	Qwen3.5 0.8B (Non-reasoning)	38.89	9.91	6.00	1.00	0.00	0.00	0.00	0.00
119	Alibaba	Qwen3.5 4B (Reasoning)	38.89	27.08	6.00	1.00	0.00	0.00	0.00	0.00
120	Alibaba	Qwen3.5 9B (Reasoning)	38.89	32.43	6.00	1.00	0.00	0.00	0.00	0.00
121	Alibaba	Qwen3.5 397B A17B (Non-reasoning)	38.89	40.10	6.00	1.00	0.00	0.00	0.00	0.00
122	Alibaba	Qwen3.5 397B A17B (Reasoning)	38.89	45.05	6.00	1.00	0.00	0.00	0.00	0.00
123	Alibaba	Qwen3.5 122B A10B (Reasoning)	38.89	41.60	6.00	1.00	0.00	0.00	0.00	0.00
124	Alibaba	Qwen3.5 35B A3B (Reasoning)	38.89	37.12	6.00	1.00	0.00	0.00	0.00	0.00
125	Alibaba	Qwen3.5 27B (Reasoning)	38.89	42.07	6.00	1.00	0.00	0.00	0.00	0.00
126	Alibaba	Qwen3.5 2B (Reasoning)	38.89	16.29	6.00	1.00	0.00	0.00	0.00	0.00
127	Alibaba	Qwen3.5 0.8B (Reasoning)	38.89	10.52	6.00	1.00	0.00	0.00	0.00	0.00
128	Alibaba	Qwen3.5 27B (Non-reasoning)	38.89	37.18	6.00	1.00	0.00	0.00	0.00	0.00
129	Alibaba	Qwen3.5 122B A10B (Non-reasoning)	38.89	35.87	6.00	1.00	0.00	0.00	0.00	0.00
130	Alibaba	Qwen3.5 35B A3B (Non-reasoning)	38.89	30.69	6.00	1.00	0.00	0.00	0.00	0.00
131	Alibaba	Qwen3.5 9B (Non-reasoning)	38.89	27.33	6.00	1.00	0.00	0.00	0.00	0.00
132	Alibaba	Qwen3.5 2B (Non-reasoning)	38.89	14.67	6.00	1.00	0.00	0.00	0.00	0.00
133	InclusionAI	Ring-1T	38.89	22.78	6.00	1.00	0.00	0.00	0.00	0.00
134	InclusionAI	Ring-flash-2.0	38.89	14.02	6.00	1.00	0.00	0.00	0.00	0.00
135	Mistral	Mistral Small 3.2	38.89	15.07	6.00	1.00	0.00	0.00	0.00	0.00
136	DeepSeek	DeepSeek V3.1 Terminus (Reasoning)	38.89	33.93	6.00	1.00	0.00	0.00	0.00	0.00
137	DeepSeek	DeepSeek V3.1 Terminus (Non-reasoning)	38.89	28.52	6.00	1.00	0.00	0.00	0.00	0.00
138	DeepSeek	DeepSeek R1 Distill Llama 70B	36.11	15.95	4.00	2.50	0.00	0.00	1.00	0.00
139	Liquid AI	LFM2 8B A1B	33.33	7.03	4.00	2.00	0.00	0.00	0.00	0.00
140	Liquid AI	LFM2 2.6B	33.33	8.04	4.00	2.00	0.00	0.00	0.00	0.00
141	Kimi	Kimi K2.5 (Reasoning)	33.33	46.81	4.00	2.00	0.00	0.00	0.00	0.00
142	Nous Research	DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning)	33.33	7.58	5.00	1.00	0.00	0.00	0.00	0.00
143	Cohere	Command A	33.33	13.48	3.00	3.00	0.00	0.00	0.00	0.00
144	Liquid AI	LFM2 1.2B	33.33	6.33	4.00	2.00	0.00	0.00	0.00	0.00
145	Naver	HyperCLOVA X SEED Think (32B)	30.56	23.72	4.00	1.50	1.00	0.00	0.00	0.00
146	Meta	Llama 4 Maverick	27.78	18.36	4.00	1.00	0.00	0.00	0.00	0.00
147	Meta	Llama 4 Scout	27.78	13.52	4.00	1.00	0.00	0.00	0.00	0.00
148	Mistral	Magistral Medium 1.2	27.78	27.10	2.00	3.00	0.00	0.00	1.00	1.00
149	Liquid AI	LFM2.5-1.2B-Instruct	27.78	8.04	4.00	1.00	0.00	0.00	0.00	0.00
150	Liquid AI	LFM2 24B A2B	27.78	10.49	4.00	1.00	0.00	0.00	0.00	0.00
151	Liquid AI	LFM2.5-VL-1.6B	27.78	6.18	4.00	1.00	0.00	0.00	0.00	0.00
152	Liquid AI	LFM2.5-1.2B-Thinking	27.78	8.08	4.00	1.00	0.00	0.00	0.00	0.00
153	LG AI Research	K-EXAONE (Reasoning)	27.78	32.12	4.00	1.00	0.00	0.00	0.00	0.00
154	LG AI Research	Exaone 4.0 1.2B (Reasoning)	27.78	8.26	3.00	2.00	0.00	0.00	0.00	0.00
155	LG AI Research	Exaone 4.0 1.2B (Non-reasoning)	27.78	8.11	3.00	2.00	0.00	0.00	0.00	0.00
156	LG AI Research	EXAONE 4.0 32B (Reasoning)	27.78	16.68	3.00	2.00	0.00	0.00	0.00	0.00
157	LG AI Research	EXAONE 4.0 32B (Non-reasoning)	27.78	11.66	3.00	2.00	0.00	0.00	0.00	0.00
158	MiniMax	MiniMax-M2.1	27.78	39.42	4.00	1.00	0.00	0.00	0.00	0.00
159	MiniMax	MiniMax-M2.5	27.78	41.93	4.00	1.00	0.00	0.00	0.00	0.00
160	MiniMax	MiniMax-M2	27.78	36.09	4.00	1.00	0.00	0.00	0.00	0.00
161	Kimi	Kimi K2 0905	27.78	30.85	4.00	1.00	0.00	0.00	0.00	0.00
162	Kimi	Kimi K2 Thinking	27.78	40.89	4.00	1.00	0.00	0.00	0.00	0.00
163	AI21 Labs	Jamba 1.7 Mini	22.22	8.07	4.00	0.00	0.00	0.00	0.00	0.00
164	AI21 Labs	Jamba 1.7 Large	22.22	10.88	4.00	0.00	0.00	0.00	0.00	0.00
165	Alibaba	Qwen3 Max Thinking (Preview)	16.67	32.48	2.00	1.00	0.00	0.00	0.00	0.00
166	Alibaba	Qwen3 Max	16.67	31.38	2.00	1.00	0.00	0.00	0.00	0.00
167	Google	Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning)	11.11	19.42	2.00	0.00	0.00	0.00	0.00	0.00
168	Anthropic	Claude 4.5 Haiku (Reasoning)	11.11	37.09	2.00	0.00	0.00	0.00	0.00	0.00
169	Anthropic	Claude 4.5 Haiku (Non-reasoning)	11.11	31.05	2.00	0.00	0.00	0.00	0.00	0.00
170	Mistral	Mistral Medium 3.1	11.11	21.25	2.00	0.00	0.00	0.00	0.00	0.00
171	xAI	Grok 3 mini Reasoning (high)	11.11	32.08	2.00	0.00	0.00	0.00	0.00	0.00
172	Amazon	Nova Micro	11.11	10.27	2.00	0.00	0.00	0.00	0.00	0.00
173	Amazon	Nova Premier	11.11	19.01	2.00	0.00	0.00	0.00	0.00	0.00
174	Upstage	Solar Pro 2 (Non-reasoning)	11.11	13.59	2.00	0.00	0.00	0.00	0.00	0.00
175	Upstage	Solar Pro 2 (Reasoning)	11.11	14.92	2.00	0.00	0.00	0.00	0.00	0.00
176	ByteDance Seed	Doubao Seed Code	11.11	33.52	2.00	0.00	0.00	0.00	0.00	0.00
177	OpenAI	GPT-5.1 (Non-reasoning)	11.11	27.42	2.00	0.00	0.00	0.00	0.00	0.00
178	OpenAI	GPT-5 (ChatGPT)	11.11	21.83	2.00	0.00	0.00	0.00	0.00	0.00
179	Google	Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning)	11.11	25.70	2.00	0.00	0.00	0.00	0.00	0.00
180	Anthropic	Claude 4.5 Sonnet (Reasoning)	11.11	43.03	2.00	0.00	0.00	0.00	0.00	0.00
181	Anthropic	Claude Opus 4.5 (Non-reasoning)	11.11	43.09	2.00	0.00	0.00	0.00	0.00	0.00
182	Anthropic	Claude 4.5 Sonnet (Non-reasoning)	11.11	37.14	2.00	0.00	0.00	0.00	0.00	0.00
183	Anthropic	Claude Opus 4.5 (Reasoning)	11.11	49.73	2.00	0.00	0.00	0.00	0.00	0.00
184	Mistral	Devstral Medium	11.11	18.66	2.00	0.00	0.00	0.00	0.00	0.00
185	xAI	Grok 4 Fast (Non-reasoning)	11.11	23.12	2.00	0.00	0.00	0.00	0.00	0.00
186	xAI	Grok 4.1 Fast (Non-reasoning)	11.11	23.56	2.00	0.00	0.00	0.00	0.00	0.00
187	Amazon	Nova Pro	11.11	13.48	2.00	0.00	0.00	0.00	0.00	0.00
188	Amazon	Nova Lite	11.11	12.65	2.00	0.00	0.00	0.00	0.00	0.00
189	OpenAI	o3	5.56	38.37	1.00	0.00	0.00	0.00	0.00	0.00
190	Google	Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning)	5.56	21.65	1.00	0.00	0.00	0.00	0.00	0.00
191	Google	Gemini 2.5 Pro	5.56	34.63	1.00	0.00	0.00	0.00	0.00	0.00
192	xAI	Grok Code Fast 1	5.56	28.74	1.00	0.00	0.00	0.00	0.00	0.00
193	OpenAI	GPT-5 (minimal)	5.56	23.89	1.00	0.00	0.00	0.00	0.00	0.00
194	OpenAI	GPT-5 mini (minimal)	5.56	20.68	1.00	0.00	0.00	0.00	0.00	0.00
195	OpenAI	GPT-5 nano (medium)	5.56	25.88	1.00	0.00	0.00	0.00	0.00	0.00
196	OpenAI	GPT-5 nano (high)	5.56	26.83	1.00	0.00	0.00	0.00	0.00	0.00
197	OpenAI	GPT-5 mini (medium)	5.56	38.94	1.00	0.00	0.00	0.00	0.00	0.00
198	OpenAI	GPT-5 (high)	5.56	44.63	1.00	0.00	0.00	0.00	0.00	0.00
199	OpenAI	GPT-5 (medium)	5.56	42.03	1.00	0.00	0.00	0.00	0.00	0.00
200	OpenAI	GPT-5 (low)	5.56	39.20	1.00	0.00	0.00	0.00	0.00	0.00
201	OpenAI	GPT-5 nano (minimal)	5.56	13.84	1.00	0.00	0.00	0.00	0.00	0.00
202	OpenAI	GPT-5.1 (high)	5.56	47.70	1.00	0.00	0.00	0.00	0.00	0.00
203	OpenAI	GPT-5 Codex (high)	5.56	44.63	1.00	0.00	0.00	0.00	0.00	0.00
204	OpenAI	GPT-5 mini (high)	5.56	41.17	1.00	0.00	0.00	0.00	0.00	0.00
205	Google	Gemini 3 Pro Preview (high)	5.56	48.39	1.00	0.00	0.00	0.00	0.00	0.00
206	Google	Gemini 2.5 Flash Preview (Sep '25) (Reasoning)	5.56	31.14	1.00	0.00	0.00	0.00	0.00	0.00
207	xAI	Grok 4 Fast (Reasoning)	5.56	35.06	1.00	0.00	0.00	0.00	0.00	0.00
208	xAI	Grok 4.1 Fast (Reasoning)	5.56	38.61	1.00	0.00	0.00	0.00	0.00	0.00
209	xAI	Grok 4	5.56	41.52	1.00	0.00	0.00	0.00	0.00	0.00

Explore Evaluations

Artificial Analysis Intelligence Index

A composite benchmark aggregating ten challenging evaluations to provide a holistic measure of AI capabilities across mathematics, science, coding, and reasoning.

GDPval-AA Leaderboard

GDPval-AA is Artificial Analysis' evaluation framework for OpenAI's GDPval dataset. It tests AI models on real-world tasks across 44 occupations and 9 major industries. Models are given shell access and web browsing capabilities in an agentic loop via Stirrup to solve tasks, with ELO ratings derived from blind pairwise comparisons.

𝜏²-Bench Telecom Benchmark Leaderboard

A dual-control conversational AI benchmark simulating technical support scenarios where both agent and user must coordinate actions to resolve telecom service issues.

Terminal-Bench Hard Benchmark Leaderboard

An agentic benchmark evaluating AI capabilities in terminal environments through software engineering, system administration, and data processing tasks.

SciCode Benchmark Leaderboard

A scientist-curated coding benchmark featuring 338 sub-tasks derived from 80 genuine laboratory problems across 16 scientific disciplines.

Artificial Analysis Long Context Reasoning Benchmark Leaderboard

A challenging benchmark measuring language models' ability to extract, reason about, and synthesize information from long-form documents ranging from 10k to 100k tokens (measured using the cl100k_base tokenizer).

AA-Omniscience: Knowledge and Hallucination Benchmark

A benchmark measuring factual recall and hallucination across various economically relevant domains.

IFBench Benchmark Leaderboard

A benchmark evaluating precise instruction-following generalization on 58 diverse, verifiable out-of-domain constraints that test models' ability to follow specific output requirements.

Humanity's Last Exam Benchmark Leaderboard

A frontier-level benchmark with 2,500 expert-vetted questions across mathematics, sciences, and humanities, designed to be the final closed-ended academic evaluation.

GPQA Diamond Benchmark Leaderboard

The most challenging 198 questions from GPQA, where PhD experts achieve 65% accuracy but skilled non-experts only reach 34% despite web access.

CritPt Benchmark Leaderboard

A benchmark designed to test LLMs on research-level physics reasoning tasks, featuring 71 composite research challenges.

Artificial Analysis Openness Index

A composite measure providing an industry standard to communicate model openness for users and developers.

MMLU-Pro Benchmark Leaderboard

An enhanced version of MMLU with 12,000 graduate-level questions across 14 subject areas, featuring ten answer options and deeper reasoning requirements.

Global-MMLU-Lite Benchmark Leaderboard

A lightweight, multilingual version of MMLU, designed to evaluate knowledge and reasoning skills across a diverse range of languages and cultural contexts.

LiveCodeBench Benchmark Leaderboard

A contamination-free coding benchmark that continuously harvests fresh competitive programming problems from LeetCode, AtCoder, and CodeForces, evaluating code generation, self-repair, and execution.

MATH-500 Benchmark Leaderboard

A 500-problem subset from the MATH dataset, featuring competition-level mathematics across six domains including algebra, geometry, and number theory.

AIME 2025 Benchmark Leaderboard

All 30 problems from the 2025 American Invitational Mathematics Examination, testing olympiad-level mathematical reasoning with integer answers from 000-999.

MMMU-Pro Benchmark Leaderboard

An enhanced MMMU benchmark that eliminates shortcuts and guessing strategies to more rigorously test multimodal models across 30 academic disciplines.