Artificial Analysis Openness Index
Highlights
- Olmo 3.1 32B Instruct scores highest on the Openness Index at 89, tied with Olmo 3 7B Think and Olmo 3 7B Instruct
- o3 scores lowest on the Openness Index at 6, tied with Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) and Gemini 2.5 Pro
Artificial Analysis Openness Index: Results
Artificial Analysis Openness Index: Components
Artificial Analysis Openness Index: Model Availability vs. Model Transparency
Artificial Analysis Openness Index: Score vs. Release Date
Artificial Analysis Openness Index vs. Artificial Analysis Intelligence Index
Openness Index Composition
Detailed methodology
Scoring methodology
Each component is scored on a 0-3 qualitative scale according to the best-fitting openness 'archetype', with each model assessed against the full set of publicly available first-party information.
We synthesize these underlying factors into a unified metric, the Artificial Analysis Openness Index, as follows (a worked sketch appears after this list):
- Data elements are averaged between pre- and post-training (to give a total of 6 possible points across data)
- All component scores are added (up to a maximum of 18/18 points)
- This score is normalized to a 0-100 scale
Where models are derived from a third-party base model, they may be constrained by the licensing or limited disclosure of the upstream model. For incremental or update releases, we consider only disclosures that explicitly cover the new release (model creators may declare which components remain unchanged from an earlier release).
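To make the aggregation concrete, here is a minimal sketch in Python. The component names are illustrative assumptions inferred from the 18-point total and the descriptions above (two data elements averaged across pre- and post-training, plus four further 0-3 components); they are not official field names.

```python
# Minimal sketch of the Openness Index aggregation described above.
# Component names are assumptions; every rating is on the 0-3 scale.

def openness_index(c: dict[str, float]) -> float:
    # Data elements are averaged between pre- and post-training,
    # so the two data components contribute up to 6 of the 18 points.
    data_availability = (c["data_availability_pre"] + c["data_availability_post"]) / 2
    data_license = (c["data_license_pre"] + c["data_license_post"]) / 2

    total = (
        c["weights_availability"]  # assumed component, 0-3
        + c["weights_license"]     # assumed component, 0-3
        + data_availability        # 0-3 after averaging
        + data_license             # 0-3 after averaging
        + c["training_code"]       # assumed component, 0-3
        + c["methodology"]         # assumed component, 0-3
    )
    return round(total / 18 * 100, 2)  # normalize to a 0-100 scale


# Example: 16 of 18 points normalizes to 88.89, matching the top
# scores on the leaderboard below.
print(openness_index({
    "weights_availability": 3, "weights_license": 3,
    "data_availability_pre": 3, "data_availability_post": 3,
    "data_license_pre": 1, "data_license_post": 1,
    "training_code": 3, "methodology": 3,
}))  # -> 88.89
```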
Openness Index Leaderboard
| Rank | Model | Openness Index | Intelligence Index | Model Availability (0-6) | Model Transparency (0-12) | Pre-Training Data Availability (0-3) | Pre-Training Data License (0-3) | Post-Training Data Availability (0-3) | Post-Training Data License (0-3) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Olmo 3.1 32B Instruct | 88.89 | 12.16 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 2 | Olmo 3 7B Think | 88.89 | 9.43 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 3 | Olmo 3 7B Instruct | 88.89 | 8.15 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 4 | Molmo 7B-D | 88.89 | 9.25 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 5 | Olmo 3.1 32B Think | 88.89 | 13.94 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 6 | K2-V2 (high) | 88.89 | 20.61 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 7 | K2-V2 (low) | 88.89 | 14.44 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 8 | K2 Think V2 | 88.89 | 24.12 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 9 | K2-V2 (medium) | 88.89 | 18.68 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 10 | Apertus 70B Instruct | 88.89 | 7.70 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 11 | Apertus 8B Instruct | 88.89 | 5.88 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 12 | Olmo 3 32B Think | 88.89 | 12.09 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 13 | OLMo 2 7B | 88.89 | 9.30 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 14 | OLMo 2 32B | 88.89 | 10.57 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 15 | NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | 83.33 | 35.97 | 6.00 | 9.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 16 | NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | 83.33 | 24.27 | 6.00 | 9.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 17 | NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | 72.22 | 13.16 | 6.00 | 7.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 18 | NVIDIA Nemotron Nano 12B v2 VL (Reasoning) | 72.22 | 14.89 | 6.00 | 7.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 19 | NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | 72.22 | 10.09 | 6.00 | 7.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 20 | NVIDIA Nemotron Nano 9B V2 (Reasoning) | 72.22 | 14.76 | 6.00 | 7.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 21 | Molmo2-8B | 72.22 | 7.30 | 6.00 | 7.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 22 | Kimi Linear 48B A3B Instruct | 61.11 | 14.41 | 6.00 | 5.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 23 | Granite 4.0 H 1B | 55.56 | 7.99 | 6.00 | 4.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 24 | Granite 4.0 1B | 55.56 | 7.34 | 6.00 | 4.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 25 | Granite 4.0 Micro | 55.56 | 7.67 | 6.00 | 4.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 26 | Granite 4.0 H 350M | 55.56 | 5.44 | 6.00 | 4.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 27 | Granite 4.0 350M | 55.56 | 6.10 | 6.00 | 4.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 28 | Granite 4.0 H Small | 55.56 | 10.81 | 6.00 | 4.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 29 | ERNIE 4.5 300B A47B | 55.56 | 14.96 | 6.00 | 4.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 30 | GLM-4.5-Air | 55.56 | 23.17 | 6.00 | 4.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 31 | GLM-4.5 (Reasoning) | 55.56 | 26.42 | 6.00 | 4.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 32 | Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | 52.78 | 14.43 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00 | |
| 33 | Llama Nemotron Super 49B v1.5 (Non-reasoning) | 52.78 | 14.59 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00 | |
| 34 | Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | 52.78 | 15.02 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00 | |
| 35 | Llama 3.3 Nemotron Super 49B v1 (Reasoning) | 52.78 | 18.49 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00 | |
| 36 | Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) | 52.78 | 14.35 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00 | |
| 37 | Llama Nemotron Super 49B v1.5 (Reasoning) | 52.78 | 18.68 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00 | |
| 38 | MiMo-V2-Flash (Reasoning) | 52.78 | 39.24 | 6.00 | 3.50 | 0.00 | 0.00 | 1.00 | 0.00 | |
| 39 | GLM-4.5V (Non-reasoning) | 52.78 | 12.74 | 6.00 | 3.50 | 1.00 | 0.00 | 0.00 | 0.00 | |
| 40 | GLM-4.5V (Reasoning) | 52.78 | 15.09 | 6.00 | 3.50 | 1.00 | 0.00 | 0.00 | 0.00 | |
| 41 | Gemma 3n E4B Instruct | 50.00 | 6.38 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 42 | Gemma 3 12B Instruct | 50.00 | 8.79 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 43 | Gemma 3 4B Instruct | 50.00 | 6.30 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 44 | Gemma 3 27B Instruct | 50.00 | 10.31 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 45 | Gemma 3 1B Instruct | 50.00 | 5.55 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 46 | Gemma 3n E2B Instruct | 50.00 | 4.76 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 47 | Magistral Small 1.2 | 50.00 | 18.16 | 6.00 | 3.00 | 0.00 | 0.00 | 1.00 | 1.00 | |
| 48 | DeepSeek R1 0528 (May '25) | 50.00 | 27.07 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 49 | Phi-4 | 50.00 | 10.41 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 50 | Phi-4 Mini Instruct | 50.00 | 8.39 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 51 | Phi-4 Multimodal Instruct | 50.00 | 10.04 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 52 | Step 3.5 Flash | 50.00 | 37.80 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 53 | GLM-5 (Reasoning) | 50.00 | 49.77 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 54 | Qwen3 VL 30B A3B Instruct | 50.00 | 16.05 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 55 | Qwen3 VL 30B A3B (Reasoning) | 50.00 | 19.68 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 56 | Qwen3 VL 32B (Reasoning) | 50.00 | 24.72 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 57 | Qwen3 VL 32B Instruct | 50.00 | 17.19 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 58 | Qwen3 VL 235B A22B (Reasoning) | 50.00 | 27.64 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 59 | Qwen3 VL 8B (Reasoning) | 50.00 | 16.66 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 60 | Qwen3 VL 235B A22B Instruct | 50.00 | 20.75 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 61 | Qwen3 VL 4B Instruct | 50.00 | 9.55 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 62 | Qwen3 VL 4B (Reasoning) | 50.00 | 13.73 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 63 | Qwen3 VL 8B Instruct | 50.00 | 14.30 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 64 | DeepSeek R1 0528 Qwen3 8B | 47.22 | 16.43 | 6.00 | 2.50 | 0.00 | 0.00 | 1.00 | 0.00 | |
| 65 | Hermes 4 - Llama-3.1 70B (Reasoning) | 47.22 | 15.99 | 4.00 | 4.50 | 1.00 | 0.00 | 2.00 | 0.00 | |
| 66 | Hermes 4 - Llama-3.1 405B (Reasoning) | 47.22 | 18.56 | 4.00 | 4.50 | 1.00 | 0.00 | 2.00 | 0.00 | |
| 67 | Hermes 4 - Llama-3.1 70B (Non-reasoning) | 47.22 | 12.63 | 4.00 | 4.50 | 1.00 | 0.00 | 2.00 | 0.00 | |
| 68 | Hermes 4 - Llama-3.1 405B (Non-reasoning) | 47.22 | 17.63 | 4.00 | 4.50 | 1.00 | 0.00 | 2.00 | 0.00 | |
| 69 | Apriel-v1.5-15B-Thinker | 47.22 | 28.33 | 6.00 | 2.50 | 0.00 | 0.00 | 1.00 | 0.00 | |
| 70 | Gemma 3 270M | 44.44 | 7.71 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 71 | Falcon-H1R-7B | 44.44 | 15.80 | 4.00 | 4.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 72 | Llama 3.1 Nemotron Instruct 70B | 44.44 | 13.44 | 4.00 | 4.00 | 0.00 | 0.00 | 1.00 | 1.00 | |
| 73 | LongCat Flash Lite | 44.44 | 23.93 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 74 | Qwen3 Next 80B A3B (Reasoning) | 44.44 | 26.72 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 75 | Qwen3 Coder 480B A35B Instruct | 44.44 | 24.77 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 76 | Qwen3 Next 80B A3B Instruct | 44.44 | 20.11 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 77 | Qwen3 Omni 30B A3B Instruct | 44.44 | 10.68 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 78 | Qwen3 Omni 30B A3B (Reasoning) | 44.44 | 15.62 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 79 | Ling-flash-2.0 | 44.44 | 15.74 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 80 | Ling-mini-2.0 | 44.44 | 9.19 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 81 | Ling-1T | 44.44 | 19.04 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 82 | Devstral Small (Jul '25) | 44.44 | 15.21 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 83 | DeepSeek V3.2 Exp (Reasoning) | 44.44 | 32.94 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 84 | DeepSeek V3.2 Exp (Non-reasoning) | 44.44 | 28.44 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 85 | Kimi K2 | 44.44 | 26.32 | 4.00 | 4.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 86 | GLM-4.7 (Reasoning) | 44.44 | 42.11 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 87 | GLM-4.7 (Non-reasoning) | 44.44 | 34.16 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 88 | GLM-4.6 (Reasoning) | 44.44 | 32.51 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 89 | GLM-4.7-Flash (Reasoning) | 44.44 | 30.15 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 90 | GLM-4.7-Flash (Non-reasoning) | 44.44 | 22.07 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 91 | GLM-4.6 (Non-reasoning) | 44.44 | 30.24 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 92 | Qwen3 Coder 30B A3B Instruct | 44.44 | 19.98 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 93 | Qwen3 30B A3B 2507 Instruct | 44.44 | 15.00 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 94 | Qwen3 30B A3B 2507 (Reasoning) | 44.44 | 22.41 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 95 | Qwen3 235B A22B 2507 (Reasoning) | 44.44 | 29.54 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 96 | Qwen3 235B A22B 2507 Instruct | 44.44 | 24.96 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 97 | Qwen3 4B 2507 (Reasoning) | 44.44 | 18.18 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 98 | Qwen3 4B 2507 Instruct | 44.44 | 12.88 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 99 | Seed-OSS-36B-Instruct | 44.44 | 25.16 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 100 | Qwen3 Coder Next | 41.67 | 28.28 | 6.00 | 1.50 | 0.00 | 0.00 | 1.00 | 0.00 | |
| 101 | gpt-oss-120B (high) | 38.89 | 33.27 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 102 | gpt-oss-20B (high) | 38.89 | 24.47 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 103 | Llama 3.3 Instruct 70B | 38.89 | 14.49 | 4.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 104 | Llama 3.1 Instruct 405B | 38.89 | 17.38 | 4.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 105 | Llama 3.2 Instruct 90B (Vision) | 38.89 | 11.90 | 4.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 106 | Llama 3.2 Instruct 11B (Vision) | 38.89 | 8.73 | 4.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 107 | Mistral Small 4 (Non-reasoning) | 38.89 | 18.62 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 108 | Mistral Large 3 | 38.89 | 22.80 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 109 | Mistral Small 4 (Reasoning) | 38.89 | 27.19 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 110 | R1 1776 | 38.89 | 11.99 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 111 | Reka Flash 3 | 38.89 | 9.52 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 112 | DeepHermes 3 - Mistral 24B Preview (Non-reasoning) | 38.89 | 10.89 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 113 | Sarvam 30B (high) | 38.89 | 12.34 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 114 | Sarvam 105B (high) | 38.89 | 18.16 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 115 | Cogito v2.1 (Reasoning) | 38.89 | - | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 116 | Jamba Reasoning 3B | 38.89 | 9.60 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 117 | Qwen3.5 4B (Non-reasoning) | 38.89 | 22.60 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 118 | Qwen3.5 0.8B (Non-reasoning) | 38.89 | 9.91 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 119 | Qwen3.5 4B (Reasoning) | 38.89 | 27.08 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 120 | Qwen3.5 9B (Reasoning) | 38.89 | 32.43 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 121 | Qwen3.5 397B A17B (Non-reasoning) | 38.89 | 40.10 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 122 | Qwen3.5 397B A17B (Reasoning) | 38.89 | 45.05 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 123 | Qwen3.5 122B A10B (Reasoning) | 38.89 | 41.60 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 124 | Qwen3.5 35B A3B (Reasoning) | 38.89 | 37.12 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 125 | Qwen3.5 27B (Reasoning) | 38.89 | 42.07 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 126 | Qwen3.5 2B (Reasoning) | 38.89 | 16.29 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 127 | Qwen3.5 0.8B (Reasoning) | 38.89 | 10.52 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 128 | Qwen3.5 27B (Non-reasoning) | 38.89 | 37.18 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 129 | Qwen3.5 122B A10B (Non-reasoning) | 38.89 | 35.87 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 130 | Qwen3.5 35B A3B (Non-reasoning) | 38.89 | 30.69 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 131 | Qwen3.5 9B (Non-reasoning) | 38.89 | 27.33 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 132 | Qwen3.5 2B (Non-reasoning) | 38.89 | 14.67 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 133 | Ring-1T | 38.89 | 22.78 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 134 | Ring-flash-2.0 | 38.89 | 14.02 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 135 | Mistral Small 3.2 | 38.89 | 15.07 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 136 | DeepSeek V3.1 Terminus (Reasoning) | 38.89 | 33.93 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 137 | DeepSeek V3.1 Terminus (Non-reasoning) | 38.89 | 28.52 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 138 | DeepSeek R1 Distill Llama 70B | 36.11 | 15.95 | 4.00 | 2.50 | 0.00 | 0.00 | 1.00 | 0.00 | |
| 139 | LFM2 8B A1B | 33.33 | 7.03 | 4.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 140 | LFM2 2.6B | 33.33 | 8.04 | 4.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 141 | Kimi K2.5 (Reasoning) | 33.33 | 46.81 | 4.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 142 | DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning) | 33.33 | 7.58 | 5.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 143 | Command A | 33.33 | 13.48 | 3.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 144 | LFM2 1.2B | 33.33 | 6.33 | 4.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 145 | HyperCLOVA X SEED Think (32B) | 30.56 | 23.72 | 4.00 | 1.50 | 1.00 | 0.00 | 0.00 | 0.00 | |
| 146 | Llama 4 Maverick | 27.78 | 18.36 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 147 | Llama 4 Scout | 27.78 | 13.52 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 148 | Magistral Medium 1.2 | 27.78 | 27.10 | 2.00 | 3.00 | 0.00 | 0.00 | 1.00 | 1.00 | |
| 149 | LFM2.5-1.2B-Instruct | 27.78 | 8.04 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 150 | LFM2 24B A2B | 27.78 | 10.49 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 151 | LFM2.5-VL-1.6B | 27.78 | 6.18 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 152 | LFM2.5-1.2B-Thinking | 27.78 | 8.08 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 153 | K-EXAONE (Reasoning) | 27.78 | 32.12 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 154 | Exaone 4.0 1.2B (Reasoning) | 27.78 | 8.26 | 3.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 155 | Exaone 4.0 1.2B (Non-reasoning) | 27.78 | 8.11 | 3.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 156 | EXAONE 4.0 32B (Reasoning) | 27.78 | 16.68 | 3.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 157 | EXAONE 4.0 32B (Non-reasoning) | 27.78 | 11.66 | 3.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 158 | MiniMax-M2.1 | 27.78 | 39.42 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 159 | MiniMax-M2.5 | 27.78 | 41.93 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 160 | MiniMax-M2 | 27.78 | 36.09 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 161 | Kimi K2 0905 | 27.78 | 30.85 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 162 | Kimi K2 Thinking | 27.78 | 40.89 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 163 | Jamba 1.7 Mini | 22.22 | 8.07 | 4.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 164 | Jamba 1.7 Large | 22.22 | 10.88 | 4.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 165 | Qwen3 Max Thinking (Preview) | 16.67 | 32.48 | 2.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 166 | Qwen3 Max | 16.67 | 31.38 | 2.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 167 | Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning) | 11.11 | 19.42 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 168 | Claude 4.5 Haiku (Reasoning) | 11.11 | 37.09 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 169 | Claude 4.5 Haiku (Non-reasoning) | 11.11 | 31.05 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 170 | Mistral Medium 3.1 | 11.11 | 21.25 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 171 | Grok 3 mini Reasoning (high) | 11.11 | 32.08 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 172 | Nova Micro | 11.11 | 10.27 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 173 | Nova Premier | 11.11 | 19.01 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 174 | Solar Pro 2 (Non-reasoning) | 11.11 | 13.59 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 175 | Solar Pro 2 (Reasoning) | 11.11 | 14.92 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 176 | Doubao Seed Code | 11.11 | 33.52 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 177 | GPT-5.1 (Non-reasoning) | 11.11 | 27.42 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 178 | GPT-5 (ChatGPT) | 11.11 | 21.83 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 179 | Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning) | 11.11 | 25.70 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 180 | Claude 4.5 Sonnet (Reasoning) | 11.11 | 43.03 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 181 | Claude Opus 4.5 (Non-reasoning) | 11.11 | 43.09 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 182 | Claude 4.5 Sonnet (Non-reasoning) | 11.11 | 37.14 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 183 | Claude Opus 4.5 (Reasoning) | 11.11 | 49.73 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 184 | Devstral Medium | 11.11 | 18.66 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 185 | Grok 4 Fast (Non-reasoning) | 11.11 | 23.12 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 186 | Grok 4.1 Fast (Non-reasoning) | 11.11 | 23.56 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 187 | Nova Pro | 11.11 | 13.48 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 188 | Nova Lite | 11.11 | 12.65 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 189 | o3 | 5.56 | 38.37 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 190 | Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) | 5.56 | 21.65 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 191 | Gemini 2.5 Pro | 5.56 | 34.63 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 192 | Grok Code Fast 1 | 5.56 | 28.74 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 193 | GPT-5 (minimal) | 5.56 | 23.89 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 194 | GPT-5 mini (minimal) | 5.56 | 20.68 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 195 | GPT-5 nano (medium) | 5.56 | 25.88 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 196 | GPT-5 nano (high) | 5.56 | 26.83 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 197 | GPT-5 mini (medium) | 5.56 | 38.94 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 198 | GPT-5 (high) | 5.56 | 44.63 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 199 | GPT-5 (medium) | 5.56 | 42.03 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 200 | GPT-5 (low) | 5.56 | 39.20 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 201 | GPT-5 nano (minimal) | 5.56 | 13.84 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 202 | GPT-5.1 (high) | 5.56 | 47.70 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 203 | GPT-5 Codex (high) | 5.56 | 44.63 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 204 | GPT-5 mini (high) | 5.56 | 41.17 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 205 | Gemini 3 Pro Preview (high) | 5.56 | 48.39 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 206 | Gemini 2.5 Flash Preview (Sep '25) (Reasoning) | 5.56 | 31.14 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 207 | Grok 4 Fast (Reasoning) | 5.56 | 35.06 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 208 | Grok 4.1 Fast (Reasoning) | 5.56 | 38.61 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 209 | Grok 4 | 5.56 | 41.52 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Explore Evaluations
Artificial Analysis Intelligence Index: A composite benchmark aggregating ten challenging evaluations to provide a holistic measure of AI capabilities across mathematics, science, coding, and reasoning.
GDPval-AA is Artificial Analysis' evaluation framework for OpenAI's GDPval dataset. It tests AI models on real-world tasks across 44 occupations and 9 major industries. Models are given shell access and web browsing capabilities in an agentic loop via Stirrup to solve tasks, with ELO ratings derived from blind pairwise comparisons.
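As a sketch of the rating step, the standard Elo update for pairwise outcomes looks like the following; the K-factor and the exact update rule used for GDPval-AA are not specified here, so treat this as an illustrative assumption rather than the published method.

```python
# Illustrative Elo update for blind pairwise comparisons. The K-factor
# and this exact rule are assumptions, not the documented GDPval-AA method.

def elo_update(r_a: float, r_b: float, a_wins: bool, k: float = 32.0):
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))  # P(A beats B)
    score_a = 1.0 if a_wins else 0.0
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta  # zero-sum update

# Example: an upset win moves the lower-rated model up by ~20 points.
print(elo_update(1000.0, 1100.0, a_wins=True))  # -> (~1020.48, ~1079.52)
```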
τ²-Bench Telecom: A dual-control conversational AI benchmark simulating technical support scenarios where both agent and user must coordinate actions to resolve telecom service issues.
Terminal-Bench Hard: An agentic benchmark evaluating AI capabilities in terminal environments through software engineering, system administration, and data processing tasks.
SciCode: A scientist-curated coding benchmark featuring 338 sub-tasks derived from 80 genuine laboratory problems across 16 scientific disciplines.
AA-LCR: A challenging benchmark measuring language models' ability to extract, reason about, and synthesize information from long-form documents ranging from 10k to 100k tokens (measured using the cl100k_base tokenizer).
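For reproducibility, token counts under cl100k_base can be measured with OpenAI's tiktoken library; the file path below is a placeholder.

```python
# Count document length with the cl100k_base tokenizer (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
with open("document.txt", encoding="utf-8") as f:  # placeholder path
    n_tokens = len(enc.encode(f.read()))
print(n_tokens)  # benchmark documents span roughly 10k-100k tokens
```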
AA-Omniscience: A benchmark measuring factual recall and hallucination across various economically relevant domains.
IFBench: A benchmark evaluating precise instruction-following generalization on 58 diverse, verifiable out-of-domain constraints that test models' ability to follow specific output requirements.
Humanity's Last Exam: A frontier-level benchmark with 2,500 expert-vetted questions across mathematics, sciences, and humanities, designed to be the final closed-ended academic evaluation.
GPQA Diamond: The most challenging 198 questions from GPQA, where PhD experts achieve 65% accuracy but skilled non-experts reach only 34% despite web access.
CritPt: A benchmark designed to test LLMs on research-level physics reasoning tasks, featuring 71 composite research challenges.
Artificial Analysis Openness Index: A composite measure providing an industry standard for communicating model openness to users and developers.
MMLU-Pro: An enhanced version of MMLU with 12,000 graduate-level questions across 14 subject areas, featuring ten answer options and deeper reasoning requirements.
Global-MMLU-Lite: A lightweight, multilingual version of MMLU, designed to evaluate knowledge and reasoning skills across a diverse range of languages and cultural contexts.
LiveCodeBench: A contamination-free coding benchmark that continuously harvests fresh competitive programming problems from LeetCode, AtCoder, and CodeForces, evaluating code generation, self-repair, and execution.
MATH-500: A 500-problem subset of the MATH dataset, featuring competition-level mathematics across six domains including algebra, geometry, and number theory.
AIME 2025: All 30 problems from the 2025 American Invitational Mathematics Examination, testing olympiad-level mathematical reasoning with integer answers from 000 to 999.
MMMU-Pro: An enhanced MMMU benchmark that eliminates shortcuts and guessing strategies to more rigorously test multimodal models across 30 academic disciplines.