Artificial Analysis Openness Index
Charts on this page:
- Openness Index: Results
- Openness Index: Components
- Model Availability vs. Model Transparency
- Openness Index Score vs. Release Date
- Openness Index vs. Artificial Analysis Intelligence Index
Openness Index Composition
Scoring Methodology
Each component is scored on a 0-3 qualitative scale according to the best-fitting openness 'archetype', with each model assessed against the full set of publicly available first-party information.
We synthesize these underlying factors into a unified metric, the Artificial Analysis Openness Index, as follows:
- Data elements are averaged across pre- and post-training (for a total of 6 possible points for data)
- All component scores are summed (to a maximum of 18 points)
- The total is normalized to a 0-100 scale
Where a model is derived from a third-party base model, its score may be constrained by the licensing or limited disclosure of the upstream model. For incremental or update releases, we consider only disclosures that explicitly cover the new release (model creators may also declare which components remain unchanged from an earlier release).
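The aggregation steps above can be sketched in code. This is a minimal illustration: the component names below are assumptions inferred from the published component set (weights, data, code, methodology), not Artificial Analysis' exact schema.

```python
# Hypothetical sketch of the Openness Index aggregation.
# Component names are illustrative assumptions, not the official schema.

def openness_index(weights_availability, weights_license,
                   data_availability_pre, data_availability_post,
                   data_license_pre, data_license_post,
                   code, methodology):
    """Each argument is a 0-3 archetype score; returns a 0-100 index."""
    # Data elements are averaged across pre- and post-training,
    # so data contributes at most 6 of the 18 possible points.
    data_availability = (data_availability_pre + data_availability_post) / 2
    data_license = (data_license_pre + data_license_post) / 2
    # Sum all six components (max 18) and normalize to a 0-100 scale.
    total = (weights_availability + weights_license +
             data_availability + data_license + code + methodology)
    return round(total / 18 * 100, 2)

# A fully open release scoring 3 everywhere except data licenses (1 each):
print(openness_index(3, 3, 3, 3, 1, 1, 3, 3))  # → 88.89
```

A 16/18 total normalizes to 88.89, matching the top scores on the leaderboard below; a model with only openly released weights (e.g. 2 of 18 points) normalizes to 11.11.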
Openness Index Leaderboard
| Rank | Model | Openness Index | Intelligence Index | Model Availability | Model Transparency | Pre-Training Data | Pre-Training Data License | Post-Training Data | Post-Training Data License | |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Olmo 3.1 32B Instruct | 88.89 | 12.16 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 2 | Olmo 3 7B Think | 88.89 | 9.43 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 3 | Olmo 3.1 32B Think | 88.89 | 13.94 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 4 | Molmo 7B-D | 88.89 | 9.25 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 5 | Olmo 3 7B Instruct | 88.89 | 8.15 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 6 | K2 Think V2 | 88.89 | 24.12 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 7 | K2-V2 (medium) | 88.89 | 18.68 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 8 | K2-V2 (low) | 88.89 | 14.44 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 9 | K2-V2 (high) | 88.89 | 20.61 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 10 | Apertus 8B Instruct | 88.89 | 5.88 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 11 | Apertus 70B Instruct | 88.89 | 7.70 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 12 | Olmo 3 32B Think | 88.89 | 12.09 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 13 | OLMo 2 7B | 88.89 | 9.30 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 14 | OLMo 2 32B | 88.89 | 10.57 | 6.00 | 10.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 15 | NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | 83.33 | 35.97 | 6.00 | 9.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 16 | NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | 83.33 | 24.27 | 6.00 | 9.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 17 | NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | 72.22 | 10.09 | 6.00 | 7.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 18 | NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | 72.22 | 13.16 | 6.00 | 7.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 19 | NVIDIA Nemotron Nano 9B V2 (Reasoning) | 72.22 | 14.76 | 6.00 | 7.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 20 | NVIDIA Nemotron Nano 12B v2 VL (Reasoning) | 72.22 | 14.89 | 6.00 | 7.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 21 | Molmo2-8B | 72.22 | 7.30 | 6.00 | 7.00 | 3.00 | 1.00 | 3.00 | 1.00 | |
| 22 | Kimi Linear 48B A3B Instruct | 61.11 | 14.41 | 6.00 | 5.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 23 | Granite 4.0 1B | 55.56 | 7.34 | 6.00 | 4.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 24 | Granite 4.0 H 1B | 55.56 | 7.99 | 6.00 | 4.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 25 | Granite 4.0 350M | 55.56 | 6.10 | 6.00 | 4.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 26 | Granite 4.0 H Small | 55.56 | 10.81 | 6.00 | 4.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 27 | Granite 4.0 Micro | 55.56 | 7.67 | 6.00 | 4.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 28 | Granite 4.0 H 350M | 55.56 | 5.44 | 6.00 | 4.00 | 2.00 | 1.00 | 2.00 | 1.00 | |
| 29 | ERNIE 4.5 300B A47B | 55.56 | 14.96 | 6.00 | 4.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 30 | GLM-4.5 (Reasoning) | 55.56 | 26.42 | 6.00 | 4.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 31 | GLM-4.5-Air | 55.56 | 23.17 | 6.00 | 4.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 32 | Llama Nemotron Super 49B v1.5 (Non-reasoning) | 52.78 | 14.59 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00 | |
| 33 | Llama 3.3 Nemotron Super 49B v1 (Reasoning) | 52.78 | 18.49 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00 | |
| 34 | Llama Nemotron Super 49B v1.5 (Reasoning) | 52.78 | 18.68 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00 | |
| 35 | Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) | 52.78 | 14.35 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00 | |
| 36 | Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | 52.78 | 15.02 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00 | |
| 37 | Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | 52.78 | 14.43 | 4.00 | 5.50 | 1.00 | 0.00 | 1.00 | 1.00 | |
| 38 | MiMo-V2-Flash (Reasoning) | 52.78 | 39.24 | 6.00 | 3.50 | 0.00 | 0.00 | 1.00 | 0.00 | |
| 39 | GLM-4.5V (Reasoning) | 52.78 | 15.09 | 6.00 | 3.50 | 1.00 | 0.00 | 0.00 | 0.00 | |
| 40 | GLM-4.5V (Non-reasoning) | 52.78 | 12.74 | 6.00 | 3.50 | 1.00 | 0.00 | 0.00 | 0.00 | |
| 41 | Magistral Small 1.2 | 50.00 | 18.16 | 6.00 | 3.00 | 0.00 | 0.00 | 1.00 | 1.00 | |
| 42 | DeepSeek R1 0528 (May '25) | 50.00 | 27.07 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 43 | Phi-4 | 50.00 | 10.41 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 44 | Phi-4 Mini Instruct | 50.00 | 8.39 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 45 | Phi-4 Multimodal Instruct | 50.00 | 10.04 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 46 | Step 3.5 Flash | 50.00 | 37.80 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 47 | GLM-5 (Reasoning) | 50.00 | 49.77 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 48 | Gemma 3n E2B Instruct | 50.00 | 4.76 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 49 | Gemma 3 1B Instruct | 50.00 | 5.55 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 50 | Gemma 3 4B Instruct | 50.00 | 6.30 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 51 | Gemma 3 27B Instruct | 50.00 | 10.31 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 52 | Gemma 3 12B Instruct | 50.00 | 8.79 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 53 | Gemma 3n E4B Instruct | 50.00 | 6.38 | 6.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 54 | Qwen3 VL 30B A3B Instruct | 50.00 | 16.05 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 55 | Qwen3 VL 32B Instruct | 50.00 | 17.19 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 56 | Qwen3 VL 235B A22B Instruct | 50.00 | 20.75 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 57 | Qwen3 VL 235B A22B (Reasoning) | 50.00 | 27.64 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 58 | Qwen3 VL 30B A3B (Reasoning) | 50.00 | 19.68 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 59 | Qwen3 VL 8B Instruct | 50.00 | 14.30 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 60 | Qwen3 VL 4B Instruct | 50.00 | 9.55 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 61 | Qwen3 VL 4B (Reasoning) | 50.00 | 13.73 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 62 | Qwen3 VL 8B (Reasoning) | 50.00 | 16.66 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 63 | Qwen3 VL 32B (Reasoning) | 50.00 | 24.72 | 6.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 64 | DeepSeek R1 0528 Qwen3 8B | 47.22 | 16.43 | 6.00 | 2.50 | 0.00 | 0.00 | 1.00 | 0.00 | |
| 65 | Hermes 4 - Llama-3.1 70B (Reasoning) | 47.22 | 15.99 | 4.00 | 4.50 | 1.00 | 0.00 | 2.00 | 0.00 | |
| 66 | Hermes 4 - Llama-3.1 405B (Reasoning) | 47.22 | 18.56 | 4.00 | 4.50 | 1.00 | 0.00 | 2.00 | 0.00 | |
| 67 | Hermes 4 - Llama-3.1 70B (Non-reasoning) | 47.22 | 12.63 | 4.00 | 4.50 | 1.00 | 0.00 | 2.00 | 0.00 | |
| 68 | Hermes 4 - Llama-3.1 405B (Non-reasoning) | 47.22 | 17.63 | 4.00 | 4.50 | 1.00 | 0.00 | 2.00 | 0.00 | |
| 69 | Apriel-v1.5-15B-Thinker | 47.22 | 28.33 | 6.00 | 2.50 | 0.00 | 0.00 | 1.00 | 0.00 | |
| 70 | Gemma 3 270M | 44.44 | 7.71 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 71 | Falcon-H1R-7B | 44.44 | 15.80 | 4.00 | 4.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 72 | Llama 3.1 Nemotron Instruct 70B | 44.44 | 13.44 | 4.00 | 4.00 | 0.00 | 0.00 | 1.00 | 1.00 | |
| 73 | LongCat Flash Lite | 44.44 | 23.93 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 74 | Trinity Large Thinking | 44.44 | 31.87 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 75 | GLM-5.1 (Reasoning) | 44.44 | 51.41 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 76 | Qwen3 Next 80B A3B Instruct | 44.44 | 20.11 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 77 | Qwen3 Next 80B A3B (Reasoning) | 44.44 | 26.72 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 78 | Qwen3 Coder 480B A35B Instruct | 44.44 | 24.77 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 79 | Qwen3 Omni 30B A3B Instruct | 44.44 | 10.68 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 80 | Qwen3 Omni 30B A3B (Reasoning) | 44.44 | 15.62 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 81 | Ling-mini-2.0 | 44.44 | 9.19 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 82 | Ling-1T | 44.44 | 19.04 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 83 | Ling-flash-2.0 | 44.44 | 15.74 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 84 | Devstral Small (Jul '25) | 44.44 | 15.21 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 85 | DeepSeek V3.2 Exp (Reasoning) | 44.44 | 32.94 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 86 | DeepSeek V3.2 Exp (Non-reasoning) | 44.44 | 28.44 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 87 | Kimi K2 | 44.44 | 26.32 | 4.00 | 4.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 88 | GLM-4.7 (Reasoning) | 44.44 | 42.11 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 89 | GLM-4.6 (Non-reasoning) | 44.44 | 30.24 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 90 | GLM-4.7 (Non-reasoning) | 44.44 | 34.16 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 91 | GLM-4.6 (Reasoning) | 44.44 | 32.51 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 92 | GLM-4.7-Flash (Reasoning) | 44.44 | 30.15 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 93 | GLM-4.7-Flash (Non-reasoning) | 44.44 | 22.07 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 94 | Qwen3 235B A22B 2507 Instruct | 44.44 | 24.96 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 95 | Qwen3 30B A3B 2507 (Reasoning) | 44.44 | 22.41 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 96 | Qwen3 30B A3B 2507 Instruct | 44.44 | 15.00 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 97 | Qwen3 235B A22B 2507 (Reasoning) | 44.44 | 29.54 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 98 | Qwen3 4B 2507 (Reasoning) | 44.44 | 18.18 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 99 | Qwen3 Coder 30B A3B Instruct | 44.44 | 19.98 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 100 | Qwen3 4B 2507 Instruct | 44.44 | 12.88 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 101 | Seed-OSS-36B-Instruct | 44.44 | 25.16 | 6.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 102 | Qwen3 Coder Next | 41.67 | 28.28 | 6.00 | 1.50 | 0.00 | 0.00 | 1.00 | 0.00 | |
| 103 | gpt-oss-20B (high) | 38.89 | 24.47 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 104 | gpt-oss-120B (high) | 38.89 | 33.27 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 105 | Llama 3.3 Instruct 70B | 38.89 | 14.49 | 4.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 106 | Llama 3.1 Instruct 405B | 38.89 | 17.38 | 4.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 107 | Llama 3.2 Instruct 90B (Vision) | 38.89 | 11.90 | 4.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 108 | Llama 3.2 Instruct 11B (Vision) | 38.89 | 8.73 | 4.00 | 3.00 | 1.00 | 0.00 | 1.00 | 0.00 | |
| 109 | Gemma 4 31B (Reasoning) | 38.89 | 39.18 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 110 | Gemma 4 26B A4B (Reasoning) | 38.89 | 31.21 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 111 | Gemma 4 E4B (Reasoning) | 38.89 | 18.76 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 112 | Gemma 4 E2B (Reasoning) | 38.89 | 15.21 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 113 | Mistral Small 4 (Reasoning) | 38.89 | 27.80 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 114 | Mistral Small 4 (Non-reasoning) | 38.89 | 18.62 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 115 | Mistral Large 3 | 38.89 | 22.80 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 116 | R1 1776 | 38.89 | 11.99 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 117 | Reka Flash 3 | 38.89 | 9.52 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 118 | DeepHermes 3 - Mistral 24B Preview (Non-reasoning) | 38.89 | 10.89 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 119 | Sarvam 30B (high) | 38.89 | 12.34 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 120 | Sarvam 105B (high) | 38.89 | 18.16 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 121 | Cogito v2.1 (Reasoning) | 38.89 | - | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 122 | Jamba Reasoning 3B | 38.89 | 9.60 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 123 | Qwen3.5 27B (Reasoning) | 38.89 | 42.07 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 124 | Qwen3.5 35B A3B (Non-reasoning) | 38.89 | 30.69 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 125 | Qwen3.5 27B (Non-reasoning) | 38.89 | 37.18 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 126 | Qwen3.5 35B A3B (Reasoning) | 38.89 | 37.12 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 127 | Qwen3.5 397B A17B (Non-reasoning) | 38.89 | 40.10 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 128 | Qwen3.5 397B A17B (Reasoning) | 38.89 | 45.05 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 129 | Qwen3.5 122B A10B (Non-reasoning) | 38.89 | 35.87 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 130 | Qwen3.5 0.8B (Reasoning) | 38.89 | 10.52 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 131 | Qwen3.5 4B (Reasoning) | 38.89 | 27.08 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 132 | Qwen3.5 2B (Reasoning) | 38.89 | 16.29 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 133 | Qwen3.5 0.8B (Non-reasoning) | 38.89 | 9.91 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 134 | Qwen3.5 4B (Non-reasoning) | 38.89 | 22.60 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 135 | Qwen3.5 9B (Reasoning) | 38.89 | 32.43 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 136 | Qwen3.5 9B (Non-reasoning) | 38.89 | 27.33 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 137 | Qwen3.5 2B (Non-reasoning) | 38.89 | 14.67 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 138 | Qwen3.5 122B A10B (Reasoning) | 38.89 | 41.60 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 139 | Ring-1T | 38.89 | 22.78 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 140 | Ring-flash-2.0 | 38.89 | 14.02 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 141 | Mistral Small 3.2 | 38.89 | 15.07 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 142 | DeepSeek V3.1 Terminus (Non-reasoning) | 38.89 | 28.52 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 143 | DeepSeek V3.1 Terminus (Reasoning) | 38.89 | 33.93 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 144 | DeepSeek R1 Distill Llama 70B | 36.11 | 15.95 | 4.00 | 2.50 | 0.00 | 0.00 | 1.00 | 0.00 | |
| 145 | LFM2 2.6B | 33.33 | 8.04 | 4.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 146 | LFM2 8B A1B | 33.33 | 7.03 | 4.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 147 | Kimi K2.5 (Reasoning) | 33.33 | 46.81 | 4.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 148 | DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning) | 33.33 | 7.58 | 5.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 149 | Command A | 33.33 | 13.48 | 3.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 150 | LFM2 1.2B | 33.33 | 6.33 | 4.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 151 | HyperCLOVA X SEED Think (32B) | 30.56 | 23.72 | 4.00 | 1.50 | 1.00 | 0.00 | 0.00 | 0.00 | |
| 152 | Llama 4 Scout | 27.78 | 13.52 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 153 | Llama 4 Maverick | 27.78 | 18.36 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 154 | Magistral Medium 1.2 | 27.78 | 27.10 | 2.00 | 3.00 | 0.00 | 0.00 | 1.00 | 1.00 | |
| 155 | LFM2 24B A2B | 27.78 | 10.49 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 156 | LFM2.5-VL-1.6B | 27.78 | 6.18 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 157 | LFM2.5-1.2B-Thinking | 27.78 | 8.08 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 158 | LFM2.5-1.2B-Instruct | 27.78 | 8.04 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 159 | Exaone 4.0 1.2B (Reasoning) | 27.78 | 8.26 | 3.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 160 | K-EXAONE (Reasoning) | 27.78 | 32.12 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 161 | Exaone 4.0 1.2B (Non-reasoning) | 27.78 | 8.11 | 3.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 162 | EXAONE 4.0 32B (Non-reasoning) | 27.78 | 11.66 | 3.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 163 | EXAONE 4.0 32B (Reasoning) | 27.78 | 16.68 | 3.00 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 164 | MiniMax-M2.5 | 27.78 | 41.93 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 165 | MiniMax-M2 | 27.78 | 36.09 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 166 | MiniMax-M2.1 | 27.78 | 39.42 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 167 | Kimi K2 Thinking | 27.78 | 40.89 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 168 | Kimi K2 0905 | 27.78 | 30.85 | 4.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 169 | MiniMax-M2.7 | 22.22 | 49.62 | 3.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 170 | Jamba 1.7 Large | 22.22 | 10.88 | 4.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 171 | Jamba 1.7 Mini | 22.22 | 8.07 | 4.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 172 | Qwen3 Max | 16.67 | 31.38 | 2.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 173 | Qwen3 Max Thinking (Preview) | 16.67 | 32.48 | 2.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 174 | Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning) | 11.11 | 19.42 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 175 | Claude 4.5 Haiku (Reasoning) | 11.11 | 37.09 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 176 | Claude 4.5 Haiku (Non-reasoning) | 11.11 | 31.05 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 177 | Mistral Medium 3.1 | 11.11 | 21.25 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 178 | Grok 3 mini Reasoning (high) | 11.11 | 32.08 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 179 | Grok 4.1 Fast (Non-reasoning) | 11.11 | 23.56 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 180 | Nova Micro | 11.11 | 10.27 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 181 | Nova Premier | 11.11 | 19.01 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 182 | Solar Pro 2 (Non-reasoning) | 11.11 | 13.59 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 183 | Solar Pro 2 (Reasoning) | 11.11 | 14.92 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 184 | Doubao Seed Code | 11.11 | 33.52 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 185 | GPT-5.1 (Non-reasoning) | 11.11 | 27.42 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 186 | GPT-5 (ChatGPT) | 11.11 | 21.83 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 187 | Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning) | 11.11 | 25.70 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 188 | Claude Opus 4.5 (Non-reasoning) | 11.11 | 43.09 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 189 | Claude Opus 4.5 (Reasoning) | 11.11 | 49.73 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 190 | Claude 4.5 Sonnet (Reasoning) | 11.11 | 43.03 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 191 | Claude 4.5 Sonnet (Non-reasoning) | 11.11 | 37.14 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 192 | Devstral Medium | 11.11 | 18.66 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 193 | Grok 4 Fast (Non-reasoning) | 11.11 | 23.12 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 194 | Nova Pro | 11.11 | 13.48 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 195 | Nova Lite | 11.11 | 12.65 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 196 | o3 | 5.56 | 38.37 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 197 | Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) | 5.56 | 21.65 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 198 | Gemini 2.5 Pro | 5.56 | 34.63 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 199 | Grok Code Fast 1 | 5.56 | 28.74 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 200 | Grok 4.1 Fast (Reasoning) | 5.56 | 38.61 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 201 | GPT-5 (high) | 5.56 | 44.63 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 202 | GPT-5 Codex (high) | 5.56 | 44.63 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 203 | GPT-5 (medium) | 5.56 | 42.03 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 204 | GPT-5 (low) | 5.56 | 39.20 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 205 | GPT-5 mini (minimal) | 5.56 | 20.68 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 206 | GPT-5 nano (medium) | 5.56 | 25.88 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 207 | GPT-5 mini (medium) | 5.56 | 38.94 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 208 | GPT-5 mini (high) | 5.56 | 41.17 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 209 | GPT-5 nano (minimal) | 5.56 | 13.84 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 210 | GPT-5 (minimal) | 5.56 | 23.89 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 211 | GPT-5 nano (high) | 5.56 | 26.83 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 212 | GPT-5.1 (high) | 5.56 | 47.70 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 213 | Gemini 3 Pro Preview (high) | 5.56 | 48.39 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 214 | Gemini 2.5 Flash Preview (Sep '25) (Reasoning) | 5.56 | 31.14 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 215 | Grok 4 Fast (Reasoning) | 5.56 | 35.06 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |
| 216 | Grok 4 | 5.56 | 41.52 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Explore Evaluations
- A composite benchmark aggregating ten challenging evaluations to provide a holistic measure of AI capabilities across mathematics, science, coding, and reasoning.
- GDPval-AA is Artificial Analysis' evaluation framework for OpenAI's GDPval dataset. It tests AI models on real-world tasks across 44 occupations and 9 major industries. Models are given shell access and web browsing capabilities in an agentic loop via Stirrup to solve tasks, with ELO ratings derived from blind pairwise comparisons.
- Artificial Analysis' implementation of the APEX-Agents benchmark, testing AI agents on long-horizon, cross-application tasks in professional-services environments with realistic application tooling.
- A dual-control conversational AI benchmark simulating technical support scenarios where both agent and user must coordinate actions to resolve telecom service issues.
- An agentic benchmark evaluating AI capabilities in terminal environments through software engineering, system administration, and data processing tasks.
- A scientist-curated coding benchmark featuring 288 test set subproblems from 80 laboratory problems across 16 scientific disciplines.
- A challenging benchmark measuring language models' ability to extract, reason about, and synthesize information from long-form documents ranging from 10k to 100k tokens (measured using the cl100k_base tokenizer).
- A benchmark measuring factual recall and hallucination across various economically relevant domains.
- A benchmark evaluating precise instruction-following generalization on 58 diverse, verifiable out-of-domain constraints that test models' ability to follow specific output requirements.
- A frontier-level benchmark with 2,500 expert-vetted questions across mathematics, sciences, and humanities, designed to be the final closed-ended academic evaluation.
- The most challenging 198 questions from GPQA, where PhD experts achieve 65% accuracy but skilled non-experts reach only 34% despite web access.
- A benchmark designed to test LLMs on research-level physics reasoning tasks, featuring 71 composite research challenges.
- A composite measure providing an industry standard for communicating model openness to users and developers.
- An enhanced version of MMLU with 12,000 graduate-level questions across 14 subject areas, featuring ten answer options and deeper reasoning requirements.
- A lightweight, multilingual version of MMLU, designed to evaluate knowledge and reasoning skills across a diverse range of languages and cultural contexts.
- A contamination-free coding benchmark that continuously harvests fresh competitive programming problems from LeetCode, AtCoder, and CodeForces, evaluating code generation, self-repair, and execution.
- A 500-problem subset of the MATH dataset, featuring competition-level mathematics across six domains including algebra, geometry, and number theory.
- All 30 problems from the 2025 American Invitational Mathematics Examination, testing olympiad-level mathematical reasoning with integer answers from 000-999.
- An enhanced MMMU benchmark that eliminates shortcuts and guessing strategies to more rigorously test multimodal models across 30 academic disciplines.