Recent open weights model launches

All three leading open weights models were released last week. Open weights models continue to progress alongside proprietary ones, and the gap to GPT-5.5, the leading proprietary model, now sits at 6 points on the Artificial Analysis Intelligence Index.

Moonshot AI’s Kimi K2.6 (Reasoning) and Xiaomi's MiMo V2.5 Pro (Reasoning) tie as the leading open weights models on the Artificial Analysis Intelligence Index at 54, with DeepSeek's DeepSeek V4 Pro (Reasoning, Max Effort) at 52. This places the best open weights models within 3-6 points of the leading proprietary models: OpenAI's GPT-5.5 (xhigh) at 60, and Google's Gemini 3.1 Pro Preview and Anthropic's Claude Opus 4.7 (Adaptive Reasoning, Max Effort) at 57.

For context: just one year ago, the highest-scoring open weights model was DeepSeek V3 0324, which scored 22 on the Intelligence Index, ~13 points below the highest-scoring proprietary model, Claude 3.7 Sonnet (Reasoning), at 35.
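As a quick check of the arithmetic, the gaps quoted above can be recomputed from the scores in the text. This is a minimal sketch: the dictionary and variable names are ours, and the scores are the rounded values quoted in this article, not a pull from any API.

```python
# Intelligence Index scores as quoted in the text above (rounded values).
open_weights = {
    "Kimi K2.6 (Reasoning)": 54,
    "MiMo V2.5 Pro (Reasoning)": 54,
    "DeepSeek V4 Pro (Reasoning, Max Effort)": 52,
}
proprietary = {
    "GPT-5.5 (xhigh)": 60,
    "Gemini 3.1 Pro Preview": 57,
    "Claude Opus 4.7 (Adaptive Reasoning, Max Effort)": 57,
}

# Gap between each proprietary model and the best open weights score today.
best_open = max(open_weights.values())
gaps_today = {name: score - best_open for name, score in proprietary.items()}

# Gap one year ago: Claude 3.7 Sonnet (Reasoning) at 35 vs DeepSeek V3 0324 at 22.
gap_year_ago = 35 - 22

for name, gap in gaps_today.items():
    print(f"{name}: {gap} points ahead of the best open weights model")
print(f"Gap one year ago: {gap_year_ago} points")
```

On these figures the absolute gap to the leading proprietary model has roughly halved in a year, from 13 points to 6, while both sides' scores have risen substantially.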

Key takeaways:

➤ The three most intelligent open weights models are trillion-plus-parameter MoE architectures with permissive licenses. Kimi K2.6 (Reasoning) has 1T total / 32B active parameters with a 256K context window, MiMo V2.5 Pro (Reasoning) has 1T total / 42B active with a 1M context window, and DeepSeek V4 Pro (Reasoning, Max Effort) has 1.6T total / 49B active with a 1M context window.

➤ The gap to proprietary models remains wide on the hardest reasoning and agentic coding evaluations. On HLE (Humanity's Last Exam) the three top open weights models score 34-36%, vs 44% for GPT-5.5 (xhigh) and 45% for Gemini 3.1 Pro Preview. On CritPt (Research-level Physics) they score 4-12%, vs 27% for GPT-5.5 (xhigh). On TerminalBench Hard (Agentic Coding & Terminal Use) they score 43-46%, vs 61% for GPT-5.5 (xhigh) and 54% for Gemini 3.1 Pro Preview.

➤ Omniscience (knowledge + hallucination) shows a large gap to proprietary models, with DeepSeek V4 Pro (Reasoning, Max Effort) hallucinating significantly more than its open weights peers. DeepSeek V4 Pro (Reasoning, Max Effort) scores -10, MiMo V2.5 Pro (Reasoning) +4, and Kimi K2.6 (Reasoning) +6. By comparison, GPT-5.5 (xhigh) scores +20, Claude Opus 4.7 (Adaptive Reasoning, Max Effort) +26, and Gemini 3.1 Pro Preview +33.

➤ Leading open weights models are from China-based AI labs. The top 10 open weights models on the Intelligence Index are all from China-based AI labs. The two highest-ranked models from labs outside China are Gemma 4 31B (Reasoning) and NVIDIA Nemotron 3 Super (Reasoning).

➤ Open weights models dominate the Pareto frontier for Intelligence vs. Price. 9 of the 13 models on the Pareto frontier are open weights models (including MiniMax M2.7). Kimi K2.6 (Reasoning) and MiMo V2.5 Pro (Reasoning) are both on the Pareto frontier, with DeepSeek V4 Pro (Reasoning, Max Effort) just below. These three models offer comparable intelligence to leading proprietary models at between one-half and one-sixth of the price.
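For readers unfamiliar with the term, a model is on the Intelligence vs. Price Pareto frontier if no other model is both at least as cheap and at least as intelligent. The sketch below shows one common way to compute such a frontier. It is illustrative only: the `Model` type, the `pareto_frontier` function, and the five placeholder entries (A to E, with made-up prices and scores) are our own assumptions, not Artificial Analysis data or methodology.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    price: float         # hypothetical cost units; lower is better
    intelligence: float  # index score; higher is better


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Return the models that no other model beats on both price and intelligence.

    Sweep from cheapest to most expensive and keep a model only if it is
    more intelligent than every cheaper (or equally priced) model seen so far.
    Exact duplicates on both axes are kept once.
    """
    frontier = []
    best_so_far = float("-inf")
    # Sort by price ascending; break price ties by intelligence descending.
    for m in sorted(models, key=lambda m: (m.price, -m.intelligence)):
        if m.intelligence > best_so_far:
            frontier.append(m)
            best_so_far = m.intelligence
    return frontier


# Placeholder data, invented purely for illustration.
models = [
    Model("A", price=1.0, intelligence=30),
    Model("B", price=2.0, intelligence=45),
    Model("C", price=3.0, intelligence=40),  # dominated by B: pricier and weaker
    Model("D", price=5.0, intelligence=60),
    Model("E", price=6.0, intelligence=55),  # dominated by D
]

frontier_names = [m.name for m in pareto_frontier(models)]
print(frontier_names)
```

A model "just below" the frontier, in this framing, is one like C or E here: a cheaper or equally priced model already matches or exceeds its score.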

Full results for the Intelligence Index for Kimi K2.6 (Reasoning), MiMo V2.5 Pro (Reasoning), DeepSeek V4 Pro (Reasoning, Max Effort), and leading proprietary models here:

