MiniMax M2.5: Everything You Need to Know


MiniMax has released MiniMax-M2.5, an incremental upgrade over M2.1 that gains 2 points on the Artificial Analysis Intelligence Index. The gain is supported by a higher GDPval-AA score, but the model also has a higher hallucination rate in AA-Omniscience.

With an Intelligence Index score of 42, MiniMax-M2.5 ties GLM-4.7 (Reasoning, 42) and DeepSeek V3.2 (Reasoning, 42) for the #3-5 spots among open weights models, behind GLM-5 (Reasoning, 50) and Kimi K2.5 (Reasoning, 47). M2.5 is the same size as MiniMax-M2.1, with a 229B total / 10B active parameter MoE architecture.

Key takeaways:

➤ Improved agentic performance: M2.5's Agentic Index jumps to 56, up from M2.1's 47, driven by GDPval-AA ELO increasing to 1215 from 1079. GDPval-AA evaluates models on realistic knowledge work tasks, such as preparing presentations and analysis, in a dedicated terminal environment with web access. The work outputs are then compared against each other by an automated pipeline that generates matches and ELO scores. This places M2.5 as the #3 open weights model on GDPval-AA, behind GLM-5 and Kimi K2.5.

➤ Increased hallucination rate: M2.5's AA-Omniscience Index drops to -41, down from M2.1's -30. M2.1 had improved through increased abstention, reducing its hallucination rate to 67%. M2.5 reverses this trend, with the hallucination rate rising back to 88%, more in line with M2's 89%. Accuracy improves slightly, from 22% (M2.1) to 25% (M2.5), but the increased hallucination more than offsets this gain in the AA-Omniscience Index.

➤ No material change in token usage: M2.5 used ~56M output tokens to run the Intelligence Index, roughly in line with M2.1's ~58M. For context, GLM-4.7 used ~167M and DeepSeek V3.2 used ~61M output tokens, making M2.5 one of the more token-efficient open weights reasoning models for its intelligence level.

Key model details:

➤ Context window: 200k tokens, the same as MiniMax-M2.1

➤ Size: 229B total / 10B active parameters

➤ Licensing: Modified MIT License (requires attribution for commercial use)

➤ Availability: M2.5 is available via MiniMax's first-party API and third-party APIs such as Fireworks, Novita, and GMI Cloud

Intelligence Index

The biggest improvement in MiniMax-M2.5 is in agentic performance. GDPval-AA ELO rises to 1215 from M2.1's 1079, making it the #3 open weights model on GDPval-AA, behind GLM-5 and Kimi K2.5.

GDPval
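
To give a sense of what the GDPval-AA gap means in practice, the sketch below converts the two reported ratings into an expected head-to-head win rate. It assumes the conventional Elo logistic formula with a 400-point scale; the article does not state which variant the GDPval-AA pipeline uses, so treat the result as a rough illustration. The function name `expected_win_rate` is hypothetical.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the conventional Elo formula (assumed)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# GDPval-AA ratings reported in the article.
M2_5 = 1215
M2_1 = 1079

p = expected_win_rate(M2_5, M2_1)
print(f"Rating gap: {M2_5 - M2_1} points")
print(f"Expected M2.5 win rate vs M2.1: {p:.1%}")
```

Under that assumption, the 136-point gap corresponds to M2.5's output being preferred in roughly 69% of pairwise comparisons against M2.1.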

MiniMax-M2.5 used ~56M output tokens to run the Intelligence Index, roughly in line with M2.1's ~58M.

Output Tokens
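
For a quick sense of scale, the sketch below expresses each model's output token usage relative to M2.5, using the approximate figures quoted in this article. It is plain arithmetic on rounded numbers, so the ratios are indicative only.

```python
# Approximate output tokens (millions) used to run the Intelligence Index,
# as quoted in the article.
output_tokens_m = {
    "MiniMax-M2.5": 56,
    "MiniMax-M2.1": 58,
    "DeepSeek V3.2": 61,
    "GLM-4.7": 167,
}

baseline = output_tokens_m["MiniMax-M2.5"]
relative = {name: tokens / baseline for name, tokens in output_tokens_m.items()}

# Print models from least to most token-hungry relative to M2.5.
for name, ratio in sorted(relative.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.2f}x the output tokens of M2.5")
```

On these figures, GLM-4.7 used about three times as many output tokens as M2.5 to reach the same Intelligence Index score of 42, while DeepSeek V3.2 used about 9% more.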

MiniMax-M2.5's AA-Omniscience Index regresses to -41, down from M2.1's -30. Accuracy improves marginally (22% to 25%), but the increased hallucination offsets the gain.

Omniscience
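
The interaction between accuracy and hallucination rate can be checked with a little arithmetic. The sketch below assumes that the hallucination rate is the share of non-correct responses that were wrong answers rather than abstentions, and that the index is the percentage of correct answers minus the percentage of incorrect answers. Neither definition is stated in this article, so both are assumptions, and the function name `omniscience_index` is hypothetical. The inputs are the rounded percentages reported for M2.1 and M2.5.

```python
def omniscience_index(accuracy: float, hallucination_rate: float) -> float:
    """Estimate the index as correct % minus incorrect % (assumed definition)."""
    not_correct = 1.0 - accuracy
    # Assumed: hallucination rate = incorrect / (incorrect + abstained).
    incorrect = hallucination_rate * not_correct
    return 100.0 * (accuracy - incorrect)


# (accuracy, hallucination rate) as reported in the article.
reported = {
    "MiniMax-M2.1": (0.22, 0.67),
    "MiniMax-M2.5": (0.25, 0.88),
}

estimates = {name: omniscience_index(acc, hall) for name, (acc, hall) in reported.items()}
for name, value in estimates.items():
    print(f"{name}: estimated index {value:.0f}")
```

With these inputs the estimates come out at about -30 for M2.1 and -41 for M2.5, in line with the reported scores: the 3-point accuracy gain is outweighed by a roughly 14-point rise in the share of incorrect answers.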

Check out additional analysis for this model on X: https://x.com/ArtificialAnlys/status/2022476857896218925?s=20

Explore the full suite of benchmarks at https://artificialanalysis.ai/
