April 24, 2026
DeepSeek is back among the leading open weights models with V4 Pro and V4 Flash
DeepSeek is back among the leading open weights models with the release of DeepSeek V4 Pro and V4 Flash, with V4 Pro second only to Kimi K2.6 on the Artificial Analysis Intelligence Index
DeepSeek has released DeepSeek V4 Pro and V4 Flash, the first new architecture from DeepSeek since V3. V4 Pro comes in at 1.6T total / 49B active parameters and V4 Flash at 284B total / 13B active, making V4 DeepSeek's first two-tier lineup: Pro is positioned for maximum capability, Flash for faster, lower-cost inference. Both models are hybrid thinking/non-thinking; we tested the reasoning variants at Max Effort and High Effort.
A year ago, DeepSeek R1 and R1 0528 were the leading open weights reasoning models on the Intelligence Index. Since then, several other open weights labs have released strong reasoning models, and V4 Pro now enters as the #2 open weights reasoning model on the Artificial Analysis Intelligence Index, behind only Kimi K2.6 (54). Both V4 Pro and V4 Flash remain text input and output only.
Key takeaways:
➤ Large 10 point gain in Intelligence Index: DeepSeek V4 Pro (Max) scores 52 on the Artificial Analysis Intelligence Index, up from 42 for V3.2, making it the #2 open weights reasoning model behind Kimi K2.6. That said, if the weights for MiMo-V2.5-Pro are released, as they have been for other Xiaomi models, V4 Pro would drop to third. V4 Flash (Max) scores 47, behind V4 Pro but ahead of DeepSeek V3.2. This places V4 Flash behind frontier models but at Claude Sonnet 4.6 (Max) level intelligence.
➤ Leading agentic performance among open weights models: DeepSeek V4 Pro (Max) leads open weights models on agentic real-world work tasks, scoring 1554 on GDPval-AA. This places it ahead of Kimi K2.6 (1484), GLM-5.1 (1535), GLM-5 (1402), and MiniMax-M2.7 (1514).
➤ Gains in knowledge but an increase in hallucination rate: DeepSeek V4 Pro (Max) scores -10 on AA-Omniscience, an 11 point improvement over V3.2 (Reasoning, -21), driven primarily by higher accuracy. V4 Flash (Max) scores -23, broadly in line with V3.2. V4 Pro and V4 Flash have very high hallucination rates of 94% and 96% respectively, meaning that when they don't know the answer they nearly always respond anyway.
➤ Flash materially behind Pro but well positioned for its size: DeepSeek V4 Flash (Max) scores 47 on the Artificial Analysis Intelligence Index, well below V4 Pro. However, at 284B total parameters it is much smaller and is well positioned on the Intelligence vs Size frontier, sitting next to MiniMax-M2.7.
➤ Cheaper than frontier models, but pricier than other open weights models and a large step up from DeepSeek V3.2: DeepSeek V4 Pro costs $1,071 to run the Artificial Analysis Intelligence Index. This makes it more than 4x cheaper than Claude Opus 4.7 ($4,811), but it remains more expensive than several other open weights models, including Kimi K2.6 ($948), GLM-5.1 ($544), DeepSeek V3.2 ($71), and gpt-oss-120B ($67). DeepSeek V4 Flash is much cheaper at $113.
➤ High token usage: DeepSeek V4 Pro uses 190M output tokens to run the Artificial Analysis Intelligence Index, making it one of the most token-intensive models tested. DeepSeek V4 Flash is even higher at 240M output tokens. This high token usage helps explain why V4 Pro’s total cost remains relatively high versus other open weights models despite low per-token pricing.
Key model details:
➤ Context window: 1M tokens, an 8x expansion on V3.2's 128K context window
➤ Modalities: Text input and output only, matching V3.2
➤ Size: DeepSeek V4 Pro 1.6T total / 49B active; V4 Flash 284B total / 13B active
➤ License: MIT
➤ Availability: Available on DeepSeek's first-party API; we expect many third-party providers to host the models
➤ Pricing: DeepSeek V4 Pro $1.74 / $3.48 per 1M input/output tokens; V4 Flash $0.14 / $0.28 per 1M input/output tokens. Cache hit input token pricing is $0.145 for V4 Pro and $0.028 for V4 Flash per 1M tokens. V4 Pro is significantly more expensive than past DeepSeek R1 and V3 models
Large 10 point gain in Intelligence Index: DeepSeek V4 Pro (Max) scores 52 on the Artificial Analysis Intelligence Index, up from 42 for V3.2, making it the #2 open weights reasoning model behind Kimi K2.6.
DeepSeek V4 Pro scales DeepSeek’s architecture substantially, while V4 Flash is positioned for size efficiency: V4 Pro is DeepSeek’s largest model to date at 1.6T total parameters / 49B active, a major step up from the V3 family’s 671B total / 37B active architecture. V4 Flash is far smaller at 284B total / 13B active, but sits strongly on the Intelligence vs Size frontier, near MiniMax-M2.7.
DeepSeek V4 Pro leads open weights models on GDPval-AA, our agentic real-world work tasks benchmark. V4 Pro (Max) scores 1554, ahead of Kimi K2.6 (1484), GLM-5.1 (1535), GLM-5 (1402), and MiniMax-M2.7 (1514). V4 Flash (Max) scores 1388.
Lower cost than frontier models, but high token usage keeps costs above most open weights peers: DeepSeek V4 Pro costs $1,071 to run the Artificial Analysis Intelligence Index, more than 4x cheaper than Claude Opus 4.7 ($4,811) but above several open weights models, including Kimi K2.6 ($948), GLM-5.1 ($544), DeepSeek V3.2 ($71), and gpt-oss-120B ($67). This is partly driven by high output token usage: 190M tokens for V4 Pro and 240M for V4 Flash. Flash is nonetheless far cheaper overall at $113, thanks to its much lower per-token pricing.
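As a rough illustration of how output token usage drives evaluation cost, the sketch below multiplies the reported output-token counts by the per-1M output prices listed above. Input and cache-hit token counts for the Index runs are not published here, so attributing the residual to input-side tokens is an assumption, not a reported figure.

```python
# Rough cost decomposition for running the Artificial Analysis Intelligence Index.
# Output-token counts and prices are taken from the article; the residual share
# is assumed (not reported) to come from input and cache-hit tokens.

PRICE_OUT_PER_M = {            # USD per 1M output tokens
    "DeepSeek V4 Pro": 3.48,
    "DeepSeek V4 Flash": 0.28,
}
OUTPUT_TOKENS_M = {            # millions of output tokens used on the Index
    "DeepSeek V4 Pro": 190,
    "DeepSeek V4 Flash": 240,
}
TOTAL_COST = {                 # reported total cost to run the Index, USD
    "DeepSeek V4 Pro": 1071,
    "DeepSeek V4 Flash": 113,
}

for model in PRICE_OUT_PER_M:
    out_cost = OUTPUT_TOKENS_M[model] * PRICE_OUT_PER_M[model]
    residual = TOTAL_COST[model] - out_cost   # assumed input/cache-hit share
    share = out_cost / TOTAL_COST[model]
    print(f"{model}: output tokens ~${out_cost:,.0f} "
          f"({share:.0%} of ${TOTAL_COST[model]:,}), residual ~${residual:,.0f}")
```

On these figures, output tokens alone account for roughly $661 (about 62%) of V4 Pro's total and about $67 (about 59%) of V4 Flash's, which is why low per-token pricing does not translate into a low total benchmark cost.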
Gains in knowledge but an increase in hallucination rate: DeepSeek V4 Pro (Max) scores -10 on AA-Omniscience, an 11 point improvement over V3.2 (Reasoning, -21), driven primarily by higher accuracy. V4 Flash (Max) scores -23, broadly in line with V3.2. V4 Pro and V4 Flash have very high hallucination rates of 94% and 96% respectively, meaning that when they don't know the answer they nearly always respond anyway.
DeepSeek V4 Pro and V4 Flash individual benchmark results
Further benchmarks and analysis on Artificial Analysis of DeepSeek V4 Pro and Flash: https://artificialanalysis.ai/


