Claude Sonnet 4.6: Everything You Need to Know


Claude Sonnet 4.6 takes second place in the Artificial Analysis Intelligence Index (behind Opus 4.6), but in its max effort mode it used ~3x as many output tokens as Claude Sonnet 4.5. Sonnet 4.6 leads all models in GDPval-AA and TerminalBench, including a slight lead over Opus 4.6.

Key takeaways:

Anthropic’s Sonnet 4.6 scores 51 on the Artificial Analysis Intelligence Index, an 8-point jump from Sonnet 4.5 (Reasoning, 43). The Intelligence Index is our synthesis metric, incorporating 10 evaluations that cover capabilities including agentic tasks, coding and scientific reasoning. This places Sonnet 4.6 essentially tied with GPT-5.2 (xhigh, 51) and behind only Claude Opus 4.6 (Adaptive Reasoning, max effort, 53). This is the first time Anthropic has occupied the top 2 spots on our Intelligence Index, and the gap between Sonnet and Opus has narrowed significantly, from 7 points (Opus 4.5 vs Sonnet 4.5) to just 2 points.
Sonnet 4.6 has a slight lead in 2 of our evaluations: GDPval-AA (Agentic Real-World Work Tasks) and TerminalBench (Agentic Coding & Terminal Use). Notably, Sonnet 4.6 outperforms all other models on these evaluations including Opus 4.6, making it the overall strongest model we have tested for agentic use cases.
For agentic tasks, Sonnet 4.6 offers a better price-to-performance tradeoff than Opus 4.6. Sonnet 4.6 outperforms Opus 4.6 on both GDPval-AA (1633 vs 1606) and TerminalBench (53% vs 46%), while being priced 40% lower ($3/$15 vs $5/$25 per 1M input/output tokens). However, Sonnet 4.6’s increased token usage leaves it only slightly cheaper than Opus in practice: running our Intelligence Index cost $2,088 with Sonnet 4.6 vs $2,486 with Opus 4.6, a saving of about 16%.
Sonnet 4.6 is less token efficient than Opus 4.6 and Sonnet 4.5. Sonnet 4.6 used 74M output tokens in max effort mode to run our Intelligence Index evaluations, ~3x Sonnet 4.5 (Reasoning, 25M) and ~28% more than Opus 4.6 (Adaptive Reasoning, max effort, 58M). This is particularly driven by HLE, where Sonnet 4.6 used 47M output tokens alone (64% of its total), but the pattern holds broadly across evaluations. We evaluated both Sonnet 4.6 and Opus 4.6 in ‘adaptive thinking’ mode with ‘max’ effort.
Sonnet 4.6 (in adaptive thinking mode with max effort) cost $2,088 to run the Artificial Analysis Intelligence Index. This is ~3x the cost of Sonnet 4.5 (Reasoning), which cost ~$733, driven by the significantly higher output token usage. However, Sonnet 4.6 remains cheaper than Opus 4.6 ($2,486) due to its 40% lower per-token pricing ($3/$15 vs $5/$25 per 1M input/output tokens). This does not account for cached input token discounts offered by Anthropic and others.
Sonnet 4.6 is priced identically to Sonnet 4.5 ($3/$15 per 1M input/output tokens). However, the significantly higher token usage in adaptive thinking mode means that real-world costs will be substantially higher than Sonnet 4.5 for comparable tasks. Sonnet 4.6 is now priced only 40% below Opus 4.6, because Anthropic cut Opus pricing by two-thirds with the Opus 4.5 launch in late November 2025 (from $15/$75 to the current $5/$25). As a result, the set of use cases where Sonnet makes sense over Opus is narrower than ever.
Sonnet 4.6 introduces the same 'adaptive thinking' mode as Opus 4.6, replacing Anthropic's previous 'extended thinking' mode. Instead of setting a 'thinking token budget', developers can now control the model's thinking with the 'effort' setting (with 'low', 'medium', 'high' and 'max' settings). We evaluated adaptive thinking with max effort.
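The ratios quoted in the takeaways above can be reproduced from the raw figures. The following is a minimal sketch that uses only the numbers stated in this article; the function and variable names are illustrative assumptions, not part of any Artificial Analysis tooling.

```python
# Figures as quoted in the takeaways above; nothing here is measured independently.
OUTPUT_TOKENS_M = {"sonnet_4_6": 74, "sonnet_4_5": 25, "opus_4_6": 58}  # millions
RUN_COST_USD = {"sonnet_4_6": 2088, "sonnet_4_5": 733, "opus_4_6": 2486}
INPUT_PRICE_USD = {"sonnet_4_6": 3.0, "opus_4_6": 5.0}  # per 1M input tokens
HLE_OUTPUT_TOKENS_M = 47  # Sonnet 4.6 output tokens spent on HLE, in millions


def times_as_many(a: float, b: float) -> float:
    """How many times larger a is than b (e.g. 3.0 means 3x)."""
    return a / b


def percent_more(a: float, b: float) -> float:
    """Percentage by which a exceeds b."""
    return (a - b) / b * 100


def percent_less(a: float, b: float) -> float:
    """Percentage by which a falls below b."""
    return (b - a) / b * 100


def summarize() -> dict[str, float]:
    """Recompute the headline ratios from the quoted figures."""
    tokens, cost = OUTPUT_TOKENS_M, RUN_COST_USD
    return {
        "tokens_vs_sonnet_4_5_x": times_as_many(tokens["sonnet_4_6"], tokens["sonnet_4_5"]),
        "tokens_vs_opus_4_6_pct": percent_more(tokens["sonnet_4_6"], tokens["opus_4_6"]),
        "hle_share_pct": HLE_OUTPUT_TOKENS_M / tokens["sonnet_4_6"] * 100,
        "list_price_discount_pct": percent_less(
            INPUT_PRICE_USD["sonnet_4_6"], INPUT_PRICE_USD["opus_4_6"]
        ),
        "cost_vs_sonnet_4_5_x": times_as_many(cost["sonnet_4_6"], cost["sonnet_4_5"]),
        "cost_saving_vs_opus_4_6_pct": percent_less(cost["sonnet_4_6"], cost["opus_4_6"]),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

With the quoted figures this works out to roughly 2.96x the output tokens of Sonnet 4.5, about 28% more than Opus 4.6, a 64% HLE share, a 40% list-price discount, about 2.85x the run cost of Sonnet 4.5, and a saving of about 16% against Opus 4.6.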

Intelligence Index

Key model details:

➤ Context window: 1M tokens (currently in beta), up from Sonnet 4.5's 200K standard context window.

➤ Max output tokens: 128K tokens, up from Sonnet 4.5’s 64K and equal to Opus 4.6.

➤ Pricing: $3/$15 per 1M input/output tokens (unchanged from Sonnet 4.5).

➤ Availability: Claude Sonnet 4.6 is available via Anthropic's API, Google Vertex, Amazon Bedrock and Microsoft Azure. It is also available in Claude Chat, Claude Cowork and Claude Code.
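To see how the list prices above translate into per-request cost, here is a rough sketch of a cost estimator. The `Pricing` class, the function name and the example token counts are hypothetical and chosen only for illustration. The estimate uses list prices and, like the figures in this article, ignores cached input token discounts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """List prices in USD per 1M tokens, as quoted in this article."""
    input_per_m: float
    output_per_m: float


SONNET_4_6 = Pricing(input_per_m=3.0, output_per_m=15.0)
OPUS_4_6 = Pricing(input_per_m=5.0, output_per_m=25.0)


def estimate_cost_usd(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Estimate cost at list price; no caching or batch discounts are applied."""
    total = input_tokens * pricing.input_per_m + output_tokens * pricing.output_per_m
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 200K input tokens, 20K output tokens on both models.
    for name, pricing in (("Sonnet 4.6", SONNET_4_6), ("Opus 4.6", OPUS_4_6)):
        print(name, estimate_cost_usd(pricing, 200_000, 20_000))

    # Output-token portion of the Intelligence Index run at list price,
    # using the 74M and 58M output token counts quoted in this article.
    print(estimate_cost_usd(SONNET_4_6, 0, 74_000_000))
    print(estimate_cost_usd(OPUS_4_6, 0, 58_000_000))
```

At identical token counts Sonnet 4.6 always costs 40% less than Opus 4.6. The gap narrows only when Sonnet 4.6 uses more tokens for the same task, so the per-token discount should be weighed against measured token usage on your own workload.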

Sonnet 4.6 leads all models we have tested on GDPval-AA and TerminalBench, outperforming even Claude Opus 4.6. This is a notable result and highlights Anthropic's strength in agentic capabilities across the Claude models.

Agentic Capabilities

Sonnet 4.6 used 74M output tokens to run the Artificial Analysis Intelligence Index, ~3x Sonnet 4.5 (Reasoning, 25M) and ~28% more than Opus 4.6 (Adaptive Reasoning, max effort, 58M).

Output Tokens

Check out additional analysis for this model on X: https://x.com/ArtificialAnlys/status/2024259812176121952?s=20

Explore the full suite of benchmarks at https://artificialanalysis.ai/
