Claude Code vs. Opencode

Comparison between Claude Code and Opencode across the Artificial Analysis Coding Agent Index, including benchmark scores, cost, execution time, and token usage.

For details, see our methodology page.

Highlights

Artificial Analysis Coding Agent Index · Higher is better · Not currently available
Mean agent wall time per task · Lower is better · Not currently available
Mean API cost per task (USD) · Lower is better · Not currently available

Comparison

Side-by-side comparison of Claude Code and Opencode.

Coding Agent Comparison

| Metric | Claude Code, Fable 5 (max) (with fallback) | Opencode, Opus 4.7 (medium) | Analysis |
| --- | --- | --- | --- |
| Agent Harness | Claude Code | Opencode | |
| Representative Model | Fable 5 (max) (with fallback) | Opus 4.7 (medium) | |
| Coding Agent Index | 77 | 65 | Claude Code has a higher Coding Agent Index than Opencode |
| DeepSWE | 66% | 40% | Claude Code has a higher DeepSWE score than Opencode |
| Terminal-Bench v2 | 82% | 75% | Claude Code has a higher Terminal-Bench v2 score than Opencode |
| SWE-Atlas-QnA | 83% | 79% | Claude Code has a higher SWE-Atlas-QnA score than Opencode |
| Cost per Task | $11.75 | $2.93 | Opencode has a lower cost per task than Claude Code |
| Time per Task | 23.5m | 12.2m | Opencode has a lower time per task than Claude Code |
| Turns per Task | 138 | 54.2 | Opencode takes fewer turns per task than Claude Code |
| Token Usage per Task | 14.1M | 7.6M | Opencode has a lower token usage per task than Claude Code |
| Cache Hit Rate | 96% | 96% | Claude Code and Opencode have the same cache hit rate at the reported precision |
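
The side-by-side figures can also be read as ratios. The following is a minimal sketch, assuming the values are copied by hand from the comparison above; the "index points per dollar" figure is a hypothetical convenience metric and is not part of the Artificial Analysis methodology.

```python
# Values copied from the comparison table above.
claude_code = {"index": 77, "cost_usd": 11.75, "time_min": 23.5, "turns": 138, "tokens_m": 14.1}
opencode = {"index": 65, "cost_usd": 2.93, "time_min": 12.2, "turns": 54.2, "tokens_m": 7.6}


def ratios(a, b):
    """Return a/b for every metric the two rows share, rounded to 2 decimals."""
    return {key: round(a[key] / b[key], 2) for key in a if key in b}


def index_points_per_dollar(row):
    """Hypothetical efficiency figure: index points per USD of mean task cost."""
    return round(row["index"] / row["cost_usd"], 1)


relative = ratios(claude_code, opencode)
print(relative)
print(index_points_per_dollar(claude_code), index_points_per_dollar(opencode))
```

Because the two columns use different representative models, these ratios mix harness effects with model effects and should not be read as a harness-only comparison.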

Model Variants

Evaluated model variants for Claude Code and Opencode.

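| Agent | Model | Coding Agent Index | DeepSWE | Terminal-Bench v2 | SWE-Atlas-QnA | Cost per Task | Time per Task | Tokens per Task |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Claude Code | Fable 5 (max) (with fallback) | 77 | 66% | 82% | 83% | $11.75 | 23.5m | 14.1M |
| Claude Code | Opus 4.8 (max) | 73 | 56% | 79% | 82% | $7.70 | 23.1m | 18M |
| Claude Code | Opus 4.8 (medium) | 67 | 49% | 75% | 77% | $3.26 | 12.4m | 7.8M |
| Claude Code | Opus 4.7 (max) | 65 | 40% | 74% | 81% | $5.64 | 15.8m | 16M |
| Claude Code | Opus 4.7 (medium) | 57 | 27% | 71% | 72% | $1.68 | 6.3m | 4.6M |
| Claude Code | Sonnet 4.6 (medium) | 54 | 29% | 63% | 70% | $1.97 | 13.7m | 8.5M |
| Claude Code | GLM-5.1 | 52 | 19% | 65% | 73% | $4.33 | 19.6m | 25.9M |
| Claude Code | Qwen3.7 Plus (thinking) | 52 | 19% | 65% | 72% | $6.23 | 10.6m | 8.7M |
| Claude Code | DeepSeek V4 Pro (high) | 47 | 9% | 65% | 68% | $0.27 | 17.9m | 9.7M |
| Claude Code | Kimi K2.6 | 47 | 17% | 64% | 60% | $1.18 | 41.2m | 11.4M |
| Claude Code | Opus 4.6 (medium) | 71 | - | 70% | 72% | $1.26 | 8.0m | 4.5M |
| Opencode | Opus 4.7 (medium) | 65 | 40% | 75% | 79% | $2.93 | 12.2m | 7.6M |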
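
Only one model in the list, Opus 4.7 (medium), appears under both harnesses, so it is the closest thing to a like-for-like comparison here. The sketch below assumes a hand-copied subset of the rows above and uses hypothetical helper names to find models shared across harnesses and compute the differences.

```python
from collections import defaultdict

# (harness, model, index, cost_usd, time_min, tokens_m): a subset of the rows above.
rows = [
    ("Claude Code", "Fable 5 (max) (with fallback)", 77, 11.75, 23.5, 14.1),
    ("Claude Code", "Opus 4.7 (max)", 65, 5.64, 15.8, 16.0),
    ("Claude Code", "Opus 4.7 (medium)", 57, 1.68, 6.3, 4.6),
    ("Opencode", "Opus 4.7 (medium)", 65, 2.93, 12.2, 7.6),
]


def shared_models(rows):
    """Group rows by model and keep models evaluated under more than one harness."""
    by_model = defaultdict(dict)
    for harness, model, *metrics in rows:
        by_model[model][harness] = metrics
    return {m: h for m, h in by_model.items() if len(h) > 1}


def deltas(a, b):
    """Element-wise a - b, rounded to 2 decimals to hide float noise."""
    return [round(x - y, 2) for x, y in zip(a, b)]


for model, by_harness in shared_models(rows).items():
    diff = deltas(by_harness["Opencode"], by_harness["Claude Code"])
    print(model, dict(zip(["index", "cost_usd", "time_min", "tokens_m"], diff)))
```

With the same model, Opencode scores 8 index points higher than Claude Code, at $1.25 more per task, 5.9 more minutes, and 3.0M more tokens.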

Performance

Performance across the Artificial Analysis Coding Agent Index.

Artificial Analysis Coding Agent Index

Composite average pass@1 across DeepSWE, Terminal-Bench v2, and SWE-Atlas-QnA · Higher is better · Not currently available

The Artificial Analysis Coding Agent Index is a composite score built from DeepSWE, Terminal-Bench v2, and SWE-Atlas-QnA.

It is useful for quick comparison, but it should be read alongside the per-eval breakdowns. Two agents with similar index values can still have different strengths across repository tasks, terminal workflows, and rubric-based evaluations.
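
As a rough illustration of how the composite relates to the per-eval scores, the sketch below assumes the index is an unweighted mean of the three pass@1 percentages. The page only says "composite average", so the equal weighting, and the choice to skip an eval that has no reported score, are assumptions rather than the documented method.

```python
def composite_index(scores):
    """Unweighted mean of the available pass@1 percentages (None = not reported)."""
    available = [s for s in scores.values() if s is not None]
    if not available:
        raise ValueError("no scores to average")
    return sum(available) / len(available)


# Per-eval scores as displayed on this page (already rounded to whole percents).
claude_code = {"DeepSWE": 66, "Terminal-Bench v2": 82, "SWE-Atlas-QnA": 83}
opencode = {"DeepSWE": 40, "Terminal-Bench v2": 75, "SWE-Atlas-QnA": 79}

print(round(composite_index(claude_code)))  # 77
print(round(composite_index(opencode)))     # 65
```

Because the displayed per-eval scores are rounded, recomputing the index this way can differ from a published value by a point.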

Token Usage

Token consumption across the Artificial Analysis Coding Agent Index.

Token Usage per Task

Mean input, cache, and output tokens per task
Prompt cache hit rates can vary significantly by provider routing, which can materially change effective cost.

Input tokens: non-cached input tokens sent to the model, including prompts, instructions, tool context, and task context that were not served from prompt cache.

Cache tokens: prompt tokens reused and billed through provider prompt caching, when that telemetry is available, rather than processed as fresh input each time.

Some providers route repeated requests across different backend replicas. When prompt cache state is not shared consistently across those replicas, a model may receive fewer cache hits even when the benchmark task flow is otherwise identical.

We do not add custom relay headers or provider-specific affinity controls to force higher cache reuse, because that would make the benchmark less representative of a typical user setup. As a result, reported costs reflect the cache behavior observed through the configured provider path, not an optimized best-case cache scenario.

Output tokens: tokens returned by the model in its visible response during the task.
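
To make the effect of cache hit rate on cost concrete, here is a minimal sketch. It assumes that cache hit rate means cached prompt tokens divided by all prompt tokens, and it uses hypothetical per-million-token prices that are not taken from any provider.

```python
def cache_hit_rate(uncached_input, cached_input):
    """Share of prompt tokens served from cache (assumed definition)."""
    total = uncached_input + cached_input
    return cached_input / total if total else 0.0


def blended_input_price(hit_rate, input_price, cached_price):
    """Effective USD per million prompt tokens at a given cache hit rate."""
    return hit_rate * cached_price + (1 - hit_rate) * input_price


# Hypothetical prices in USD per million tokens; not taken from any provider.
INPUT_PRICE, CACHED_PRICE = 5.00, 0.50

for rate in (0.96, 0.80, 0.50):
    print(rate, round(blended_input_price(rate, INPUT_PRICE, CACHED_PRICE), 2))
```

Under these placeholder prices, a drop from a 96% to an 80% hit rate roughly doubles the effective prompt price, which is why provider routing can materially change reported cost.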

Artificial Analysis Coding Agent Index vs. Total Tokens

Artificial Analysis Coding Agent Index vs. mean total tokens per task

Each point represents a coding-agent variant, plotted by benchmark performance and mean total tokens per task. The most attractive variants combine higher benchmark performance with lower token usage.

Cost

Pay-per-token API cost across the Artificial Analysis Coding Agent Index, based on current per-token pricing.

Cost per Task

Mean pay-per-token API cost per task (USD) · Lower is better · Not currently available

This chart shows the mean pay-per-token API cost per task across the Artificial Analysis Coding Agent Index, spanning DeepSWE, Terminal-Bench v2, and SWE-Atlas-QnA.

Where applicable, that cost model includes standard input pricing, discounted cached-input pricing, separate cache-write charges, and output pricing rather than treating all prompt tokens as if they were billed at the same uncached input rate.
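
The four-part cost model described above can be sketched as follows. All prices and token counts are hypothetical placeholders chosen for easy arithmetic, and the class and function names are illustrative rather than part of the benchmark's tooling.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """USD per million tokens. All values used below are hypothetical."""
    input: float
    cached_input: float
    cache_write: float
    output: float


@dataclass
class TaskTokens:
    uncached_input: int
    cached_input: int
    cache_write: int
    output: int


def task_cost(tokens, pricing):
    """Price each token class at its own rate instead of one flat input rate."""
    total = (
        tokens.uncached_input * pricing.input
        + tokens.cached_input * pricing.cached_input
        + tokens.cache_write * pricing.cache_write
        + tokens.output * pricing.output
    )
    return total / 1_000_000


def flat_rate_cost(tokens, pricing):
    """Naive alternative: bill every prompt token at the uncached input rate."""
    prompt = tokens.uncached_input + tokens.cached_input + tokens.cache_write
    return (prompt * pricing.input + tokens.output * pricing.output) / 1_000_000


pricing = Pricing(input=5.0, cached_input=0.5, cache_write=6.25, output=25.0)
tokens = TaskTokens(uncached_input=200_000, cached_input=7_000_000,
                    cache_write=320_000, output=100_000)

print(task_cost(tokens, pricing))       # cache-aware cost
print(flat_rate_cost(tokens, pricing))  # what a flat input rate would charge
```

With a cache-heavy token mix like this one, the flat-rate estimate comes out several times higher than the cache-aware figure, which is the distortion the cost model is meant to avoid.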

It is intended to show pay-per-token API cost, not consumer plan pricing or the full operational cost of deploying the system in production. Infrastructure, engineering, and supervision costs are not the focus of this metric.

Artificial Analysis Coding Agent Index vs. Cost per Task

Artificial Analysis Coding Agent Index vs. mean pay-per-token API cost per task (USD)

Each point represents a coding-agent variant. Farther right means higher benchmark performance, while lower on the chart means lower mean cost per task. The most efficient agents sit toward the lower-right: stronger results at lower cost.
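
One way to make "stronger results at lower cost" precise is a Pareto frontier: the set of variants that no other variant matches or beats on both index and cost. This is a substitute technique offered for illustration, not the chart's own quadrant definition, and the sketch uses a hand-copied subset of the variants listed on this page.

```python
# (label, coding agent index, mean cost per task in USD)
variants = [
    ("Claude Code / Fable 5 (max) (with fallback)", 77, 11.75),
    ("Claude Code / Opus 4.8 (medium)", 67, 3.26),
    ("Claude Code / Opus 4.7 (max)", 65, 5.64),
    ("Opencode / Opus 4.7 (medium)", 65, 2.93),
    ("Claude Code / Opus 4.7 (medium)", 57, 1.68),
    ("Claude Code / Sonnet 4.6 (medium)", 54, 1.97),
]


def pareto_frontier(points):
    """Keep variants that no other variant beats on both index and cost."""
    frontier = []
    for name, index, cost in points:
        dominated = any(
            other_index >= index and other_cost <= cost
            and (other_index > index or other_cost < cost)
            for _, other_index, other_cost in points
        )
        if not dominated:
            frontier.append(name)
    return frontier


for name in pareto_frontier(variants):
    print(name)
```

In this subset, Claude Code with Opus 4.7 (max) and with Sonnet 4.6 (medium) drop out, because another listed variant reaches at least the same index at lower cost.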

Execution Time

Active agent runtime across the Artificial Analysis Coding Agent Index.

Time per Task

Mean agent wall time per task · Lower is better · Not currently available

This chart uses agent wall time: how long the agent process was actively running on each task.

It does not include environment startup, verifier or judge time, or other harness overhead, so it is a cleaner comparison of how long the agent itself was working.
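
The distinction between agent wall time and end-to-end time can be illustrated with a small sketch. The per-task log format, field names, and durations below are entirely hypothetical; the page does not describe how the harness records timing.

```python
from statistics import mean

# Hypothetical per-task phase durations in seconds; the field names are illustrative.
task_logs = [
    {"env_startup": 40.0, "agent": 600.0, "verifier": 90.0},
    {"env_startup": 35.0, "agent": 840.0, "verifier": 120.0},
]


def mean_agent_wall_time_minutes(logs):
    """Average only the agent phase, ignoring startup and verifier time."""
    return mean(log["agent"] for log in logs) / 60


def mean_end_to_end_minutes(logs):
    """Average of all phases combined, shown only for contrast."""
    return mean(sum(log.values()) for log in logs) / 60


print(mean_agent_wall_time_minutes(task_logs))  # 12.0
print(mean_end_to_end_minutes(task_logs))       # 14.375
```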

Artificial Analysis Coding Agent Index vs. Execution Time

Artificial Analysis Coding Agent Index vs. mean agent wall time per task

Each point represents a coding-agent variant. Farther right means higher benchmark performance, while lower on the chart means shorter mean agent runtime per task. Agents toward the lower-right deliver stronger results in less active agent time.