
Coding Index

Evaluates models' ability to solve programming problems, including those requiring scientific and research domain knowledge.

The headline score is the equal-weighted average of the benchmarks listed below.

  • Terminal-Bench v2.1

    A verified refresh of Terminal-Bench v2.0 with 89 curated tasks across software engineering, system administration, data processing, model training, and security. It includes environment and instruction fixes so that scores reflect agent capability rather than environment gaps.

  • SciCode

    A scientist-curated coding benchmark featuring 288 test-set subproblems from 80 laboratory problems across 16 scientific disciplines.

Score

Artificial Analysis Coding Index


Release Date

Artificial Analysis Coding Index vs. Release Date


Cost

Artificial Analysis Coding Index: Cost per Task

Average cost per task (USD), broken down by input, cache hit, cache write, reasoning, and answer tokens

The average cost of one task in the index. The split by input, cache-hit, cache-write, reasoning, and answer token pricing is shown only where canonical token counts are available.
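The per-task cost breakdown can be sketched as each token category's count multiplied by its price, averaged over tasks. This is a minimal illustration, not the index's actual pipeline: the prices in `PRICES_PER_MTOK`, the token counts, and the helper names are all hypothetical, and reasoning and answer tokens are assumed to be billed at the same output rate.

```python
from statistics import mean

# Hypothetical prices in USD per million tokens (assumption, not real pricing).
# Reasoning and answer tokens are both billed at the output rate here.
PRICES_PER_MTOK = {
    "input": 5.00,
    "cache_hit": 0.50,
    "cache_write": 6.25,
    "reasoning": 25.00,
    "answer": 25.00,
}

def task_cost_breakdown(tokens, prices=PRICES_PER_MTOK):
    """USD cost of one task, split by token category (missing categories count as 0)."""
    return {kind: tokens.get(kind, 0) * rate / 1_000_000 for kind, rate in prices.items()}

def average_cost_per_task(tasks, prices=PRICES_PER_MTOK):
    """Average each category's cost over all tasks in the index."""
    breakdowns = [task_cost_breakdown(t, prices) for t in tasks]
    return {kind: mean(b[kind] for b in breakdowns) for kind in prices}

# Two made-up tasks with hypothetical token counts.
tasks = [
    {"input": 200_000, "cache_hit": 800_000, "reasoning": 40_000, "answer": 4_000},
    {"input": 100_000, "cache_write": 50_000, "reasoning": 20_000, "answer": 2_000},
]
avg = average_cost_per_task(tasks)
print(avg)
print(sum(avg.values()))  # average total cost per task
```

Summing the per-category averages gives the same figure as averaging each task's total cost, which is why a stacked breakdown and a single headline number can be shown together.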

Artificial Analysis Coding Index: Total Cost

Total cost (USD) to run the index

The total cost to run every task in the index, calculated from the model's input and output token prices and the number of tokens used.
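As a rough sketch of the total-cost calculation, the token counts across all benchmarks can be summed and multiplied by the model's input and output prices. The per-benchmark token totals, the prices, and the `total_index_cost` helper below are hypothetical; output tokens are assumed to include both reasoning and answer tokens.

```python
# Hypothetical token totals per benchmark (assumption; not published figures).
# "output" covers both reasoning and answer tokens.
usage = {
    "Terminal-Bench v2.1": {"input": 30_000_000, "output": 2_000_000},
    "SciCode": {"input": 1_500_000, "output": 3_000_000},
}

def total_index_cost(usage, input_price_per_mtok, output_price_per_mtok):
    """Total USD cost of running every benchmark in the index once."""
    total_in = sum(u["input"] for u in usage.values())
    total_out = sum(u["output"] for u in usage.values())
    return (total_in * input_price_per_mtok + total_out * output_price_per_mtok) / 1_000_000

# Hypothetical prices: $5 per million input tokens, $25 per million output tokens.
cost = total_index_cost(usage, input_price_per_mtok=5.00, output_price_per_mtok=25.00)
print(f"${cost:,.2f}")  # prints $282.50
```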

Speed

Artificial Analysis Coding Index: Time per Task

Weighted average wall-clock time per task (minutes), excluding time to first token (TTFT) and execution time. Lower is better.

The weighted average time (minutes) per index task. For each benchmark, the output tokens per task are divided by the model's output speed (tokens per second); the resulting times are then averaged using each benchmark's weight in the index.
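The time-per-task calculation can be illustrated with a short sketch. The token counts, the output speed, and the helper name are hypothetical; the weights are set to 0.5 each on the assumption that they match the equal-weighted index, and, like the chart, the result ignores TTFT and execution time.

```python
# Hypothetical per-benchmark figures for one model (assumptions, not real results).
benchmarks = {
    # name: (average output tokens per task, weight in the index)
    "Terminal-Bench v2.1": (60_000, 0.5),
    "SciCode": (12_000, 0.5),
}
OUTPUT_SPEED_TPS = 100.0  # hypothetical output speed, tokens per second

def time_per_task_minutes(benchmarks, output_speed_tps):
    """Weighted average generation time per task in minutes (no TTFT, no execution time)."""
    total_weight = sum(w for _, w in benchmarks.values())
    weighted_seconds = sum(tokens / output_speed_tps * w for tokens, w in benchmarks.values())
    return weighted_seconds / total_weight / 60

print(round(time_per_task_minutes(benchmarks, OUTPUT_SPEED_TPS), 2))  # prints 6.0
```

Here a Terminal-Bench task takes 10 minutes and a SciCode task 2 minutes at 100 tokens per second, so the equal-weighted average is 6 minutes.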

Output Tokens

Artificial Analysis Coding Index: Output Tokens per Task

Output tokens used to run one task, broken down by reasoning and answer tokens

The average number of answer and reasoning tokens produced per benchmark task in this index.

Frequently Asked Questions

Based on the Artificial Analysis Coding Index, the top-performing AI models for coding are currently Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) with a score of 76, GPT-5.5 (xhigh) with 75, and Claude Opus 4.8 (Adaptive Reasoning, Max Effort) with 74. Rankings are updated as new models are released.

The Coding Index is an independent, composite benchmark from Artificial Analysis that measures how AI models perform on coding, including programming problems that require scientific and research domain knowledge.

The Coding Index is calculated as an equal-weighted average of its underlying benchmark scores, on the same scale as the Artificial Analysis Intelligence Index.
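A minimal sketch of the equal-weighted average, assuming each component benchmark is already reported on a 0 to 100 scale; the scores below are made up for illustration and `coding_index` is a hypothetical helper name.

```python
from statistics import mean

# Hypothetical component scores on a 0-100 scale (not real results).
scores = {"Terminal-Bench v2.1": 70.0, "SciCode": 50.0}

def coding_index(scores):
    """Equal-weighted average of the component benchmark scores."""
    return mean(scores.values())

print(coding_index(scores))  # prints 60.0
```

With equal weights, each of the two benchmarks contributes half of the headline score, so a 10-point gain on either one moves the index by 5 points.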

The Coding Index includes Terminal-Bench v2.1 and SciCode.

Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) currently has the highest Coding Index score, with a score of 76 among models with published results.

A higher Coding Index score indicates stronger overall performance across the benchmarks that make up the index. For a specific use case, individual benchmark results may be more informative than the composite score.