Coding Index
Evaluates models' ability to solve programming problems, including those requiring scientific and research domain knowledge.
The headline score is the average of the benchmarks listed below. Each row links to its result chart further down the page when one is available; otherwise it links out to the underlying benchmark.
- Terminal-Bench v2.1
A verified refresh of Terminal-Bench v2.0 — 89 curated tasks across software engineering, system administration, data processing, model training, and security, with environment and instruction fixes so scores reflect agent capability rather than environment gaps.
- SciCode
A scientist-curated coding benchmark featuring 288 test set subproblems from 80 laboratory problems across 16 scientific disciplines.
Charts: Coding Index (score); Coding Index vs. Release Date; Coding Index: Output Token Composition (token usage); Coding Index: Cost Breakdown (cost).
Frequently Asked Questions
The Coding Index is a composite benchmark from Artificial Analysis that evaluates models' ability to solve programming problems, including those requiring scientific and research domain knowledge.
The Coding Index is calculated as the average of its underlying benchmark scores, normalised to a 0–100 scale.
The Coding Index includes Terminal-Bench v2.1 and SciCode.
Among models with published results, Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) currently has the highest Coding Index score, at 76.
A higher Coding Index score indicates stronger overall performance across the benchmarks that make up the index. For a specific use case, individual benchmark results may be more informative than the composite score.
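As a rough illustration of the averaging described above, the sketch below takes per-benchmark scores that are already on a 0–100 scale and returns their unweighted mean. The function name `coding_index`, the equal weighting, and the example scores are assumptions made for illustration; they are not published results or Artificial Analysis's actual implementation.

```python
from statistics import mean


def coding_index(scores: dict[str, float]) -> float:
    """Return the unweighted mean of benchmark scores on a 0-100 scale.

    Assumes each score has already been normalised to 0-100 and that
    every benchmark carries equal weight (an assumption of this sketch).
    """
    if not scores:
        raise ValueError("at least one benchmark score is required")
    for name, value in scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score {value} is outside the 0-100 range")
    return mean(scores.values())


# Hypothetical scores for illustration only, not published results.
example_scores = {"Terminal-Bench v2.1": 60.0, "SciCode": 50.0}
print(coding_index(example_scores))  # 55.0
```

Because the composite is a plain mean in this sketch, two models with the same index can differ widely on the individual benchmarks, which is why the per-benchmark results may be more informative for a specific use case.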