Coding Index
Evaluates models' ability to solve programming problems, including those requiring scientific and research domain knowledge.
The Coding Index currently includes the following benchmarks:
- Terminal-Bench Hard: An agentic benchmark evaluating AI capabilities in terminal environments through software engineering, system administration, and data processing tasks.
- SciCode: A scientist-curated coding benchmark featuring 338 sub-tasks derived from 80 genuine laboratory problems across 16 scientific disciplines.
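The page does not say how the two benchmark scores are combined into a single index value. The sketch below assumes an equal-weight average of per-benchmark scores on a 0 to 100 scale. The function name `coding_index` and the example scores are hypothetical placeholders, not published results or the actual methodology.

```python
from statistics import mean


def coding_index(scores: dict[str, float]) -> float:
    """Combine per-benchmark scores (0-100) into one index value.

    ASSUMPTION: an equal-weight average. The source does not state
    the real aggregation method.
    """
    if not scores:
        raise ValueError("at least one benchmark score is required")
    return mean(scores.values())


# Placeholder scores for illustration only, not real benchmark results.
example_scores = {"Terminal-Bench Hard": 40.0, "SciCode": 50.0}
example_index = coding_index(example_scores)
```

Under this assumption each benchmark contributes equally, so a model that is strong on only one of the two is pulled toward the middle of the scale.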
Independently benchmarked by Artificial Analysis
Coding Index vs. Release Date
Chart legend: most attractive region (highlighted). Model creators shown: Alibaba, Amazon, Anthropic, DeepSeek, Google, Kimi, Korea Telecom, LG AI Research, MBZUAI Institute of Foundation Models, Meta, MiniMax, Mistral, NVIDIA, OpenAI, xAI, Xiaomi, Z AI.
Coding Index: Output Token Composition
Tokens used to run the evaluation, split into reasoning tokens and answer tokens.
Coding Index: Cost Breakdown
Cost (USD) to run the evaluation, split into input cost, reasoning cost, and answer cost.
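The three cost components can be related to token counts with simple arithmetic. The sketch below assumes per-million-token pricing and assumes that reasoning tokens are billed at the same rate as answer (output) tokens. The page confirms neither assumption. The names `TokenUsage` and `cost_breakdown` and the prices in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Token counts for one evaluation run (hypothetical structure)."""
    input_tokens: int
    reasoning_tokens: int
    answer_tokens: int


def cost_breakdown(
    usage: TokenUsage,
    input_price_per_million: float,
    output_price_per_million: float,
) -> dict[str, float]:
    """Split the evaluation cost (USD) into input, reasoning, and answer parts.

    ASSUMPTION: reasoning tokens are billed at the output-token price.
    """
    per_million = 1_000_000
    input_cost = usage.input_tokens / per_million * input_price_per_million
    reasoning_cost = usage.reasoning_tokens / per_million * output_price_per_million
    answer_cost = usage.answer_tokens / per_million * output_price_per_million
    return {
        "input": input_cost,
        "reasoning": reasoning_cost,
        "answer": answer_cost,
        "total": input_cost + reasoning_cost + answer_cost,
    }
```

For example, with made-up prices of 1.00 USD per million input tokens and 4.00 USD per million output tokens, `cost_breakdown(TokenUsage(2_000_000, 3_000_000, 1_000_000), 1.0, 4.0)` attributes most of the spend to reasoning, which is the kind of pattern a breakdown like this is meant to expose.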