Coding Index
Evaluates models' ability to solve programming problems, including problems that require scientific and research domain knowledge.
The Coding Index currently comprises the following benchmarks:
- Terminal-Bench Hard
An agentic benchmark evaluating AI capabilities in terminal environments through software engineering, system administration, and data processing tasks.
- SciCode
A scientist-curated coding benchmark featuring 338 sub-tasks derived from 80 genuine laboratory problems across 16 scientific disciplines.
Chart: Coding Index vs. Release Date (evaluations independently conducted by Artificial Analysis), with the chart's "most attractive region" highlighted. Models shown come from Alibaba, Amazon, Anthropic, DeepSeek, Google, Kimi, Korea Telecom, KwaiKAT, LG AI Research, MBZUAI Institute of Foundation Models, Meta, MiniMax, Mistral, NVIDIA, OpenAI, TII UAE, xAI, Xiaomi, and Z AI.
Chart: Coding Index: Output Token Composition — tokens used to run the evaluation, split into reasoning tokens and answer tokens. The total token count includes input tokens (prompt), reasoning tokens (for reasoning models), and answer tokens (final response).
Chart: Coding Index: Cost Breakdown — cost (USD) to run the evaluation, split into input cost, reasoning cost, and answer cost. Cost is calculated from the model's input and output token pricing and the number of tokens used.
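The cost calculation described above can be sketched as follows. This is a minimal illustration, assuming reasoning and answer tokens are both billed at the model's output-token rate; the function name, token counts, and per-million-token prices are hypothetical, not actual model pricing.

```python
def eval_cost_usd(input_tokens, reasoning_tokens, answer_tokens,
                  input_price_per_m, output_price_per_m):
    """Cost (USD) to run an evaluation: input tokens billed at the input
    rate, reasoning + answer tokens billed at the output rate."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = (reasoning_tokens + answer_tokens) / 1_000_000 * output_price_per_m
    return input_cost + output_cost

# Illustrative run: 2M prompt tokens, 5M reasoning tokens, 1M answer tokens,
# at $3 per million input tokens and $15 per million output tokens.
cost = eval_cost_usd(2_000_000, 5_000_000, 1_000_000, 3.0, 15.0)
print(cost)  # 2*3 + (5+1)*15 = 96.0
```

Reasoning tokens typically dominate the bill for reasoning models, which is why the cost breakdown chart separates them from answer tokens.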