
Coding Index

Evaluates models' ability to solve programming problems, including those requiring scientific and research domain knowledge.

The Coding Index currently includes the following benchmarks:

  • Terminal-Bench Hard

    An agentic benchmark evaluating AI capabilities in terminal environments through software engineering, system administration, and data processing tasks.

  • SciCode

    A scientist-curated coding benchmark featuring 338 sub-tasks derived from 80 genuine laboratory problems across 16 scientific disciplines.

Coding Index

Independently conducted by Artificial Analysis

Coding Index vs. Release Date

[Chart: models plotted by Coding Index against release date, color-coded by creator: Alibaba, Amazon, Anthropic, DeepSeek, Google, Kimi, Korea Telecom, KwaiKAT, LG AI Research, MBZUAI Institute of Foundation Models, Meta, MiniMax, Mistral, NVIDIA, OpenAI, TII UAE, xAI, Xiaomi, and Z AI.]

Coding Index: Output Token Composition

[Chart: tokens used per model to run the evaluation, split into reasoning tokens and answer tokens.]

The total number of tokens used to run the evaluation includes input tokens (the prompt), reasoning tokens (for reasoning models), and answer tokens (the final response).

Coding Index: Cost Breakdown

[Chart: cost (USD) per model to run the evaluation, split into input cost, reasoning cost, and answer cost.]

The cost to run the evaluation is calculated from the model's input and output token pricing and the number of tokens used.
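The breakdown above can be sketched as a small calculation. This is an illustrative assumption of how such a cost split could be computed, not Artificial Analysis's actual pipeline: the function name, field names, and the per-million-token prices are all made up, and it assumes reasoning and answer tokens are both billed at the output rate.

```python
def eval_cost_usd(input_tokens: int, reasoning_tokens: int, answer_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> dict:
    """Split evaluation cost into input, reasoning, and answer components.

    Assumes input tokens are billed at the input rate, while reasoning and
    answer tokens are both billed at the output rate (a common, but not
    universal, pricing scheme).
    """
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    reasoning_cost = reasoning_tokens / 1_000_000 * output_price_per_m
    answer_cost = answer_tokens / 1_000_000 * output_price_per_m
    return {
        "input_cost": input_cost,
        "reasoning_cost": reasoning_cost,
        "answer_cost": answer_cost,
        "total_cost": input_cost + reasoning_cost + answer_cost,
        "total_tokens": input_tokens + reasoning_tokens + answer_tokens,
    }

# Made-up example: 5M input, 2M reasoning, 1M answer tokens,
# at hypothetical prices of $3 / $15 per million input/output tokens.
breakdown = eval_cost_usd(5_000_000, 2_000_000, 1_000_000, 3.0, 15.0)
print(breakdown["total_cost"])  # 60.0 (15 input + 30 reasoning + 15 answer)
```

Under this scheme, a model that emits many reasoning tokens can cost far more to evaluate than its answer length alone suggests, which is why the token composition and cost charts are shown side by side.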