Artificial Analysis Economics Index

Measures performance on capabilities that matter most for economics work, including economics knowledge, agentic knowledge work, reasoning, and long-context reading. Weights reflect how often each capability appears across common economics tasks.

The Artificial Analysis Economics Index combines performance across benchmarks chosen for economics work, spanning economics knowledge, agentic execution, reasoning, and long-context reading.

Because the score is a composite, a model cannot rank highly through narrow specialization in one capability. It also provides a single number for tracking model performance across economics tasks.

Each capability sub-score is normalized to a 0-100 scale, then combined using the weights below. All underlying benchmarks are run independently by Artificial Analysis. See our Intelligence Benchmarking Methodology for how evaluations are conducted.

| Capability | Weight | Evaluations |
| --- | --- | --- |
| Economics Knowledge | 35% | AA-Omniscience Business Accuracy |
| Reasoning | 35% | HLE |
| Agentic Knowledge Work | 15% | GDPval-AA v2 |
| Long-Context | 15% | LCR |
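
As a rough illustration of how the weights above combine, the following Python sketch computes a weighted average of four sub-scores. It assumes each sub-score has already been normalized to 0-100; the normalization procedure itself is not described here and is not modeled. The function name, dictionary keys, and example values are hypothetical and are not taken from Artificial Analysis.

```python
# Weights from the capability table; the dictionary keys are illustrative labels.
WEIGHTS = {
    "economics_knowledge": 0.35,
    "reasoning": 0.35,
    "agentic_knowledge_work": 0.15,
    "long_context": 0.15,
}


def economics_index(sub_scores: dict[str, float]) -> float:
    """Weighted average of capability sub-scores already normalized to 0-100."""
    missing = WEIGHTS.keys() - sub_scores.keys()
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= sub_scores[name] <= 100.0:
            raise ValueError(f"{name} must be on a 0-100 scale")
    # The weights sum to 1, so the result stays on the same 0-100 scale.
    return sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)


# Hypothetical sub-scores for illustration only, not real benchmark results.
example = {
    "economics_knowledge": 60.0,
    "reasoning": 40.0,
    "agentic_knowledge_work": 50.0,
    "long_context": 70.0,
}
index = economics_index(example)  # roughly 53 for these made-up inputs
```

Because Economics Knowledge and Reasoning together carry 70% of the weight, a change in either moves the composite more than twice as much as the same change in Agentic Knowledge Work or Long-Context.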

Score

Artificial Analysis Economics Index

Weighted across capabilities relevant to economics work · Higher is better
Reasoning models are indicated by a lightbulb icon

Economics Index: Capability Breakdown

Each capability area on a 0–100 scale after normalization · Higher is better · Incorporates 4 evaluations: AA-Omniscience, Humanity's Last Exam, GDPval-AA v2, AA-LCR

Capability Breakdown

Economics Index: Economics Knowledge

Models ranked by economics knowledge (normalized 0-100) · Higher is better · Incorporates 1 evaluation: AA-Omniscience

Representative Workflows

Real-world workflows that exercise the capabilities the Economics Index weights most heavily.

Release Date

Economics Index vs. Release Date

Cost

Economics Index: Cost per Task

Average cost per task (USD), broken down by input, cache hit, cache write, reasoning, and answer tokens

Average cost per task in the evaluation. Costs are split by input, cache hit, cache write, reasoning, and answer token pricing where canonical token counts are available.
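
The breakdown described above can be sketched as token counts multiplied by per-token prices. This is a minimal sketch under stated assumptions: prices are quoted in USD per one million tokens, and reasoning and answer tokens are both billed at a single output rate. The class names, field names, and all prices below are hypothetical and do not reflect any provider's actual pricing or the exact accounting used for these charts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    # Hypothetical prices in USD per one million tokens.
    input: float
    cache_hit: float
    cache_write: float
    output: float  # assumed to apply to both reasoning and answer tokens


@dataclass(frozen=True)
class TaskTokens:
    input: int
    cache_hit: int
    cache_write: int
    reasoning: int
    answer: int


def cost_breakdown(tokens: TaskTokens, pricing: Pricing) -> dict[str, float]:
    """Cost in USD for one task, split by token category."""
    per_million = 1_000_000
    return {
        "input": tokens.input * pricing.input / per_million,
        "cache_hit": tokens.cache_hit * pricing.cache_hit / per_million,
        "cache_write": tokens.cache_write * pricing.cache_write / per_million,
        "reasoning": tokens.reasoning * pricing.output / per_million,
        "answer": tokens.answer * pricing.output / per_million,
    }


def average_cost_per_task(tasks: list[TaskTokens], pricing: Pricing) -> float:
    """Mean total cost in USD across tasks."""
    if not tasks:
        raise ValueError("at least one task is required")
    totals = [sum(cost_breakdown(t, pricing).values()) for t in tasks]
    return sum(totals) / len(totals)


# Made-up pricing and token counts for illustration only.
pricing = Pricing(input=2.0, cache_hit=0.5, cache_write=2.5, output=10.0)
task = TaskTokens(input=1_000_000, cache_hit=0, cache_write=0,
                  reasoning=100_000, answer=100_000)
breakdown = cost_breakdown(task, pricing)
```

Keeping the five categories separate makes it easy to see whether a model's cost is driven by long inputs, cache behavior, or heavy reasoning output.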

Economics Index: Total Cost

Total cost (USD) to run the evaluation

The cost to run the evaluation, calculated using the model's input and output token pricing and the number of tokens used.

Speed

Economics Index: Time per Task

Weighted average wall clock time (minutes) per task; excludes time to first token (TTFT) and execution time · Lower is better

The weighted average time (seconds) per evaluation task. This is calculated by dividing output tokens per task by output speed, weighted by the relative weights of each benchmark in the evaluation.
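
The calculation described above can be sketched as follows: divide each benchmark's output tokens per task by the model's output speed, then take a weighted average using the benchmark weights. The function name, argument names, and token counts are hypothetical. The sketch assumes output speed is given in tokens per second, so the result is in seconds; divide by 60 to express it in minutes.

```python
def weighted_time_per_task(
    output_tokens_per_task: dict[str, float],
    weights: dict[str, float],
    output_speed: float,
) -> float:
    """Weighted average generation time per task.

    output_speed is assumed to be in tokens per second, so the result is
    in seconds. TTFT and tool execution time are not included.
    """
    if output_speed <= 0:
        raise ValueError("output_speed must be positive")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(
        weights[name] * output_tokens_per_task[name] / output_speed
        for name in weights
    )
    # Dividing by the total weight keeps the result correct even if the
    # weights are relative rather than summing exactly to 1.
    return weighted_sum / total_weight


# Benchmark weights as listed for the index; token counts are made up.
benchmark_weights = {
    "AA-Omniscience": 0.35,
    "HLE": 0.35,
    "GDPval-AA v2": 0.15,
    "AA-LCR": 0.15,
}
tokens = {
    "AA-Omniscience": 1000.0,
    "HLE": 3000.0,
    "GDPval-AA v2": 2000.0,
    "AA-LCR": 2000.0,
}
seconds = weighted_time_per_task(tokens, benchmark_weights, output_speed=100.0)
minutes = seconds / 60.0
```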

Output Tokens

Economics Index: Output Tokens per Task

Output tokens used to run one task, broken down by reasoning and answer tokens

The average number of answer and reasoning tokens produced per benchmark task in this evaluation.

Frequently Asked Questions

The Economics Index is a composite benchmark from Artificial Analysis that measures performance on capabilities that matter most for economics work, including economics knowledge, agentic knowledge work, reasoning, and long-context reading. Weights reflect how often each capability appears across common economics tasks.

The Economics Index is calculated as a weighted average of capability sub-scores, each normalized to a 0–100 scale. The sub-scores and their weights are: Economics Knowledge (35%), Reasoning (35%), Agentic Knowledge Work (15%), and Long-Context (15%).

The Economics Index includes four evaluations: AA-Omniscience Business Accuracy, Humanity's Last Exam (HLE), GDPval-AA v2, and AA-LCR (LCR).

Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) currently has the highest Economics Index score among models with published results, at 62.

A higher Economics Index score indicates stronger overall performance across the benchmarks that make up the index. For a specific use case, individual benchmark results may be more informative than the composite score.