Artificial Analysis Engineering Index
Measures performance on capabilities that matter most for engineering work, including engineering knowledge, quantitative reasoning, agentic knowledge work, and terminal use. Weights reflect how often each capability appears across common engineering tasks.
The Artificial Analysis Engineering Index combines performance across benchmarks chosen for engineering work, spanning engineering knowledge, reasoning, agentic execution, and terminal use.
Because it is a composite, the index does not reward narrow specialization on a single benchmark, and it provides one score for tracking model performance across engineering tasks.
Each capability sub-score is normalized to a 0–100 scale, then combined using the weights below. All underlying benchmarks are run independently by Artificial Analysis. See our Intelligence Benchmarking Methodology for how evaluations are conducted.
| Capability | Weight | Evaluations |
|---|---|---|
| Engineering Knowledge | 35% | AA-Omniscience Science, Engineering & Mathematics Accuracy |
| Reasoning | 35% | HLE, GPQA Diamond, Crit-Pt |
| Agentic Knowledge Work | 25% | GDPval-AA v2 |
| Agentic Terminal Use | 5% | Terminal-Bench v2.1 |
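The weighting in the table can be illustrated with a minimal sketch. It assumes the index is a plain weighted average of the four capability sub-scores, each already normalized to 0–100. The function name `engineering_index` and the example sub-scores are hypothetical, and the sketch does not cover how a capability with several evaluations (such as Reasoning) is reduced to a single sub-score, since the page does not specify that step.

```python
# Weights in percent, copied from the table above.
# Integer percentages keep the arithmetic exact for simple inputs.
WEIGHTS_PERCENT = {
    "Engineering Knowledge": 35,
    "Reasoning": 35,
    "Agentic Knowledge Work": 25,
    "Agentic Terminal Use": 5,
}


def engineering_index(sub_scores: dict[str, float]) -> float:
    """Weighted average of capability sub-scores on a 0-100 scale.

    Hypothetical helper: assumes each sub-score is already normalized.
    """
    missing = WEIGHTS_PERCENT.keys() - sub_scores.keys()
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    for name in WEIGHTS_PERCENT:
        if not 0.0 <= sub_scores[name] <= 100.0:
            raise ValueError(f"{name} must be in [0, 100], got {sub_scores[name]}")
    weighted_sum = sum(w * sub_scores[name] for name, w in WEIGHTS_PERCENT.items())
    return weighted_sum / 100


# Illustrative sub-scores only; these are not results for any real model.
example = {
    "Engineering Knowledge": 60.0,
    "Reasoning": 50.0,
    "Agentic Knowledge Work": 40.0,
    "Agentic Terminal Use": 80.0,
}
print(engineering_index(example))  # 52.5
```

With these weights, a 10-point change in Agentic Terminal Use moves the composite by only 0.5 points, while the same change in Engineering Knowledge or Reasoning moves it by 3.5 points.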
Score
[Charts: Artificial Analysis Engineering Index; Engineering Index: Capability Breakdown; Engineering Index: Engineering Knowledge]
Representative Workflows
Real-world workflows that exercise the capabilities the Engineering Index weights most heavily.
[Charts: Engineering Index vs. Release Date; Engineering Index: Cost per Task; Engineering Index: Total Cost; Engineering Index: Time per Task; Engineering Index: Output Tokens per Task]
Frequently Asked Questions
The Engineering Index is a composite benchmark from Artificial Analysis that measures performance on capabilities that matter most for engineering work, including engineering knowledge, quantitative reasoning, agentic knowledge work, and terminal use. Weights reflect how often each capability appears across common engineering tasks.
The Engineering Index is calculated as a weighted average of capability sub-scores, each normalized to a 0–100 scale. The sub-scores and their weights are: Engineering Knowledge (35%), Reasoning (35%), Agentic Knowledge Work (25%), and Agentic Terminal Use (5%).
The Engineering Index includes AA-Omniscience Science, Engineering & Mathematics Accuracy, HLE, GPQA Diamond, Crit-Pt, GDPval-AA v2, and Terminal-Bench v2.1.
Among models with published results, Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) currently has the highest Engineering Index score, at 63.
A higher Engineering Index score indicates stronger overall performance across the benchmarks that make up the index. For a specific use case, individual benchmark results may be more informative than the composite score.