Agentic Index
Measures performance in agentic workflows, focusing on behaviors like tool use, planning, autonomy, and complex problem solving.
The Agentic Index currently includes the following benchmarks:
- GDPval-AA
GDPval-AA is Artificial Analysis' evaluation framework for OpenAI's GDPval dataset. It tests AI models on real-world tasks across 44 occupations and 9 major industries. Models solve tasks in an agentic loop via Stirrup, with shell access and web browsing, and Elo ratings are derived from blind pairwise comparisons.
- 𝜏²-Bench Telecom
A dual-control conversational AI benchmark simulating technical support scenarios where both agent and user must coordinate actions to resolve telecom service issues.
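GDPval-AA reports Elo ratings derived from blind pairwise comparisons. The exact rating procedure (K-factor, starting rating, comparison ordering, or whether a batch fit is used instead) is not stated here, so the following is only a minimal sketch of a standard sequential Elo update. The function names, the K-factor of 32, the starting rating of 1000, and the model names and judgments are all hypothetical.

```python
from collections import defaultdict


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A is preferred over B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_ratings(comparisons, k: float = 32.0, initial: float = 1000.0) -> dict:
    """Hypothetical helper: sequential Elo from blind pairwise judgments.

    comparisons: iterable of (model_a, model_b, score_a), where score_a is
    1.0 if A's output was preferred, 0.0 if B's was, and 0.5 for a tie.
    """
    ratings = defaultdict(lambda: initial)  # unseen models start at `initial`
    for a, b, score_a in comparisons:
        e_a = expected_score(ratings[a], ratings[b])
        delta = k * (score_a - e_a)  # positive if A did better than expected
        ratings[a] += delta
        ratings[b] -= delta  # zero-sum: B loses exactly what A gains
    return dict(ratings)


# Hypothetical judgments from a blind grader.
judgments = [
    ("model_x", "model_y", 1.0),
    ("model_y", "model_z", 0.5),
    ("model_x", "model_z", 1.0),
]
ratings = elo_ratings(judgments)
for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

Sequential Elo results depend on the order in which comparisons are processed; fitting a Bradley-Terry model over all comparisons at once is a common order-independent alternative.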
[Chart: Agentic Index by model. Independently benchmarked by Artificial Analysis.]
[Chart: Agentic Index vs. Release Date, with the most attractive region highlighted. Model creators shown: Alibaba, Amazon, Anthropic, DeepSeek, Google, Kimi, Korea Telecom, LG AI Research, MBZUAI Institute of Foundation Models, Meta, MiniMax, Mistral, NVIDIA, OpenAI, xAI, Xiaomi, Z AI.]
[Chart: Agentic Index: Output Token Composition. Tokens used to run the evaluation, split into reasoning tokens and answer tokens.]
[Chart: Agentic Index: Cost Breakdown. Cost (USD) to run the evaluation, split into input cost, reasoning cost, and answer cost.]
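The token and cost breakdowns split a run into input, reasoning, and answer components. As a rough illustration of how those pieces relate, here is a minimal sketch assuming simple per-token pricing in which reasoning tokens are billed at the same output-token rate as answer tokens (common among major providers, but an assumption here). The class and function names, token counts, and prices are hypothetical and are not Artificial Analysis figures.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Token counts for one evaluation run (hypothetical values)."""
    input_tokens: int
    reasoning_tokens: int
    answer_tokens: int

    @property
    def output_tokens(self) -> int:
        # Output token composition: reasoning tokens plus answer tokens.
        return self.reasoning_tokens + self.answer_tokens


@dataclass
class Pricing:
    """Prices in USD per 1M tokens (hypothetical)."""
    input_per_million: float
    output_per_million: float


def cost_breakdown(usage: TokenUsage, pricing: Pricing) -> dict:
    """Split run cost (USD) into input, reasoning, and answer components.

    Assumption: reasoning and answer tokens share the output-token rate.
    """
    input_rate = pricing.input_per_million / 1_000_000
    output_rate = pricing.output_per_million / 1_000_000
    return {
        "input": usage.input_tokens * input_rate,
        "reasoning": usage.reasoning_tokens * output_rate,
        "answer": usage.answer_tokens * output_rate,
    }


# Hypothetical run: 2M input, 0.5M reasoning, 0.1M answer tokens at $1 / $4 per 1M.
usage = TokenUsage(input_tokens=2_000_000, reasoning_tokens=500_000, answer_tokens=100_000)
pricing = Pricing(input_per_million=1.0, output_per_million=4.0)
breakdown = cost_breakdown(usage, pricing)
total = sum(breakdown.values())
print(f"Total: ${total:.2f}")  # prints "Total: $4.40"
```

Under these assumed prices, reasoning tokens cost as much as the entire input despite being a quarter of the input volume, which is why reasoning-heavy models can dominate the cost chart.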