
Strategy & Ops Index

Measures performance on the capabilities that matter most for office and operations work: business knowledge, agentic execution, quantitative and scientific reasoning, handling data without hallucinating, and instruction following. Weights are derived from how often each of those capabilities appears across the top tasks performed by office and administrative support workers.


The Artificial Analysis Strategy & Ops Index combines performance across benchmarks selected for strategy, operations, and office administration work. Weights reflect how often each capability appears in tasks typical of office and administrative support roles (the largest occupational group in the U.S.), with the analysis grouped by task rather than by job title.

Because it combines several capabilities and no single benchmark dominates the weighting, the composite discourages narrow specialization and provides a single score for tracking model performance across operations and administrative work.

Each capability sub-score is normalised to a 0–100 scale, and the sub-scores are then combined as a weighted average using the weights below. All underlying benchmarks are run independently by Artificial Analysis. See our Intelligence Benchmarking Methodology for how evaluations are conducted.

| Category | Weight | Evaluations |
|---|---|---|
| Business Knowledge | 30% | AA-Omniscience Business Accuracy |
| Agentic | 30% | GDPval-AA v2 |
| Quantitative and Scientific Reasoning | 25% | Crit-Pt, HLE |
| Non-Hallucination | 10% | AA-Omniscience Non-Hallucination |
| Instruction Following | 5% | IFBench |
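The weighted-average step described above can be sketched in Python as follows. This is a minimal illustration, not Artificial Analysis's implementation: the dictionary keys and the function name `strategy_ops_index` are hypothetical, and the sketch assumes each category already arrives as a single normalised 0–100 number (the page does not say how Crit-Pt and HLE are combined within their category).

```python
# Minimal sketch: combine normalised capability sub-scores into one index score.
# Key names and the function name are hypothetical; the weights are the listed ones.

WEIGHTS = {
    "business_knowledge": 0.30,
    "agentic": 0.30,
    "quant_sci_reasoning": 0.25,  # assumed to already combine Crit-Pt and HLE
    "non_hallucination": 0.10,
    "instruction_following": 0.05,
}


def strategy_ops_index(sub_scores: dict[str, float]) -> float:
    """Weighted average of sub-scores already normalised to the 0-100 range."""
    missing = WEIGHTS.keys() - sub_scores.keys()
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= sub_scores[name] <= 100.0:
            raise ValueError(f"{name} is outside 0-100: {sub_scores[name]}")
    # The weights sum to 1.0, so the weighted sum is also on a 0-100 scale.
    return sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical sub-scores for an imaginary model.
    example = {
        "business_knowledge": 60,
        "agentic": 50,
        "quant_sci_reasoning": 40,
        "non_hallucination": 70,
        "instruction_following": 80,
    }
    print(strategy_ops_index(example))
```

Because the weights sum to 1, a model that scores 100 in every category gets 100 overall, and a 10-point gain in Business Knowledge moves the index three times as much as the same gain in Non-Hallucination.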

Score

Chart: Strategy & Ops Index, weighted across capabilities relevant to operations and admin work (higher is better).

Chart: Strategy & Ops Index capability breakdown, each capability area on a 0–100 scale after normalisation (higher is better).

Capability Breakdown

Chart: Strategy & Ops Index, Business Knowledge; models ranked by business knowledge, normalised 0–100 (higher is better).

Representative Workflows

Real-world workflows that exercise the capabilities the Strategy & Ops Index weights most heavily.

Release Date

Chart: Strategy & Ops Index vs. release date.

Speed

Chart: Strategy & Ops Index vs. output speed (output tokens per second).

There is a trade-off between model quality and output speed: higher-intelligence models typically have lower output speed.

Output speed is the number of tokens per second received while the model is generating, i.e., after the first chunk has been received from the API (for models that support streaming).
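The definition above can be turned into a small calculation over streamed chunks. This is a minimal sketch under stated assumptions, not the exact Artificial Analysis procedure: the `Chunk` type and `output_speed` function are hypothetical, and the choice to start the clock at the first chunk and exclude that chunk's own tokens is one reasonable reading of "after the first chunk has been received".

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    arrival_s: float  # seconds since the request was sent (hypothetical field)
    tokens: int       # output tokens contained in this streamed chunk


def output_speed(chunks: list[Chunk]) -> float:
    """Output tokens per second, timed from the first chunk to the last one.

    Assumption: the first chunk marks the start of the timing window, so its
    tokens are not counted and time-to-first-chunk does not affect the rate.
    """
    if len(chunks) < 2:
        raise ValueError("need at least two chunks to measure a rate")
    ordered = sorted(chunks, key=lambda c: c.arrival_s)
    elapsed = ordered[-1].arrival_s - ordered[0].arrival_s
    if elapsed <= 0:
        raise ValueError("chunks must span a positive time interval")
    tokens_after_first = sum(c.tokens for c in ordered[1:])
    return tokens_after_first / elapsed


if __name__ == "__main__":
    # Hypothetical stream: first chunk at 0.5 s, then two more chunks.
    stream = [Chunk(0.5, 10), Chunk(1.0, 50), Chunk(1.5, 50)]
    print(output_speed(stream))
```

Starting the clock at the first chunk separates generation speed from queueing and prompt-processing delay, which would otherwise penalise models with slow first responses twice.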

Pricing

Chart: Pricing, input and output prices (USD per 1M tokens, blended).

Input price: the price per token included in the request or message sent to the API, expressed in USD per million tokens.

Figures represent the median (P50) measurement over the past 72 hours, so that they reflect sustained changes in performance rather than short-lived fluctuations.
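A rolling P50 over a 72-hour window can be sketched with the standard library. This is an illustration only: the page does not describe how measurements are stored or sampled, so the `(timestamp, value)` sample format and the `rolling_median` name are assumptions.

```python
import statistics
from datetime import datetime, timedelta


def rolling_median(
    samples: list[tuple[datetime, float]],
    now: datetime,
    window: timedelta = timedelta(hours=72),
) -> float:
    """Median (P50) of measurements taken within `window` before `now`.

    `samples` is a hypothetical list of (timestamp, value) pairs, e.g. output
    speed readings. Both window edges are treated as inclusive.
    """
    start = now - window
    recent = [value for taken_at, value in samples if start <= taken_at <= now]
    if not recent:
        raise ValueError("no measurements in the window")
    return statistics.median(recent)
```

The median is used rather than the mean because a single unusually slow or fast run barely moves it; only a change that persists across many samples in the window shifts the reported figure.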

Chart: Strategy & Ops Index vs. price (USD per 1M tokens, blended).

While higher-intelligence models are typically more expensive, they do not all follow the same price-quality curve: some deliver a higher score for the price than others.
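The price axis uses a blended figure in USD per 1M tokens, which folds separate input and output prices into one number. The page does not state the blend ratio, so the 3:1 input-to-output ratio in this sketch is an assumption for illustration, exposed as parameters so it can be changed; `blended_price` is a hypothetical name.

```python
def blended_price(
    input_usd_per_m: float,
    output_usd_per_m: float,
    input_parts: float = 3.0,   # assumed ratio, not stated on this page
    output_parts: float = 1.0,  # assumed ratio, not stated on this page
) -> float:
    """Weighted average of input and output prices, in USD per 1M tokens."""
    total_parts = input_parts + output_parts
    if total_parts <= 0:
        raise ValueError("ratio parts must sum to a positive number")
    return (input_parts * input_usd_per_m + output_parts * output_usd_per_m) / total_parts


if __name__ == "__main__":
    # Hypothetical prices: $1 per 1M input tokens, $5 per 1M output tokens.
    print(blended_price(1.0, 5.0))
```

A single blended number makes models comparable on one axis, but it hides how a particular workload's input/output mix affects real cost; output-heavy workloads will pay more than the blend suggests.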

Token Usage

Chart: Strategy & Ops Index output token composition, tokens used to run the evaluation.

The total number of tokens used to run the evaluation, including input tokens (prompt), reasoning tokens (for reasoning models), and answer tokens (final response).
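The token breakdown described above can be modelled as a small record. This is a minimal sketch: the `TokenUsage` class and its field names are hypothetical, and treating reasoning plus answer tokens as the "output" side follows the page's split into prompt, reasoning, and final-response tokens.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    input_tokens: int      # prompt tokens sent to the model
    reasoning_tokens: int  # 0 for non-reasoning models
    answer_tokens: int     # tokens in the final response

    @property
    def output_tokens(self) -> int:
        """Everything the model generated: reasoning plus final answer."""
        return self.reasoning_tokens + self.answer_tokens

    @property
    def total_tokens(self) -> int:
        """Input and output tokens combined."""
        return self.input_tokens + self.output_tokens

    def reasoning_share(self) -> float:
        """Fraction of output tokens spent on reasoning (0.0 if no output)."""
        if self.output_tokens == 0:
            return 0.0
        return self.reasoning_tokens / self.output_tokens


if __name__ == "__main__":
    # Hypothetical counts for one evaluation run.
    usage = TokenUsage(input_tokens=1_000_000, reasoning_tokens=3_000_000, answer_tokens=1_000_000)
    print(usage.total_tokens, usage.reasoning_share())
```

Separating reasoning from answer tokens shows why two models with similar final responses can differ sharply in token usage: a reasoning model may generate several times more output than it finally returns.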

Cost

Chart: Strategy & Ops Index cost breakdown, cost (USD) to run the evaluation.

The cost to run the evaluation, calculated using the model's input and output token pricing and the number of tokens used.
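The cost calculation described above can be sketched as a one-line formula over token counts and per-million-token prices. This assumes that reasoning tokens are billed at the output price, so `output_tokens` should include both reasoning and answer tokens; that billing rule is an assumption here, and `evaluation_cost_usd` is a hypothetical name.

```python
def evaluation_cost_usd(
    input_tokens: int,
    output_tokens: int,       # assumed to include reasoning and answer tokens
    input_usd_per_m: float,   # input price, USD per 1M tokens
    output_usd_per_m: float,  # output price, USD per 1M tokens
) -> float:
    """Cost in USD of an evaluation run from token counts and prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * input_usd_per_m + output_tokens * output_usd_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical run: 2M input tokens at $1.25/M, 4M output tokens at $10/M.
    print(evaluation_cost_usd(2_000_000, 4_000_000, 1.25, 10.0))
```

Because output tokens are usually priced higher than input tokens, long reasoning traces can dominate the cost of a run even when the prompts are large.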

Frequently Asked Questions

The Strategy & Ops Index is a composite benchmark from Artificial Analysis that measures performance on the capabilities that matter most for office and operations work: business knowledge, agentic execution, quantitative and scientific reasoning, handling data without hallucinating, and instruction following. Weights are derived from how often each of those capabilities appears across the top tasks performed by office and administrative support workers.

The Strategy & Ops Index is calculated as a weighted average of capability sub-scores, each normalised to a 0–100 scale. The sub-scores and their weights are: Business Knowledge (30%), Agentic (30%), Quantitative and Scientific Reasoning (25%), Non-Hallucination (10%), and Instruction Following (5%).

The Strategy & Ops Index includes AA-Omniscience Business Accuracy, GDPval-AA v2, Crit-Pt, HLE, AA-Omniscience Non-Hallucination, and IFBench.

Among models with published results, Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) currently has the highest Strategy & Ops Index score, at 55.

A higher Strategy & Ops Index score indicates stronger overall performance across the benchmarks that make up the index. For a specific use case, individual benchmark results may be more informative than the composite score.