Strategy & Ops Index

Assesses model performance across the strategy and operations domain. Capabilities evaluated include domain-specific knowledge (business and management, accounting, corporate and markets), strategy and planning, customer support, records management, and more.

The Artificial Analysis Strategy & Ops Index combines performance across benchmarks chosen for strategy, operations, and office administration. We map common tasks from O*NET occupational classifications, then select benchmarks that represent this real-world work. Weights are derived from how often capabilities appear across those tasks.

This composite metric provides a single score for tracking model performance across operations and administrative work. All underlying benchmarks are run independently by Artificial Analysis. See our Intelligence Benchmarking Methodology for how evaluations are conducted.

Capability	Weight	Evaluations
Business Knowledge	30%	AA-Omniscience Business Accuracy
Agentic Knowledge Work	30%	GDPval-AA v2
Agentic Customer Interaction	30%	𝜏³-Banking
Instruction Following	5%	IFBench
Long-Context	5%	AA-LCR
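
The weights in the table can be read as a weighted average of capability sub-scores, which is how the FAQ on this page describes the calculation. The following is a minimal Python sketch under that reading, assuming each sub-score is on a 0 to 100 scale. The helper names `contributions` and `strategy_ops_index` and the example sub-scores are hypothetical placeholders, not published results or Artificial Analysis code.

```python
# Capability weights taken from the table above (they sum to 1.0).
WEIGHTS = {
    "Business Knowledge": 0.30,
    "Agentic Knowledge Work": 0.30,
    "Agentic Customer Interaction": 0.30,
    "Instruction Following": 0.05,
    "Long-Context": 0.05,
}


def contributions(sub_scores: dict[str, float]) -> dict[str, float]:
    """Return each capability's weighted contribution to the index."""
    missing = set(WEIGHTS) - set(sub_scores)
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    return {name: WEIGHTS[name] * sub_scores[name] for name in WEIGHTS}


def strategy_ops_index(sub_scores: dict[str, float]) -> float:
    """Weighted average of the capability sub-scores."""
    return sum(contributions(sub_scores).values())


# Hypothetical sub-scores, for illustration only.
example = {
    "Business Knowledge": 40.0,
    "Agentic Knowledge Work": 50.0,
    "Agentic Customer Interaction": 60.0,
    "Instruction Following": 70.0,
    "Long-Context": 80.0,
}
print(round(strategy_ops_index(example), 2))  # 52.5
```

The per-capability values returned by `contributions` correspond to what a chart segmented by contribution would stack: each segment is the weight times the sub-score, and the segments sum to the composite score.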

AA-Omniscience: Knowledge and Hallucination Benchmark

GDPval-AA v2 Leaderboard

𝜏³-Banking Benchmark Leaderboard

IFBench Benchmark Leaderboard

Artificial Analysis Long Context Reasoning Benchmark Leaderboard

Agentic Index

Coding Index

Finance & Accounting Index

Legal Index

Healthcare & Medical Index

Engineering Index

Economics Index

Score

Artificial Analysis Strategy & Ops Index

Incorporates 5 evaluations: AA-Omniscience, GDPval-AA v2, 𝜏³-Banking, IFBench, AA-LCR · Higher is better

Reasoning models are indicated by a lightbulb icon

Artificial Analysis Strategy & Ops Index: Capability Breakdown

Incorporates 5 evaluations: AA-Omniscience, GDPval-AA v2, 𝜏³-Banking, IFBench, AA-LCR · Segmented by contribution

Capability Breakdown

Artificial Analysis Strategy & Ops Index: Business Knowledge

Incorporates 1 evaluation: AA-Omniscience · Higher is better

Representative Workflows

Real-world workflows that exercise the capabilities the Strategy & Ops Index weights most heavily.

Strategy & planning: Business Knowledge · Agentic Knowledge Work · Long-Context

Example: Assess a mid-market SaaS company's competitive position from market-share data, analyst reports, and win/loss notes, work through a Porter's Five Forces and SWOT read, and formulate three prioritized strategic options with the trade-offs of each.

Data entry & document processing: Instruction Following

Example: Scan 3,000 invoices in different formats before month-end and extract line items into structured fields for posting to the general ledger.

Customer service & support: Agentic Knowledge Work · Agentic Customer Interaction · Instruction Following

Example: Absorb a 40% support surge after a product recall in a CRM ticketing queue with 20-minute hold times: triage incoming tickets by SLA, de-escalate frustrated customers in live chat, and follow the approved recall script verbatim.

Scheduling & calendar management: Agentic Knowledge Work · Instruction Following

Example: Reconcile an executive's schedule when they are triple-booked across a full week of calendars in a scheduling tool, including meetings with external stakeholders who have limited availability: weigh free/busy windows, propose conflict resolutions ranked by stakeholder seniority, and draft rescheduling notes.

Billing, invoicing & bookkeeping: Business Knowledge · Agentic Knowledge Work

Example: Clean up a stalled month-end close in which multiple departments coded the same expenses to different GL accounts across hundreds of invoices: propose a consistent chart-of-accounts mapping, identify entries needing reclassification, and draft adjusting journal entries.

Records management & filing: Agentic Knowledge Work · Instruction Following · Long-Context

Example: Consolidate five years of training records, split across paper files and two unindexed document systems, in response to a regulatory request: build one audit-ready manifest, flag missing records with supporting evidence, and propose a retention schedule for the next cycle.

Release Date

Artificial Analysis Strategy & Ops Index vs. Release Date

Cost

Artificial Analysis Strategy & Ops Index: Cost per Task

Average cost per task (USD), broken down by input, cache hit, cache write, reasoning, and answer tokens

Average cost per task in the index. Costs are split by input, cache hit, cache write, reasoning, and answer token pricing where canonical token counts are available.
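
One way to picture this breakdown is to multiply each token category's count by its price and sum the results. The sketch below assumes prices are quoted in USD per million tokens and that reasoning and answer tokens are both billed at an output-token price. The function name, category keys, prices, and token counts are hypothetical placeholders, not Artificial Analysis data.

```python
TOKEN_CATEGORIES = ("input", "cache_hit", "cache_write", "reasoning", "answer")


def cost_per_task(
    tokens: dict[str, int], usd_per_million: dict[str, float]
) -> dict[str, float]:
    """Return the USD cost of one task, split by token category."""
    return {
        cat: tokens.get(cat, 0) * usd_per_million[cat] / 1_000_000
        for cat in TOKEN_CATEGORIES
    }


# Hypothetical prices (USD per million tokens); reasoning and answer
# tokens share the same assumed output price.
prices = {
    "input": 2.0,
    "cache_hit": 0.2,
    "cache_write": 2.5,
    "reasoning": 8.0,
    "answer": 8.0,
}
# Hypothetical average token counts for one task.
tokens = {
    "input": 50_000,
    "cache_hit": 200_000,
    "cache_write": 10_000,
    "reasoning": 20_000,
    "answer": 5_000,
}

breakdown = cost_per_task(tokens, prices)
total = sum(breakdown.values())
for category, usd in breakdown.items():
    print(f"{category:12s} ${usd:.3f}")
print(f"{'total':12s} ${total:.3f}")
```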

Artificial Analysis Strategy & Ops Index: Total Cost

Total cost (USD) to run the index

The cost to run the index, calculated using the model's input and output token pricing and the number of tokens used.

Speed

Artificial Analysis Strategy & Ops Index: Time per Task

Weighted average decode time (minutes) per task; excludes TTFT and overhead time · Lower is better

The weighted average time (minutes) per index task. This is calculated by dividing output tokens per task by output speed, weighted by the relative weights of each benchmark in the index.
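
The description above can be sketched as follows, assuming output speed is measured in tokens per second and the benchmark weights are the ones listed in the index table. The function name, the token counts, and the speed are hypothetical values chosen for illustration.

```python
def time_per_task_minutes(
    output_tokens_per_task: dict[str, float],
    weights: dict[str, float],
    output_speed_tps: float,
) -> float:
    """Weighted average decode time per task, in minutes.

    Decode time for a benchmark is output tokens per task divided by
    output speed (tokens per second); TTFT and overhead are not modeled.
    """
    if output_speed_tps <= 0:
        raise ValueError("output speed must be positive")
    total_weight = sum(weights.values())
    weighted_seconds = sum(
        weights[name] * output_tokens_per_task[name] / output_speed_tps
        for name in weights
    )
    return weighted_seconds / total_weight / 60


# Benchmark weights from the index table.
weights = {
    "AA-Omniscience": 0.30,
    "GDPval-AA v2": 0.30,
    "𝜏³-Banking": 0.30,
    "IFBench": 0.05,
    "AA-LCR": 0.05,
}
# Hypothetical average output tokens per task for each benchmark.
tokens = {
    "AA-Omniscience": 2_000,
    "GDPval-AA v2": 30_000,
    "𝜏³-Banking": 10_000,
    "IFBench": 1_500,
    "AA-LCR": 4_000,
}

minutes = time_per_task_minutes(tokens, weights, output_speed_tps=100.0)
print(f"{minutes:.2f} minutes per task")
```

Dividing by the sum of the weights keeps the result a true weighted average even if a benchmark is dropped from the dictionary.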

Output Tokens

Artificial Analysis Strategy & Ops Index: Output Tokens per Task

Output tokens used to run one task, broken down by reasoning and answer tokens

The average number of answer and reasoning tokens produced per benchmark task in this index.

Frequently Asked Questions

Which AI is best for consultants?

Based on the Artificial Analysis Strategy & Ops Index, the top-performing AI models for strategy and operations work are currently GPT-5.6 Sol (max) (51), Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) (50), and GPT-5.6 Sol (xhigh) (50). Rankings are updated as new models are released.

Is there an AI benchmark for strategy and operations work?

Yes. The Strategy & Ops Index from Artificial Analysis is an independent benchmark of how AI models perform on strategy and operations work. It measures performance in the strategy and operations domain, including business knowledge, agentic workflows, customer interaction, and instruction following.

What is the Strategy & Ops Index?

The Strategy & Ops Index is a composite benchmark from Artificial Analysis that assesses model performance across the strategy and operations domain. Capabilities evaluated include domain-specific knowledge (business and management, accounting, corporate and markets), strategy and planning, customer support, records management, and more.

How is the Strategy & Ops Index calculated?

The Strategy & Ops Index is calculated as a weighted average of its capability sub-scores. The sub-scores and their weights are: Business Knowledge (30%), Agentic Knowledge Work (30%), Agentic Customer Interaction (30%), Instruction Following (5%), and Long-Context (5%).

Which benchmarks are included in the Strategy & Ops Index?

The Strategy & Ops Index includes AA-Omniscience Business Accuracy, GDPval-AA v2, 𝜏³-Banking, IFBench, and AA-LCR.

Which AI model has the highest Strategy & Ops Index score?

Among models with published results, GPT-5.6 Sol (max) currently has the highest Strategy & Ops Index score, at 51.

How should I interpret the Strategy & Ops Index?

A higher Strategy & Ops Index score indicates stronger overall performance across the benchmarks that make up the index. For a specific use case, individual benchmark results may be more informative than the composite score.