Legal Index

Assesses model performance across the legal domain. Capabilities evaluated include domain-specific knowledge (contract law, tort law, constitutional law), legal research and drafting, litigation support, compliance review, and more.

The Artificial Analysis Legal Index combines performance across benchmarks chosen for legal practice. We map common tasks from O*NET occupational classifications, then select benchmarks that represent this real-world work. Weights are derived from how often capabilities appear across those tasks.

This composite metric provides a single score for tracking model performance across legal tasks. All underlying benchmarks are run independently by Artificial Analysis. See our Intelligence Benchmarking Methodology for how evaluations are conducted.

Capability	Weight	Evaluations
Legal Knowledge	35%	AA-Omniscience Law Accuracy
Agentic Knowledge Work	25%	GDPval-AA v2
Reasoning	15%	HLE
Long-Context	10%	LCR
Non-Hallucination	10%	AA-Omniscience Non-Hallucination
Agentic Customer Interaction	5%	𝜏³-Banking
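
As a minimal sketch of how a composite like this could be computed, the snippet below takes a weighted average of capability sub-scores using the weights from the table above. The function name `legal_index`, the assumption that sub-scores share a 0-100 scale, and the example sub-scores are all illustrative assumptions, not Artificial Analysis's actual implementation or published results.

```python
# Weights copied from the capability table above.
WEIGHTS = {
    "Legal Knowledge": 0.35,
    "Agentic Knowledge Work": 0.25,
    "Reasoning": 0.15,
    "Long-Context": 0.10,
    "Non-Hallucination": 0.10,
    "Agentic Customer Interaction": 0.05,
}


def legal_index(sub_scores: dict[str, float]) -> float:
    """Weighted average of capability sub-scores.

    Assumes every sub-score is on the same scale (for example 0-100).
    """
    missing = WEIGHTS.keys() - sub_scores.keys()
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)


# Hypothetical sub-scores, for illustration only (not published results).
example_scores = {
    "Legal Knowledge": 60.0,
    "Agentic Knowledge Work": 40.0,
    "Reasoning": 30.0,
    "Long-Context": 70.0,
    "Non-Hallucination": 50.0,
    "Agentic Customer Interaction": 80.0,
}

print(round(legal_index(example_scores), 2))
```

Because Legal Knowledge and Agentic Knowledge Work together carry 60% of the weight, a model's composite score moves most with those two sub-scores.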

Component Benchmarks

AA-Omniscience: Knowledge and Hallucination Benchmark

GDPval-AA v2 Leaderboard

Artificial Analysis Long Context Reasoning Benchmark Leaderboard

𝜏³-Banking Benchmark Leaderboard

Humanity's Last Exam Benchmark Leaderboard

Related Links

Agentic Index

Coding Index

Finance & Accounting Index

Strategy & Ops Index

Healthcare & Medical Index

Engineering Index

Economics Index

Score

Artificial Analysis Legal Index

Incorporates 5 evaluations: AA-Omniscience, GDPval-AA v2, AA-LCR, 𝜏³-Banking, Humanity's Last Exam · Higher is better

Reasoning models are indicated by a lightbulb icon

Artificial Analysis Legal Index: Capability Breakdown

Incorporates 5 evaluations: AA-Omniscience, GDPval-AA v2, AA-LCR, 𝜏³-Banking, Humanity's Last Exam · Segmented by contribution

Artificial Analysis Legal Index: Legal Knowledge

Incorporates 1 evaluation: AA-Omniscience · Higher is better

Representative Workflows

Real-world workflows that exercise the capabilities the Legal Index weights most heavily.

Legal research & analysis
Capabilities: Legal Knowledge, Long-Context, Non-Hallucination, Reasoning

Example: Determine whether a non-compete is enforceable under controlling state precedent by gathering the relevant case law, applying each holding to the employee's facts, distinguishing unfavorable rulings, and synthesizing the analysis into a report.

Document drafting & review
Capabilities: Legal Knowledge, Agentic Knowledge Work, Long-Context, Non-Hallucination

Example: Reconcile US and EU indemnity language in a cross-border M&A share purchase agreement 48 hours before signing: flag irreconcilable conflicts, propose drafting that satisfies both regimes where possible, and deliver a partner-ready redline.

Client counseling & advisory
Capabilities: Legal Knowledge, Non-Hallucination, Reasoning

Example: Advise a startup shipping a feature that may trigger unsettled state privacy rules such as the CCPA: ask clarifying questions, lay out the trade-offs by jurisdiction, surface open legal risks, and recommend a defensible launch posture.

Litigation & dispute resolution
Capabilities: Legal Knowledge, Long-Context, Non-Hallucination, Reasoning

Example: Work through a 200,000-document e-discovery production delivered ten days before trial: prioritize responsive material, flag likely privilege issues for attorney review, and draft a deposition outline tied to the strongest exhibits.

Compliance & regulatory work
Capabilities: Legal Knowledge, Agentic Knowledge Work, Reasoning

Example: Rewrite internal policy for a new financial regulation that takes effect in 90 days and clashes with procedures in three business units: produce a unified replacement policy, an implementation plan with named owners, and a training brief grounded in the statute and the existing policy library.

Case management
Capabilities: Agentic Knowledge Work, Non-Hallucination

Example: Consolidate 40 active litigation matters tracked across three incompatible case-management systems: produce one unified docket, surface conflicting court deadlines, and propose a single workflow going forward.

Release Date

Artificial Analysis Legal Index vs. Release Date

Cost

Artificial Analysis Legal Index: Cost per Task

Average cost per task in the index (USD). Costs are split by input, cache hit, cache write, reasoning, and answer token pricing where canonical token counts are available.

Artificial Analysis Legal Index: Total Cost

Total cost (USD) to run the index

The cost to run the index, calculated using the model's input and output token pricing and the number of tokens used.
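
The calculation described above can be sketched as token counts multiplied by the matching per-token price. This is an illustrative sketch only: the function name `total_cost_usd`, the per-million-token price convention, and the example figures are assumptions, and it ignores the cache hit and cache write pricing that the per-task breakdown distinguishes.

```python
def total_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost in USD, assuming prices are quoted per one million tokens."""
    cost = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return cost / 1_000_000


# Hypothetical token counts and prices, for illustration only.
example_cost = total_cost_usd(
    input_tokens=2_000_000,
    output_tokens=500_000,
    input_price_per_million=3.0,
    output_price_per_million=15.0,
)
print(example_cost)
```

Output tokens here would include reasoning tokens as well as answer tokens, which is why reasoning-heavy configurations can cost more to run even at the same list price.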

Speed

Artificial Analysis Legal Index: Time per Task

Weighted average decode time (minutes) per task; excludes time to first token (TTFT) and overhead time · Lower is better

The weighted average time (minutes) per index task. For each benchmark, output tokens per task are divided by output speed; the results are then averaged using each benchmark's weight in the index.
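
A minimal sketch of that calculation follows. It assumes output speed is measured in tokens per second and is the same for every benchmark; the function name `time_per_task_minutes`, the benchmark names, and all figures are hypothetical.

```python
def time_per_task_minutes(
    tokens_per_task: dict[str, float],
    weights: dict[str, float],
    output_speed_tps: float,
) -> float:
    """Weighted average decode time per task, in minutes.

    Decode time for one benchmark is output tokens per task divided by
    output speed (tokens per second). TTFT and overhead are not modeled.
    """
    if output_speed_tps <= 0:
        raise ValueError("output speed must be positive")
    total_weight = sum(weights.values())
    weighted_seconds = sum(
        weights[name] * tokens_per_task[name] / output_speed_tps
        for name in weights
    )
    return weighted_seconds / total_weight / 60


# Hypothetical benchmarks, weights, and token counts, for illustration only.
example_minutes = time_per_task_minutes(
    tokens_per_task={"benchmark_a": 6_000, "benchmark_b": 12_000},
    weights={"benchmark_a": 0.75, "benchmark_b": 0.25},
    output_speed_tps=100.0,
)
print(example_minutes)
```

Dividing by the total weight keeps the result a true average even when the supplied weights do not sum to exactly 1.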

Output Tokens

Artificial Analysis Legal Index: Output Tokens per Task

Output tokens used to run one task, broken down by reasoning and answer tokens

The average number of answer and reasoning tokens produced per benchmark task in this index.

Frequently Asked Questions

Which AI is best for lawyers?

Based on the Artificial Analysis Legal Index, the top-performing AI models for legal work are currently Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) (59), GPT-5.6 Sol (max) (52), and GPT-5.6 Sol (xhigh) (51). Rankings are updated as new models are released.

Is there an AI benchmark for legal work?

Yes. The Legal Index from Artificial Analysis is an independent benchmark of how AI models perform on legal work. It measures legal knowledge, document analysis, long-context reasoning, and non-hallucination.

What is the Legal Index?

The Legal Index is a composite benchmark from Artificial Analysis that assesses model performance across the legal domain. Capabilities evaluated include domain-specific knowledge (contract law, tort law, constitutional law), legal research and drafting, litigation support, compliance review, and more.

How is the Legal Index calculated?

The Legal Index is calculated as a weighted average of its capability sub-scores. The sub-scores and their weights are: Legal Knowledge (35%), Agentic Knowledge Work (25%), Reasoning (15%), Long-Context (10%), Non-Hallucination (10%), and Agentic Customer Interaction (5%).

Which benchmarks are included in the Legal Index?

The Legal Index includes AA-Omniscience Law Accuracy, GDPval-AA v2, LCR, AA-Omniscience Non-Hallucination, 𝜏³-Banking, and HLE.

Which AI model has the highest Legal Index score?

Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) currently has the highest Legal Index score among models with published results, at 59.

How should I interpret the Legal Index?

A higher Legal Index score indicates stronger overall performance across the benchmarks that make up the index. For a specific use case, individual benchmark results may be more informative than the composite score.