Capability Indices

Agentic Index

Measures performance in agentic workflows, focusing on behaviors like tool use, planning, autonomy, and complex problem solving.

Coding Index

Evaluates models' ability to solve programming problems, including those requiring scientific and research domain knowledge.

Finance & Accounting Index

Measures performance on the finance and accounting domain, including financial knowledge, mathematical reasoning, long-context analysis, and reporting.

Strategy & Ops Index

Measures performance on the strategy and operations domain, including business knowledge, agentic workflows, customer interaction, and instruction following.

Legal Index

Measures performance on the legal domain, including legal knowledge, document analysis, long-context reasoning, and hallucination avoidance.

Healthcare & Medical Index

Measures performance on the healthcare and medical domain, including clinical knowledge, clinical reasoning, agentic workflows, and hallucination avoidance.

Engineering Index

Measures performance on the engineering domain, including engineering knowledge, quantitative reasoning, agentic execution, and terminal use.

Economics Index

Measures performance on the economics domain, including economics knowledge, quantitative reasoning, agentic execution, and long-context analysis.