Name: Artificial Analysis Intelligence Index by Model Type
Creator: Artificial Analysis
License: https://creativecommons.org/licenses/by/4.0/

AI Progress

Tracking the continued advancement of AI, and the position of each of the leading AI companies.

Frontier Language Model Intelligence, Over Time

Artificial Analysis Intelligence Index v4.0 incorporates 10 evaluations: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt

Labs shown: Alibaba, Anthropic, DeepSeek, Google, Kimi, KwaiKAT, LG AI Research, MBZUAI Institute of Foundation Models, Meta, MiniMax, Mistral, OpenAI, xAI, Xiaomi, Z AI

Artificial Analysis Intelligence Index v4.0 includes: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt. See Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.

Capital Expenditure by Major Tech Companies, Over Time

Capital Expenditure by Quarter (in Billions of USD)

Microsoft

Google

Meta

Amazon

Oracle

Capital expenditure (Capex) represents major investments by tech companies in infrastructure, including AI hardware such as GPUs and data centers. Capex is a strong indicator of a company's commitment to AI development, as training and running frontier models require significant computing resources.

Note: Capex data is sourced primarily from SEC filings, supplemented by publicly available financial reports and news articles.

Intelligence vs. Release Date

Artificial Analysis Intelligence Index; Release Date

Most attractive region

Labs shown: Alibaba, Amazon, Anthropic, DeepSeek, Google, Kimi, KwaiKAT, LG AI Research, MBZUAI Institute of Foundation Models, Meta, MiniMax, Mistral, NVIDIA, OpenAI, xAI, Xiaomi, Z AI

Leading Models by AI Lab

Highest Artificial Analysis Intelligence Index achieved by each AI Lab

Estimate (independent evaluation forthcoming)

Artificial Analysis Intelligence Index by Model Type

Reasoning Model

Non-Reasoning Model

Efficiency

Analysis of how the efficiency of AI is developing, including trends in the cost of achieving a given level of intelligence and the speed at which that intelligence can be accessed.

Language Model Inference Price

Price (USD per M Tokens)

Intelligence Index < 10

10 <= Intelligence Index < 20

20 <= Intelligence Index < 30

30 <= Intelligence Index < 40

40 <= Intelligence Index < 50

Intelligence Index >= 50

Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).
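The blended price described above can be sketched as a weighted average. This is a minimal illustration that assumes the 3:1 ratio means three input tokens for every output token; the function name and the example prices are hypothetical and not taken from the source.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of input and output prices, in USD per million tokens.

    Assumes the 3:1 ratio is input:output (an interpretation, not confirmed
    by the source).
    """
    total_weight = input_weight + output_weight
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Hypothetical prices: 2.00 USD per M input tokens, 8.00 USD per M output tokens.
example = blended_price(2.00, 8.00)
print(f"{example:.2f} USD per M tokens")
```

With these hypothetical prices the blend works out to (3 × 2.00 + 1 × 8.00) / 4 = 3.50 USD per million tokens.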

Figures represent performance of the model's first-party API (e.g. OpenAI for o1) or the median across providers where a first-party API is not available (e.g. Meta's Llama models).
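One way to read the rule above is as a simple selection: use the first-party measurement when one exists, otherwise take the median across providers. The sketch below is illustrative only; the function name, provider names, and numbers are hypothetical.

```python
from statistics import median
from typing import Mapping, Optional


def representative_figure(by_provider: Mapping[str, float],
                          first_party: Optional[str] = None) -> float:
    """Return the first-party measurement if present, else the provider median."""
    if first_party is not None and first_party in by_provider:
        return by_provider[first_party]
    if not by_provider:
        raise ValueError("no provider measurements available")
    return median(by_provider.values())


# Hypothetical measurements (for example, output tokens per second).
measurements = {"provider_a": 40.0, "provider_b": 55.0, "provider_c": 120.0}
print(representative_figure(measurements))                # no first-party API: median
print(representative_figure(measurements, "provider_b"))  # first-party API available
```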

Language Model Output Speed

Output Tokens per Second

Intelligence Index < 10

10 <= Intelligence Index < 20

20 <= Intelligence Index < 30

30 <= Intelligence Index < 40

40 <= Intelligence Index < 50

Intelligence Index >= 50

Tokens per second received while the model is generating tokens (i.e., after the first chunk has been received from the API, for models that support streaming).
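The output speed definition above can be sketched from the arrival times of streamed chunks. This is a minimal sketch under stated assumptions: the clock starts when the first chunk arrives, and the tokens in that first chunk are excluded from the count. The source does not say how the first chunk's tokens are treated, and the function name and sample data are hypothetical.

```python
from typing import Sequence, Tuple


def output_speed(chunks: Sequence[Tuple[float, int]]) -> float:
    """Tokens per second after the first chunk.

    chunks: (arrival_time_seconds, tokens_in_chunk) pairs in arrival order.
    """
    if len(chunks) < 2:
        raise ValueError("need at least two chunks to measure generation speed")
    first_time = chunks[0][0]
    last_time = chunks[-1][0]
    elapsed = last_time - first_time
    if elapsed <= 0:
        raise ValueError("chunk timestamps must increase")
    # Assumption: tokens in the first chunk arrive at time zero of the
    # measurement window, so they are not counted.
    tokens_after_first = sum(count for _, count in chunks[1:])
    return tokens_after_first / elapsed


# Hypothetical stream: first chunk at 0.5 s, last chunk at 1.0 s.
stream = [(0.5, 1), (0.625, 5), (0.75, 5), (1.0, 10)]
print(output_speed(stream))
```

Here 20 tokens arrive in the 0.5 seconds after the first chunk, giving 40 tokens per second. Time to first token is deliberately left out of this figure.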

Country Analysis

AI is a global phenomenon. We share perspectives on where AI progress is happening and how the leading models from each country compare. We take a closer look at the US and China as the two leading hubs for AI development.

Frontier Language Model Intelligence By Country, Over Time

China

United States

Open Weights: Frontier Language Model Intelligence By Country, Over Time

China

United States

Leading Models by Country

Artificial Analysis Intelligence Index, Leading Models

United States

China

South Korea

United Arab Emirates

France

Open Source Models

Open weights models offer flexibility in deployment and the ability to fine-tune the models for specific use cases. We analyze the leading open weights models and how their intelligence compares with that of proprietary models.

Progress in Open Weights vs. Proprietary Intelligence

Open Weights

Proprietary

Indicates whether the model weights are available. Models are labelled as 'Commercial Use Restricted' if the weights are available but commercial use is limited (typically requires obtaining a paid license).

Artificial Analysis Intelligence Index by Open Weights / Proprietary

Proprietary

Open Weights

Model Architecture

The architecture of AI models influences their performance and efficiency. This section examines architectural trends in models, such as the rise in popularity of the Mixture of Experts (MoE) architecture, and how these relate to model capabilities.

Intelligence Index vs Release Date by Model Architecture

Most attractive region

Dense

MoE

MoE: A model in which only a subset of parameters ("experts") is active per input. Routing mechanisms select a few experts per forward pass, reducing computation while allowing the model to scale to many more parameters overall.

Dense: A model in which all parameters are active for every input. Every forward pass involves the full network, making it computationally intensive but straightforward to train and deploy.
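The routing idea in the MoE description can be illustrated with a toy top-k router. This is a deliberately simplified sketch, not how any particular model implements it: the experts are plain scalar functions, the router scores are supplied by hand rather than learned, and the choices of k = 2 and a softmax over only the selected experts are assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence

Expert = Callable[[float], float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: float, router_logits: Sequence[float],
                experts: Sequence[Expert], k: int = 2) -> float:
    """Run only the k highest-scoring experts and mix their outputs."""
    if len(router_logits) != len(experts):
        raise ValueError("need one router logit per expert")
    if not 1 <= k <= len(experts):
        raise ValueError("k must be between 1 and the number of experts")
    ranked = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    # Experts outside `chosen` are never called: this is the compute saving.
    return sum(w * experts[i](x) for w, i in zip(weights, chosen))


calls: List[int] = []


def make_expert(index: int, scale: float) -> Expert:
    def expert(x: float) -> float:
        calls.append(index)  # record which experts actually ran
        return scale * x
    return expert


toy_experts = [make_expert(i, s) for i, s in enumerate([1.0, 2.0, 3.0, 4.0])]
output = moe_forward(2.0, [2.0, 2.0, -5.0, -5.0], toy_experts, k=2)
print(output, sorted(calls))
```

In this toy run only experts 0 and 1 execute, even though four experts exist. A dense model, by contrast, would run every parameter for every input.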

Model Size: Total and Active Parameters

Comparison between total model parameters and parameters active during inference

Active Parameters

Passive Parameters

Total Parameters: The total number of trainable weights and biases in the model, expressed in billions. These parameters are learned during training and determine the model's ability to process and generate responses.

Active Parameters: The number of parameters actually executed during each inference forward pass, expressed in billions. For Mixture of Experts (MoE) models, a routing mechanism selects a subset of experts per token, resulting in fewer active than total parameters. Dense models use all parameters, so active equals total.
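The relationship between total and active parameters can be worked through with simple arithmetic. The sketch below assumes a simplified MoE layout with some always-on shared parameters plus equally sized experts; the function name and all figures are hypothetical and do not describe any specific model.

```python
from typing import Tuple


def moe_parameter_counts(shared_b: float, expert_b: float,
                         num_experts: int, experts_per_token: int) -> Tuple[float, float]:
    """Return (total, active) parameters in billions for a simplified MoE model."""
    if not 1 <= experts_per_token <= num_experts:
        raise ValueError("experts_per_token must be between 1 and num_experts")
    total = shared_b + expert_b * num_experts
    active = shared_b + expert_b * experts_per_token
    return total, active


# Hypothetical MoE: 10B shared parameters, 64 experts of 5B each, 4 routed per token.
total, active = moe_parameter_counts(10.0, 5.0, 64, 4)
passive = total - active
print(total, active, passive)

# A dense model is the degenerate case where every "expert" is always used.
dense_total, dense_active = moe_parameter_counts(10.0, 60.0, 1, 1)
print(dense_total == dense_active)
```

For the hypothetical MoE configuration, total is 330B and active is 30B, leaving 300B passive parameters on any given forward pass.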

Intelligence vs. Active Parameters

Active Parameters at Inference Time; Artificial Analysis Intelligence Index

Most attractive quadrant

Labs shown: Alibaba, DeepSeek, Kimi, LG AI Research, MBZUAI Institute of Foundation Models, Meta, MiniMax, Mistral, NVIDIA, OpenAI, Xiaomi, Z AI

Intelligence vs. Total Parameters

Artificial Analysis Intelligence Index; Size in Parameters (Billions)

Most attractive quadrant

Labs shown: Alibaba, DeepSeek, Kimi, LG AI Research, MBZUAI Institute of Foundation Models, Meta, MiniMax, Mistral, NVIDIA, OpenAI, Xiaomi, Z AI

Intelligence vs. Total Parameters

Artificial Analysis Intelligence Index; Size in Parameters (Billions)

Most attractive quadrant

Labs shown: Alibaba, DeepSeek, Google, Kimi, LG AI Research, MBZUAI Institute of Foundation Models, Meta, MiniMax, Mistral, OpenAI, xAI, Xiaomi, Z AI

Context Length (Tokens), Median By Quarter

Median Context Length (thousand tokens)

Open Source

Proprietary

Larger context windows are relevant to RAG (Retrieval Augmented Generation) LLM workflows, which typically involve retrieving and reasoning over large amounts of data.

Context window: Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (varies by model).
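The two limits described above (a combined input and output budget, plus a separate and usually lower output cap) can be checked together before sending a request. This is a minimal sketch; the function name and the example limits are hypothetical and do not correspond to any specific model.

```python
from typing import Optional


def fits_context(input_tokens: int, requested_output_tokens: int,
                 context_window: int, max_output_tokens: Optional[int] = None) -> bool:
    """Check a request against the combined window and the optional output cap."""
    if input_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if max_output_tokens is not None and requested_output_tokens > max_output_tokens:
        return False
    return input_tokens + requested_output_tokens <= context_window


# Hypothetical model: 128,000-token context window, 16,000-token output cap.
print(fits_context(100_000, 8_000, 128_000, 16_000))   # within both limits
print(fits_context(100_000, 20_000, 128_000, 16_000))  # exceeds the output cap
print(fits_context(125_000, 8_000, 128_000, 16_000))   # exceeds the combined window
```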

Training Analysis

Our analysis of trends in the training of AI models covers both the size of training runs and the relationship between training run size and model intelligence.

Training Tokens By Model

Training tokens in trillions

The number of tokens used to train the model, represented in trillions.

Intelligence vs. Training Tokens

Artificial Analysis Intelligence Index; Training Tokens in Trillions

Most attractive quadrant

Alibaba

Google

Meta
