Models Leaderboard

Comparison and ranking of the performance of AI LLMs across key metrics, including quality, price, speed (throughput and latency), context window, and others. For more details, including our methodology, see our FAQs.

HIGHLIGHTS

Highest Quality:

#1 Claude 3 Opus
#2 GPT-4 Vision
#3 GPT-4 Turbo

Highest Throughput (fastest):

#1 Llama 3 (8B)
#2 Gemma 7B
#3 GPT-3.5 Turbo Instruct

Lowest Latency:

#1 Command-R
#2 Command-R+
#3 Mistral Medium

Largest Context Window:

#1 Gemini 1.5 Pro
#2 (tie) Claude 3 Opus, Claude 3 Sonnet, Claude 3 Haiku, Claude 2.1
#3 (tie) GPT-4 Turbo, GPT-4 Vision, Command-R+, Command-R
| Model | Creator | Context | Quality | Price (USD/1M tokens) | Throughput (tokens/s) | Latency (s) |
|---|---|---|---|---|---|---|
| GPT-4 | OpenAI | 8k | 90 | $37.50 | 19.2 | 0.56 |
| GPT-4 Turbo | OpenAI | 128k | 100 | $15.00 | 17.4 | 0.64 |
| GPT-4 Vision | OpenAI | 128k | 100 | $15.00 | 33.0 | 0.57 |
| GPT-3.5 Turbo | OpenAI | 16k | 67 | $0.75 | 55.5 | 0.33 |
| GPT-3.5 Turbo Instruct | OpenAI | 4k | 60 | $1.63 | 111.6 | 0.51 |
| Llama 3 (70B) | Meta | 8k | 88 | $0.90 | 45.8 | 0.32 |
| Llama 2 Chat (13B) | Meta | 4k | 37 | $0.25 | 50.6 | 0.34 |
| Llama 2 Chat (70B) | Meta | 4k | 56 | $1.00 | 44.6 | 0.44 |
| Llama 3 (8B) | Meta | 8k | 58 | $0.14 | 224.5 | 0.28 |
| Llama 2 Chat (7B) | Meta | 4k | 27 | $0.20 | 89.9 | 0.61 |
| Code Llama (70B) | Meta | 16k | 58 | $0.90 | 31.6 | 0.30 |
| Mistral Large | Mistral | 33k | 84 | $12.00 | 26.9 | 0.31 |
| Mistral Medium | Mistral | 33k | 76 | $4.05 | 21.6 | 0.20 |
| Mixtral 8x22B | Mistral | 65k | 83 | $1.20 | 59.8 | 0.26 |
| Mixtral 8x7B | Mistral | 33k | 68 | $0.50 | 102.6 | 0.28 |
| Mistral Small | Mistral | 33k | 73 | $3.00 | 55.6 | 0.21 |
| Mistral 7B | Mistral | 33k | 40 | $0.20 | 81.9 | 0.25 |
| Gemini 1.5 Pro | Google | 1000k | 88 | $10.50 | 43.2 | 1.27 |
| Gemini 1.0 Pro | Google | 33k | 66 | $0.75 | 77.8 | 1.45 |
| Gemma 7B | Google | 8k | 59 | $0.15 | 164.9 | 0.28 |
| Claude 3 Opus | Anthropic | 200k | 100 | $30.00 | 26.4 | 1.09 |
| Claude 3 Sonnet | Anthropic | 200k | 85 | $6.00 | 62.7 | 0.60 |
| Claude 3 Haiku | Anthropic | 200k | 78 | $0.50 | 94.0 | 0.39 |
| Claude 2.1 | Anthropic | 200k | 66 | $12.00 | 42.5 | 0.48 |
| Claude 2.0 | Anthropic | 100k | 72 | $12.00 | 39.6 | 0.46 |
| Claude Instant | Anthropic | 100k | 65 | $1.20 | 86.7 | 0.41 |
| Command-R+ | Cohere | 128k | 80 | $6.00 | 40.3 | 0.16 |
| Command-R | Cohere | 128k | 67 | $0.75 | 111.4 | 0.16 |
| Command | Cohere | 4k | – | $1.44 | 28.4 | 0.34 |
| Command Light | Cohere | 4k | – | $0.38 | 52.9 | 0.25 |
| DBRX | Databricks | 33k | 76 | $1.40 | 79.7 | 0.48 |
| OpenChat 3.5 | OpenChat | 8k | 56 | $0.17 | 70.7 | 0.67 |
| PPLX-70B Online | Perplexity | 4k | 45 | $1.00 | 38.3 | 1.17 |
| PPLX-7B-Online | Perplexity | 4k | 35 | $0.20 | 95.1 | 0.96 |

Key definitions

Quality: Index representing normalized average relative performance across Chatbot Arena, MMLU & MT-Bench.
Context window: Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (varies by model).
Throughput: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Latency: Time to first token received, in seconds, after the API request is sent.
Price: Price per token, represented as USD per million tokens. Price is a blend of input & output token prices (3:1 input:output ratio).
Output price: Price per token generated by the model (received from the API), represented as USD per million tokens.
Input price: Price per token included in the request/message sent to the API, represented as USD per million tokens.
Time period: Metrics are 'live' and based on the past 14 days of measurements. Measurements are taken 8 times per day for single requests and 2 times per day for parallel requests.
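The blended price and speed definitions above can be sketched in a few lines. This is a minimal illustration, assuming the 3:1 blend is a weighted average of input and output prices; the function names and timing inputs are hypothetical, not part of the leaderboard's published methodology.

```python
# Illustrative sketch of the metric definitions above. The interpretation of
# the 3:1 blend as a weighted average is an assumption, not a quoted formula.

def blended_price(input_price: float, output_price: float) -> float:
    """Blend input/output prices (USD per 1M tokens) at a 3:1 input:output ratio."""
    return (3 * input_price + output_price) / 4

def speed_metrics(request_sent: float, first_token_at: float,
                  last_token_at: float, tokens_received: int):
    """Latency is time to first token; throughput counts tokens per second
    received after the first chunk arrives."""
    latency = first_token_at - request_sent
    throughput = (tokens_received - 1) / (last_token_at - first_token_at)
    return latency, throughput

# Example: a model priced at $10 per 1M input tokens and $30 per 1M output
# tokens blends to $15 per 1M tokens under this interpretation.
print(blended_price(10.0, 30.0))  # 15.0
```

For instance, $10/1M input and $30/1M output blend to (3 × 10 + 30) / 4 = $15.00 per million tokens, matching the blended figures shown in the table above.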

Models compared:
- OpenAI: GPT-3.5 Turbo, GPT-3.5 Turbo (0125), GPT-3.5 Turbo (1106), GPT-3.5 Turbo Instruct, GPT-4, GPT-4 Turbo, GPT-4 Turbo (0125), and GPT-4 Vision
- Google: Gemini 1.0 Pro, Gemini 1.5 Pro, and Gemma 7B
- Meta: Code Llama (70B), Llama 2 Chat (13B), Llama 2 Chat (70B), Llama 2 Chat (7B), Llama 3 (70B), and Llama 3 (8B)
- Mistral: Mistral 7B, Mistral Large, Mistral Medium, Mistral Small, Mixtral 8x22B, and Mixtral 8x7B
- Anthropic: Claude 2.0, Claude 2.1, Claude 3 Haiku, Claude 3 Opus, Claude 3 Sonnet, and Claude Instant
- Cohere: Command, Command Light, Command-R, and Command-R+
- Perplexity: PPLX-70B Online and PPLX-7B-Online
- xAI: Grok-1
- OpenChat: OpenChat 3.5
- Microsoft Azure: Phi-3-mini
- Databricks: DBRX