LLM Leaderboard - Comparison of GPT-4o, Llama 3, Mistral, Gemini and over 30 models

Comparison and ranking of the performance of over 30 AI models (LLMs) across key metrics including quality, price, output speed (tokens per second), latency (time to first token, TTFT), context window, and others. For more details, including our methodology, see our FAQs.

HIGHLIGHTS

Quality: GPT-4o and Claude 3.5 Sonnet are the highest quality models, followed by GPT-4 Turbo & Claude 3 Opus.
Output Speed (tokens/s): Gemma 7B (207 t/s) and Gemini 1.5 Flash (139 t/s) are the fastest models, followed by Command-R & Llama 3 (8B).
Latency (seconds): Mixtral 8x22B (0.28s) and Mistral 7B (0.28s) are the lowest latency models, followed by Mixtral 8x7B & Gemma 7B.
Price ($ per M tokens): OpenChat 3.5 ($0.14) and Gemma 7B ($0.15) are the cheapest models, followed by DeepSeek-V2 & Llama 3 (8B).
Context Window: Gemini 1.5 Flash (1m) and Gemini 1.5 Pro (1m) are the largest context window models, followed by Jamba Instruct & Claude 3.5 Sonnet.
Model | Creator | Context Window | Quality Index | Price (USD per M tokens, blended 3:1) | Output Speed (tokens/s) | Latency (s)
GPT-4o | OpenAI | 128k | 100 | $7.50 | 72.1 | 0.49
GPT-4 Turbo | OpenAI | 128k | 94 | $15.00 | 27.4 | 0.62
GPT-4 | OpenAI | 8k | 93 | $37.50 | 20.6 | 0.65
GPT-3.5 Turbo | OpenAI | 16k | 65 | $0.75 | 62.8 | 0.36
GPT-3.5 Turbo Instruct | OpenAI | 4k | 60 | $1.63 | 107.7 | 0.53
Gemini 1.5 Flash | Google | 1m | 83 | $0.53 | 139.5 | 1.34
Gemini 1.5 Pro | Google | 1m | 93 | $5.25 | 63.4 | 1.49
Gemini 1.0 Pro | Google | 33k | 62 | $0.75 | 85.9 | 2.43
Gemma 7B | Google | 8k | 57 | $0.15 | 207.3 | 0.30
Llama 3 (70B) | Meta | 8k | 88 | $0.90 | 53.2 | 0.44
Llama 3 (8B) | Meta | 8k | 65 | $0.20 | 121.1 | 0.31
Code Llama (70B) | Meta | 16k | 58 | $0.90 | 30.8 | 0.49
Llama 2 Chat (70B) | Meta | 4k | 50 | $1.00 | 52.0 | 0.51
Jamba Instruct | AI21 Labs | 256k | 63 | $0.55 | 66.5 | 0.44
Llama 2 Chat (13B) | Meta | 4k | 36 | $0.25 | 53.9 | 0.37
Llama 2 Chat (7B) | Meta | 4k | 27 | $0.20 | 89.7 | 0.50
Mixtral 8x22B | Mistral | 65k | 78 | $1.20 | 67.9 | 0.28
Mistral Large | Mistral | 33k | 75 | $6.00 | 33.4 | 0.46
Mistral Medium | Mistral | 33k | 73 | $4.05 | 37.4 | 0.34
Mistral Small | Mistral | 33k | 71 | $1.50 | 38.2 | 0.72
Mixtral 8x7B | Mistral | 33k | 65 | $0.50 | 89.6 | 0.30
Mistral 7B | Mistral | 33k | 39 | $0.20 | 80.4 | 0.28
Claude 3.5 Sonnet | Anthropic | 200k | 100 | $6.00 | 79.4 | 0.86
Claude 3 Opus | Anthropic | 200k | 94 | $30.00 | 23.4 | 1.92
Claude 3 Sonnet | Anthropic | 200k | 78 | $6.00 | 54.5 | 0.96
Claude 3 Haiku | Anthropic | 200k | 72 | $0.50 | 117.0 | 0.54
Qwen2 (72B) | Alibaba | 128k | n/a | $0.90 | 44.4 | 0.51
Claude 2.0 | Anthropic | 100k | 69 | $12.00 | 38.4 | 1.28
Claude 2.1 | Anthropic | 200k | 63 | $12.00 | 36.0 | 1.52
Claude Instant | Anthropic | 100k | 63 | $1.20 | 88.0 | 0.57
Command Light | Cohere | 4k | n/a | $0.38 | 37.4 | 0.43
Command | Cohere | 4k | n/a | $1.44 | 23.8 | 0.57
Command-R+ | Cohere | 128k | 74 | $6.00 | 59.8 | 0.44
Command-R | Cohere | 128k | 62 | $0.75 | 127.5 | 0.34
OpenChat 3.5 | OpenChat | 8k | 54 | $0.14 | 69.7 | 0.33
DBRX | Databricks | 33k | 74 | $1.20 | 69.3 | 0.49
DeepSeek-V2 | DeepSeek | 128k | 82 | $0.17 | 16.9 | 1.63
Arctic | Snowflake | 4k | 63 | $2.40 | 72.6 | 0.41

Key definitions

Quality: Index representing normalized average relative performance across Chatbot Arena, MMLU & MT-Bench; see the sketch after this list for one way such an index can be computed.
Context window: Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (varies by model).
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Latency: Time to first token (TTFT) received, in seconds, after the API request is sent. A measurement sketch for output speed and latency follows this list.
Price: Price per token, represented as USD per million tokens. Price is a blend of input & output token prices (3:1 input-to-output ratio); see the worked example after this list.
Output price: Price per token generated by the model (received from the API), represented as USD per million tokens.
Input price: Price per token included in the request/message sent to the API, represented as USD per million tokens.
Time period: Metrics are 'live' and based on the past 14 days of measurements. Measurements are taken 8 times a day for single requests and 2 times per day for parallel requests.
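The exact normalization behind the Quality index is not specified on this page. The following is a minimal sketch assuming a simple min-max normalization of each benchmark to a 0-100 scale followed by an unweighted average; the benchmark names match the definition above, but the scores and ranges are placeholder values for illustration only.

```python
def min_max_normalize(score, lo, hi):
    """Scale a raw benchmark score to a 0-100 range."""
    return 100 * (score - lo) / (hi - lo)

def quality_index(benchmarks):
    """Average the normalized scores across benchmarks.

    `benchmarks` maps benchmark name -> (raw_score, observed_min, observed_max),
    where observed_min/max are taken across all models being compared.
    """
    normalized = [min_max_normalize(s, lo, hi) for s, lo, hi in benchmarks.values()]
    return sum(normalized) / len(normalized)

# Placeholder values, not real benchmark results.
example = {
    "Chatbot Arena (Elo)": (1250, 1000, 1260),
    "MMLU (%)":            (88.0, 40.0, 90.0),
    "MT-Bench (0-10)":     (9.2, 5.0, 9.3),
}
print(round(quality_index(example), 1))
```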
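To make the 3:1 price blend concrete, here is a small sketch of the blended-price arithmetic. The input and output prices below are illustrative assumptions: blending $5.00 per M input tokens with $15.00 per M output tokens at a 3:1 ratio yields $7.50, the blended figure shown for GPT-4o in the table above.

```python
def blended_price(input_price_per_m, output_price_per_m, ratio=(3, 1)):
    """Blend input and output prices (USD per million tokens) at the given ratio."""
    in_weight, out_weight = ratio
    total = in_weight * input_price_per_m + out_weight * output_price_per_m
    return total / (in_weight + out_weight)

# Illustrative prices (USD per M tokens): a 3:1 input-to-output blend of
# $5.00 and $15.00 gives $7.50.
print(blended_price(5.00, 15.00))  # 7.5
```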
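The benchmarking harness used for this leaderboard is not shown here, but the sketch below illustrates how latency (TTFT) and output speed could be measured against a streaming chat-completions API. It assumes the `openai` Python client with an `OPENAI_API_KEY` set in the environment, and it approximates the token count by the number of streamed content chunks, which is only a rough proxy for the true token count.

```python
import time
from openai import OpenAI  # assumes the `openai` Python package is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def measure(model="gpt-4o", prompt="Write a short paragraph about benchmarking."):
    """Measure time-to-first-token (s) and output speed (tokens/s) for one request."""
    start = time.perf_counter()
    first_token_at = None
    chunks = 0

    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if first_token_at is None:
                first_token_at = time.perf_counter()
            chunks += 1  # rough proxy for tokens: one content chunk is usually ~one token
    end = time.perf_counter()

    ttft = first_token_at - start
    # Output speed counts only the generation phase, i.e. after the first chunk arrives.
    output_speed = (chunks - 1) / (end - first_token_at) if chunks > 1 else 0.0
    return ttft, output_speed

if __name__ == "__main__":
    ttft, speed = measure()
    print(f"TTFT: {ttft:.2f}s, output speed: {speed:.1f} tokens/s")
```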

Models compared:
OpenAI: GPT-3.5 Turbo, GPT-3.5 Turbo (0125), GPT-3.5 Turbo (1106), GPT-3.5 Turbo Instruct, GPT-4, GPT-4 Turbo, GPT-4 Turbo (0125), GPT-4 Vision, and GPT-4o
Google: Gemini 1.0 Pro, Gemini 1.5 Flash, Gemini 1.5 Pro, and Gemma 7B
Meta: Code Llama (70B), Llama 2 Chat (13B), Llama 2 Chat (70B), Llama 2 Chat (7B), Llama 3 (70B), and Llama 3 (8B)
Mistral: Mistral 7B, Mistral Large, Mistral Medium, Mistral Small, Mixtral 8x22B, and Mixtral 8x7B
Anthropic: Claude 2.0, Claude 2.1, Claude 3 Haiku, Claude 3 Opus, Claude 3 Sonnet, Claude 3.5 Sonnet, and Claude Instant
Cohere: Command, Command Light, Command-R, and Command-R+
Perplexity: PPLX-70B Online and PPLX-7B-Online
xAI: Grok-1
OpenChat: OpenChat 3.5
Microsoft Azure: Phi-3 Medium and Phi-3 Mini
Databricks: DBRX
Reka AI: Reka Core, Reka Edge, and Reka Flash
AI21 Labs: Jamba Instruct
DeepSeek: DeepSeek-V2
Snowflake: Arctic
Alibaba: Qwen2 (72B)