LLM Leaderboard - Comparison of GPT-4o, Llama 3, Mistral, Gemini and over 30 models

Comparison and ranking of the performance of over 30 AI models (LLMs) across key metrics including quality, price, performance (output speed in tokens per second and latency as time to first token, TTFT), context window and others. For more details, including our methodology, see our FAQs.

A separate comparison covers the API providers hosting these models.

HIGHLIGHTS

Quality: o1-preview and o1-mini are the highest quality models, followed by GPT-4o (Aug 6) & Claude 3.5 Sonnet.
Output Speed (tokens/s): Gemma 7B (1,021 t/s) and Gemini 1.5 Flash (220 t/s) are the fastest models, followed by Llama 3.1 8B & GPT-4o mini.
Latency (seconds): Reka Edge (0.00s) and Sonar 3.1 Small (0.18s) are the lowest latency models, followed by Sonar Small & Sonar 3.1 Large.
Price ($ per M tokens): OpenChat 3.5 ($0.06) and Gemma 7B ($0.07) are the cheapest models, followed by Gemini 1.5 Flash & Llama 3.1 8B.
Context Window: Gemini 1.5 Pro (2m) and Gemini 1.5 Flash (1m) are the largest context window models, followed by Codestral-Mamba & Jamba 1.5 Large.
| Model | Creator | Context Window | Quality Index | Price ($ per M tokens) | Output Speed (tokens/s) | Latency (s) |
|---|---|---|---|---|---|---|
| o1-preview | OpenAI | 128k | 85 | $26.25 | 19.0 | 52.67 |
| o1-mini | OpenAI | 128k | 82 | $5.25 | 82.3 | 12.44 |
| GPT-4o (Aug 6) | OpenAI | 128k | 77 | $4.38 | 104.4 | 0.38 |
| GPT-4o | OpenAI | 128k | 77 | $7.50 | 118.2 | 0.35 |
| GPT-4o mini | OpenAI | 128k | 71 | $0.26 | 152.3 | 0.38 |
| Llama 3.1 405B | Meta | 128k | 72 | $5.00 | 21.3 | 1.00 |
| Llama 3.1 70B | Meta | 128k | 65 | $0.88 | 49.3 | 0.62 |
| Llama 3.1 8B | Meta | 128k | 53 | $0.14 | 156.5 | 0.38 |
| Gemini 1.5 Pro | Google | 2m | 72 | $5.25 | 64.4 | 0.85 |
| Gemini 1.5 Flash | Google | 1m | 60 | $0.13 | 220.2 | 0.40 |
| Gemma 2 27B | Google | 8k | 49 | $0.80 | 48.1 | 0.46 |
| Gemma 2 9B | Google | 8k | 47 | $0.20 | 108.0 | 0.52 |
| Claude 3.5 Sonnet | Anthropic | 200k | 77 | $6.00 | 85.1 | 1.01 |
| Claude 3 Opus | Anthropic | 200k | 70 | $30.00 | 25.6 | 1.91 |
| Claude 3 Haiku | Anthropic | 200k | 54 | $0.50 | 137.8 | 0.57 |
| Mistral Large 2 | Mistral | 128k | 73 | $4.50 | 39.2 | 0.51 |
| Mixtral 8x22B | Mistral | 65k | 61 | $1.20 | 53.2 | 0.47 |
| Mistral NeMo | Mistral | 128k | 52 | $0.20 | 133.0 | 0.33 |
| Mistral Small | Mistral | 33k | 50 | $1.50 | 49.1 | 0.35 |
| Mixtral 8x7B | Mistral | 33k | 42 | $0.50 | 68.9 | 0.42 |
| Codestral-Mamba | Mistral | 256k | 36 | $0.25 | 92.3 | 0.60 |
| Command-R+ (08-2024) | Cohere | 128k | 56 | $5.19 | 46.9 | 0.54 |
| Command-R (08-2024) | Cohere | 128k | 51 | $0.51 | 106.9 | 0.35 |
| Command-R+ (04-2024) | Cohere | 128k | 46 | $6.00 | 47.4 | 0.58 |
| Command-R (03-2024) | Cohere | 128k | 36 | $0.75 | 102.9 | 0.39 |
| Sonar Large | Perplexity | 33k | 62 | $1.00 | 35.6 | 0.55 |
| Sonar Small | Perplexity | 33k | 41 | $0.20 | 141.8 | 0.21 |
| Sonar 3.1 Small | Perplexity | 131k | – | $0.20 | 149.7 | 0.18 |
| Sonar 3.1 Large | Perplexity | 131k | – | $1.00 | 63.9 | 0.22 |
| Phi-3 Medium 14B | Microsoft Azure | 128k | – | $0.45 | 52.3 | 0.46 |
| DBRX | Databricks | 33k | 50 | $1.16 | 46.2 | 0.74 |
| Reka Core | Reka AI | 128k | 57 | $6.00 | 13.2 | 1.32 |
| Reka Flash | Reka AI | 128k | 46 | $1.10 | 25.3 | 1.11 |
| Reka Edge | Reka AI | 64k | 30 | $0.55 | – | – |
| Jamba 1.5 Large | AI21 Labs | 256k | 64 | $3.50 | 41.4 | 1.47 |
| Jamba 1.5 Mini | AI21 Labs | 256k | 46 | $0.25 | 100.3 | 0.91 |
| DeepSeek-Coder-V2 | DeepSeek | 128k | 67 | $0.17 | 17.5 | 1.21 |
| DeepSeek-V2.5 | DeepSeek | 128k | 66 | $0.17 | 17.2 | 1.08 |
| DeepSeek-V2 | DeepSeek | 128k | 66 | $0.17 | 17.2 | 1.20 |
| Qwen2 72B | Alibaba | 128k | 69 | $0.63 | 44.1 | 0.54 |
| Yi-Large | 01.AI | 32k | 58 | $3.00 | 58.0 | 0.95 |
| GPT-4 Turbo | OpenAI | 128k | 74 | $15.00 | 36.5 | 0.65 |
| GPT-3.5 Turbo | OpenAI | 16k | 53 | $0.75 | 86.5 | 0.37 |
| GPT-3.5 Turbo Instruct | OpenAI | 4k | – | $1.63 | 104.1 | 0.48 |
| GPT-4 | OpenAI | 8k | – | $37.50 | 27.7 | 0.67 |
| Llama 3 70B | Meta | 8k | 62 | $0.90 | 53.4 | 0.53 |
| Llama 3 8B | Meta | 8k | 46 | $0.15 | 97.6 | 0.38 |
| Llama 2 Chat 70B | Meta | 4k | 34 | $1.39 | 35.1 | 0.41 |
| Llama 2 Chat 13B | Meta | 4k | 25 | $0.30 | 50.6 | 0.57 |
| Llama 2 Chat 7B | Meta | 4k | 10 | $0.33 | 69.0 | 0.23 |
| Gemma 7B | Google | 8k | 28 | $0.07 | 1,021.0 | 3.78 |
| Gemini 1.0 Pro | Google | 33k | – | $0.75 | 98.1 | 1.13 |
| Claude 3 Sonnet | Anthropic | 200k | 57 | $6.00 | 57.6 | 1.02 |
| Claude 2.1 | Anthropic | 200k | – | $12.00 | 29.4 | 1.67 |
| Claude 2.0 | Anthropic | 100k | – | $12.00 | 30.4 | 1.24 |
| Claude Instant | Anthropic | 100k | – | $1.20 | 66.0 | 0.71 |
| Mistral Large | Mistral | 33k | 56 | $6.00 | 26.3 | 0.41 |
| Mistral 7B | Mistral | 33k | 24 | $0.16 | 99.7 | 0.33 |
| Codestral | Mistral | 33k | – | $1.50 | 50.4 | 0.45 |
| Mistral Medium | Mistral | 33k | – | $4.09 | 38.1 | 0.78 |
| Command | Cohere | 4k | 26 | $1.44 | 20.1 | 1.31 |
| Command Light | Cohere | 4k | 14 | $0.38 | 28.1 | 1.80 |
| OpenChat 3.5 | OpenChat | 8k | 43 | $0.06 | 59.2 | 0.60 |
| Jamba Instruct | AI21 Labs | 256k | 28 | $0.55 | 51.4 | 0.69 |

Key definitions

Artificial Analysis Quality Index: Average result across our evaluations covering different dimensions of model intelligence. Currently includes MMLU, GPQA, Math & HumanEval. OpenAI o1 model figures are preliminary and based on figures stated by OpenAI. See methodology for more details.
Context window: Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (varies by model).
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API, for models which support streaming).
Latency: Time to first token received, in seconds, after the API request is sent. For models which do not support streaming, this represents the time to receive the completion.
Price: Price per token, represented as USD per million tokens. Price is a blend of input & output token prices at a 3:1 input:output ratio; worked sketches follow these definitions.
Output price: Price per token generated by the model (received from the API), represented as USD per million tokens.
Input price: Price per token included in the request/message sent to the API, represented as USD per million tokens.
Time period: Metrics are 'live' and based on the past 14 days of measurements; measurements are taken 8 times a day for single requests and 2 times per day for parallel requests.
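To make the Price definition concrete, here is a minimal sketch of the blend arithmetic in Python. The 3:1 input:output weighting comes from the definition above; the GPT-4o per-million-token prices used in the example ($5.00 input / $15.00 output) are an assumption, included only because they reproduce the $7.50 blended figure shown in the table.

```python
def blended_price(input_usd_per_m: float, output_usd_per_m: float) -> float:
    """Blend input & output prices (USD per million tokens) at the
    3:1 input:output ratio described in the Price definition above."""
    return (3 * input_usd_per_m + output_usd_per_m) / 4

# Assumed GPT-4o list prices: $5.00 input / $15.00 output per million tokens.
print(blended_price(5.00, 15.00))  # 7.50 — matches the GPT-4o row above
```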

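Similarly, the Output Speed and Latency definitions can be read as a small measurement loop over any streaming chat API. The sketch below is illustrative only, not the harness behind this leaderboard: `chunks` is a hypothetical iterator of streamed text chunks, and whitespace-split words stand in for real token counts.

```python
import time
from typing import Iterable

def measure_stream(chunks: Iterable[str]) -> dict:
    """Measure latency (time to first chunk) and output speed (tokens per
    second, counted after the first chunk) from a streaming completion."""
    start = time.monotonic()
    first_chunk_at = None
    tokens = 0
    for chunk in chunks:
        if first_chunk_at is None:
            first_chunk_at = time.monotonic()  # latency (TTFT) ends here
        tokens += len(chunk.split())  # crude proxy for a real tokenizer
    generation_s = max(time.monotonic() - first_chunk_at, 1e-9)
    return {
        "latency_s": first_chunk_at - start,
        "output_tokens_per_s": tokens / generation_s,
    }
```

Dividing by the time since the first chunk, rather than total request time, matches the Output Speed definition above: generation time excludes the initial latency.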
Models compared: OpenAI: GPT-3.5 Turbo, GPT-3.5 Turbo (0125), GPT-3.5 Turbo (1106), GPT-3.5 Turbo Instruct, GPT-4, GPT-4 Turbo, GPT-4 Turbo (0125), GPT-4 Vision, GPT-4o, GPT-4o (Aug 6), GPT-4o mini, o1-mini, and o1-preview; Meta: Code Llama 70B, Llama 2 Chat 13B, Llama 2 Chat 70B, Llama 2 Chat 7B, Llama 3 70B, Llama 3 8B, Llama 3.1 405B, Llama 3.1 70B, and Llama 3.1 8B; Google: Gemini 1.0 Pro, Gemini 1.5 Flash, Gemini 1.5 Pro, Gemma 2 27B, Gemma 2 9B, and Gemma 7B; Anthropic: Claude 2.0, Claude 2.1, Claude 3 Haiku, Claude 3 Opus, Claude 3 Sonnet, Claude 3.5 Sonnet, and Claude Instant; Mistral: Codestral, Codestral-Mamba, Mistral 7B, Mistral Large, Mistral Large 2, Mistral Medium, Mistral NeMo, Mistral Small, Mixtral 8x22B, Mixtral 8x7B, and Pixtral 12B; Cohere: Command, Command Light, Command-R (03-2024), Command-R (08-2024), Command-R+ (04-2024), and Command-R+ (08-2024); Perplexity: PPLX-70B Online, PPLX-7B-Online, Sonar 3.1 Large, Sonar 3.1 Small, Sonar Large, and Sonar Small; xAI: Grok-1; OpenChat: OpenChat 3.5; Microsoft Azure: Phi-3 Medium 14B and Phi-3 Mini; Databricks: DBRX; Reka AI: Reka Core, Reka Edge, and Reka Flash; Other: LLaVA-v1.5-7B; Glaive: Reflection Llama 3.1 - 70B and Reflection Llama 3.1 70B v2; AI21 Labs: Jamba 1.5 Large, Jamba 1.5 Mini, and Jamba Instruct; DeepSeek: DeepSeek-Coder-V2, DeepSeek-V2, and DeepSeek-V2.5; Snowflake: Arctic; Alibaba: Qwen2 72B; and 01.AI: Yi-Large.