LLM Leaderboard - Comparison of GPT-4o, Llama 3, Mistral, Gemini and over 30 models

Comparison and ranking of the performance of over 30 AI models (LLMs) across key metrics including quality, price, performance and speed (output speed as tokens per second & latency as TTFT, time to first token), context window, and others. For more details, including on our methodology, see our FAQs.

For a comparison of the API Providers hosting the models, see the API Providers pages.

HIGHLIGHTS

Quality: o1 and o3-mini are the highest quality models, followed by DeepSeek R1 & o1-preview.
Output Speed (tokens/s): DeepSeek R1 Distill Llama 70B (1,257 t/s) and Gemini 1.5 Flash (May) (303 t/s) are the fastest models, followed by Gemini 1.5 Flash-8B & Llama 3.2 1B.
Latency (seconds): Solar Mini (0.00s) and Reka Flash (0.00s) are the lowest latency models, followed by Reka Core & Reka Flash (Feb '24).
Price ($ per M tokens): Gemini 2.0 Pro Experimental ($0.00) and Gemini 2.0 Flash (exp) ($0.00) are the cheapest models, followed by Gemini Experimental (Nov) & Qwen1.5 Chat 110B.
Context Window: MiniMax-Text-01 (4m) and Gemini 2.0 Pro Experimental (2m) are the largest context window models, followed by Gemini 1.5 Pro (Sep) & Gemini 1.5 Pro (May).
| Model | Creator | Context Window | Quality Index | Price (USD per M tokens) | Output Speed (tokens/s) | Latency (s) |
|---|---|---|---|---|---|---|
| o1 | OpenAI | 200k | 90 | $26.25 | 39.7 | 30.70 |
| o3-mini | OpenAI | 200k | 89 | $1.93 | 199.0 | 10.85 |
| o1-preview | OpenAI | 128k | 86 | $27.56 | 97.3 | 23.03 |
| o1-mini | OpenAI | 128k | 84 | $5.51 | 169.1 | 12.92 |
| GPT-4o (Aug '24) | OpenAI | 128k | 78 | $4.38 | 78.3 | 0.69 |
| GPT-4o (May '24) | OpenAI | 128k | 78 | $7.50 | 94.3 | 0.74 |
| GPT-4o (Nov '24) | OpenAI | 128k | 75 | $4.38 | 104.6 | 0.72 |
| GPT-4o mini | OpenAI | 128k | 73 | $0.26 | 104.0 | 0.75 |
| Llama 3.3 70B | Meta | 128k | 74 | $0.62 | 80.5 | 0.50 |
| Llama 3.1 405B | Meta | 128k | 74 | $3.50 | 31.6 | 0.72 |
| Llama 3.2 90B (Vision) | Meta | 128k | 68 | $0.90 | 44.0 | 0.35 |
| Llama 3.1 70B | Meta | 128k | 67 | $0.72 | 70.1 | 0.42 |
| Llama 3.2 11B (Vision) | Meta | 128k | 54 | $0.18 | 139.1 | 0.29 |
| Llama 3.1 8B | Meta | 128k | 54 | $0.10 | 179.5 | 0.31 |
| Llama 3.2 3B | Meta | 128k | 49 | $0.06 | 137.8 | 0.36 |
| Llama 3.2 1B | Meta | 128k | 26 | $0.04 | 267.0 | 0.34 |
| Gemini 2.0 Pro Experimental | Google | 2m | 85 | $0.00 | 134.8 | 0.71 |
| Gemini 2.0 Flash | Google | 1m | 83 | $0.17 | 160.0 | 0.39 |
| Gemini 2.0 Flash (exp) | Google | 1m | 82 | $0.00 | 167.0 | 0.42 |
| Gemini 1.5 Pro (Sep) | Google | 2m | 80 | $2.19 | 62.5 | 0.75 |
| Gemini 2.0 Flash-Lite (Preview) | Google | 1m | 79 | $0.13 | 259.0 | 0.26 |
| Gemini 1.5 Flash (Sep) | Google | 1m | 74 | $0.13 | 179.4 | 0.36 |
| Gemini 1.5 Pro (May) | Google | 2m | 69 | $2.19 | 66.2 | 0.72 |
| Gemma 2 27B | Google | 8k | 61 | $0.26 | 71.7 | 0.36 |
| Gemma 2 9B | Google | 8k | 55 | $0.12 | 167.6 | 0.37 |
| Gemini 1.5 Flash-8B | Google | 1m | 47 | $0.07 | 285.1 | 0.30 |
| Gemini 1.5 Flash (May) | Google | 1m | – | $0.13 | 303.5 | 0.28 |
| Gemini Experimental (Nov) | Google | 2m | – | $0.00 | 54.6 | 0.84 |
| Claude 3.5 Sonnet (Oct) | Anthropic | 200k | 80 | $6.00 | 70.2 | 1.02 |
| Claude 3.5 Sonnet (June) | Anthropic | 200k | 76 | $6.00 | 73.6 | 0.95 |
| Claude 3 Opus | Anthropic | 200k | 70 | $30.00 | 26.9 | 1.48 |
| Claude 3.5 Haiku | Anthropic | 200k | 68 | $1.60 | 65.0 | 0.77 |
| Claude 3 Haiku | Anthropic | 200k | 55 | $0.50 | 116.6 | 0.72 |
| Pixtral Large | Mistral | 128k | 74 | $3.00 | 42.7 | 0.37 |
| Mistral Large 2 (Nov '24) | Mistral | 128k | 74 | $3.00 | 42.7 | 0.47 |
| Mistral Large 2 (Jul '24) | Mistral | 128k | 74 | $3.00 | 33.8 | 0.47 |
| Mistral Small 3 | Mistral | 32k | 72 | $0.48 | 95.4 | 0.29 |
| Mistral Small (Sep '24) | Mistral | 33k | 61 | $0.30 | 68.8 | 0.33 |
| Mixtral 8x22B | Mistral | 65k | 61 | $1.20 | 80.7 | 0.57 |
| Pixtral 12B | Mistral | 128k | 56 | $0.13 | 98.9 | 0.33 |
| Ministral 8B | Mistral | 128k | 56 | $0.10 | 141.4 | 0.27 |
| Mistral NeMo | Mistral | 128k | 54 | $0.09 | 78.5 | 0.50 |
| Ministral 3B | Mistral | 128k | 53 | $0.04 | 223.8 | 0.25 |
| Mixtral 8x7B | Mistral | 33k | 41 | $0.50 | 102.5 | 0.33 |
| Codestral-Mamba | Mistral | 256k | 33 | $0.25 | 94.8 | 0.43 |
| Codestral (Jan '25) | Mistral | 256k | – | $0.45 | 207.2 | 0.25 |
| Command-R+ | Cohere | 128k | 55 | $5.19 | 49.6 | 0.46 |
| Command-R+ (Apr '24) | Cohere | 128k | 45 | $6.00 | 49.1 | 0.50 |
| Command-R (Mar '24) | Cohere | 128k | 36 | $0.75 | 109.2 | 0.35 |
| Command-R | Cohere | 128k | – | $0.51 | 110.9 | 0.32 |
| Aya Expanse 32B | Cohere | 128k | – | $0.75 | 121.5 | 0.16 |
| Aya Expanse 8B | Cohere | 8k | – | $0.75 | 166.7 | 0.17 |
| Grok Beta | xAI | 128k | 72 | $7.50 | 67.0 | 0.33 |
| Nova Pro | Amazon | 300k | 75 | $1.40 | 90.5 | 0.36 |
| Nova Lite | Amazon | 300k | 70 | $0.10 | 145.3 | 0.31 |
| Nova Micro | Amazon | 130k | 65 | $0.06 | 196.5 | 0.31 |
| Phi-4 | Microsoft Azure | 16k | 76 | $0.12 | 64.8 | 0.51 |
| Phi-3 Medium 14B | Microsoft Azure | 128k | – | $0.30 | 49.9 | 0.43 |
| Solar Mini | Upstage | 4k | 47 | $0.15 | – | – |
| DBRX | Databricks | 33k | 46 | $1.16 | 70.1 | 0.42 |
| MiniMax-Text-01 | MiniMax | 4m | 76 | $0.42 | 45.1 | 0.93 |
| Llama 3.1 Nemotron 70B | NVIDIA | 128k | 72 | $0.27 | 48.0 | 0.60 |
| Tulu3 405B | Allen Institute for AI | 128k | – | $6.25 | 176.1 | 0.63 |
| Reka Flash | Reka AI | 128k | 59 | $0.35 | – | – |
| Reka Core | Reka AI | 128k | 58 | $2.00 | – | – |
| Reka Flash (Feb '24) | Reka AI | 128k | 46 | $0.35 | – | – |
| Reka Edge | Reka AI | 128k | 31 | $0.10 | – | – |
| Jamba 1.5 Large | AI21 Labs | 256k | 64 | $3.50 | 50.9 | 0.65 |
| Jamba 1.5 Mini | AI21 Labs | 256k | – | $0.25 | 154.0 | 0.44 |
| DeepSeek R1 | DeepSeek | 128k | 89 | $3.00 | 27.1 | 60.43 |
| DeepSeek R1 Distill Llama 70B | DeepSeek | 128k | 85 | $0.81 | 1,257.1 | 1.16 |
| DeepSeek V3 | DeepSeek | 128k | 79 | $0.89 | 11.7 | 1.02 |
| DeepSeek-V2.5 (Dec '24) | DeepSeek | 128k | 72 | $0.17 | – | – |
| DeepSeek-Coder-V2 | DeepSeek | 128k | 71 | $0.17 | – | – |
| DeepSeek LLM 67B (V1) | DeepSeek | 4k | 47 | $0.90 | 28.0 | 0.53 |
| DeepSeek-V2 | DeepSeek | 128k | – | $0.17 | – | – |
| DeepSeek-V2.5 | DeepSeek | 128k | – | $1.09 | 7.5 | 0.78 |
| Qwen2.5 Max | Alibaba | 32k | 79 | $2.80 | 32.6 | 1.35 |
| Qwen2.5 72B | Alibaba | 131k | 77 | $0.40 | 61.4 | 0.56 |
| Qwen2.5 Coder 32B | Alibaba | 131k | 72 | $0.80 | 83.3 | 0.36 |
| Qwen Turbo | Alibaba | 1m | 71 | $0.09 | 86.4 | 1.19 |
| Qwen2 72B | Alibaba | 131k | 68 | $0.90 | 65.2 | 0.35 |
| Qwen1.5 Chat 110B | Alibaba | 32k | – | $0.00 | – | – |
| QwQ 32B-Preview | Alibaba | 33k | – | $0.23 | 79.5 | 0.48 |
| Qwen2.5 Instruct 32B | Alibaba | 128k | – | $0.00 | 61.5 | 0.59 |
| Yi-Large | 01.AI | 32k | 62 | $3.00 | 66.2 | 0.41 |
| GPT-4 Turbo | OpenAI | 128k | 75 | $15.00 | 38.0 | 1.19 |
| GPT-4 | OpenAI | 8k | – | $37.50 | 25.5 | 0.76 |
| Llama 3 70B | Meta | 8k | 48 | $0.88 | 48.6 | 0.38 |
| Llama 3 8B | Meta | 8k | 45 | $0.10 | 104.4 | 0.34 |
| Llama 2 Chat 7B | Meta | 4k | – | $0.10 | 124.1 | 0.37 |
| Gemini 1.0 Pro | Google | 33k | – | $0.75 | 102.5 | 1.23 |
| Claude 3 Sonnet | Anthropic | 200k | 57 | $6.00 | 56.4 | 0.87 |
| Claude 2.0 | Anthropic | 100k | – | $12.00 | 29.8 | 0.81 |
| Claude 2.1 | Anthropic | 200k | – | $12.00 | 27.3 | 1.57 |
| Mistral Small (Feb '24) | Mistral | 33k | 59 | $1.50 | 55.0 | 0.36 |
| Mistral Large (Feb '24) | Mistral | 33k | 56 | $6.00 | 39.2 | 0.41 |
| Mistral 7B | Mistral | 8k | 28 | $0.12 | 100.2 | 0.31 |
| Mistral Medium | Mistral | 33k | – | $4.09 | 42.8 | 0.35 |
| Codestral (May '24) | Mistral | 33k | – | $0.30 | 84.4 | 0.28 |
| Sonar 3.1 Large | Perplexity | 127k | – | $1.00 | 51.2 | 0.31 |
| Sonar 3.1 Small | Perplexity | 127k | – | $0.20 | 182.4 | 0.31 |
| OpenChat 3.5 | OpenChat | 8k | 44 | $0.06 | 79.8 | 0.30 |
| Jamba Instruct | AI21 Labs | 256k | – | $0.55 | 156.3 | 0.45 |

– indicates a metric not reported for that model. Price is the blended input/output price (3:1 ratio); see Key definitions below.
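Each HIGHLIGHTS entry above is simply a minimum or maximum taken over one column of this table. Below is a minimal sketch of that aggregation in Python; it is not Artificial Analysis code, the `ModelRow` structure is invented for illustration, and `rows` holds a hand-copied subset of the table, with `None` marking metrics the table does not report.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelRow:
    name: str
    creator: str
    context_window: str           # as published, e.g. "128k", "2m"
    quality: Optional[int]        # Artificial Analysis Quality Index
    price_usd_per_m: float        # blended USD per million tokens (3:1)
    output_tps: Optional[float]   # output speed, tokens per second
    latency_s: Optional[float]    # time to first token, seconds

# A few rows copied from the table above.
rows = [
    ModelRow("o1", "OpenAI", "200k", 90, 26.25, 39.7, 30.70),
    ModelRow("DeepSeek R1 Distill Llama 70B", "DeepSeek", "128k", 85, 0.81, 1257.1, 1.16),
    ModelRow("Gemini 1.5 Flash (May)", "Google", "1m", None, 0.13, 303.5, 0.28),
    ModelRow("Gemini 2.0 Pro Experimental", "Google", "2m", 85, 0.00, 134.8, 0.71),
]

# Output-speed highlight: descending sort, skipping unreported values.
fastest = sorted((r for r in rows if r.output_tps is not None),
                 key=lambda r: r.output_tps, reverse=True)
print("Fastest:", [r.name for r in fastest[:2]])

# Price highlight: ascending sort on the blended price.
cheapest = sorted(rows, key=lambda r: r.price_usd_per_m)
print("Cheapest:", [r.name for r in cheapest[:2]])
```

On this subset the script prints DeepSeek R1 Distill Llama 70B and Gemini 1.5 Flash (May) as fastest, matching the Output Speed highlight, and ranks $0.00 Gemini 2.0 Pro Experimental cheapest, matching the Price highlight (the subset omits the other free models).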

Key definitions

Artificial Analysis Quality Index: Average result across our evaluations covering different dimensions of model intelligence. Currently includes MMLU, GPQA, Math & HumanEval. OpenAI o1 model figures are preliminary and are based on figures stated by OpenAI. See methodology for more details.
Context window: Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (varies by model).
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API, for models which support streaming).
Latency: Time to first token received, in seconds, after the API request is sent. For models which do not support streaming, this represents the time to receive the completion.
Price: Price per token, represented as USD per million tokens. Price is a blend of input & output token prices (3:1 input:output ratio); see the sketch after these definitions.
Output Price: Price per token generated by the model (received from the API), represented as USD per million tokens.
Input Price: Price per token included in the request/message sent to the API, represented as USD per million tokens.
Time period: Metrics are 'live' and are based on the past 14 days of measurements; measurements are taken 8 times per day for single requests and 2 times per day for parallel requests.
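To make the Price, Output Speed, and Latency definitions concrete, here is a minimal sketch in Python. It is not Artificial Analysis's measurement code: `stream` stands for any iterator of chunks from a streaming API response and `count_tokens` for a tokenizer helper, both assumed; the blended-price arithmetic is exactly the 3:1 input:output ratio stated above.

```python
import time

def blended_price(input_usd_per_m: float, output_usd_per_m: float) -> float:
    """Blend input & output per-million-token prices in a 3:1 ratio."""
    return (3 * input_usd_per_m + output_usd_per_m) / 4

# E.g. $2.50/M input and $10.00/M output (OpenAI's published GPT-4o (Aug '24)
# rates) blend to $4.375/M, displayed as $4.38 in the table above.
print(f"${blended_price(2.50, 10.00):.2f}")

def measure_stream(stream, count_tokens):
    """Return (latency_s, output_tps) for one streaming completion.

    Assumes the stream yields at least one chunk. For non-streaming models,
    latency would instead be the time to receive the whole completion.
    """
    t_request = time.monotonic()
    t_first = None
    tokens = 0
    for chunk in stream:
        if t_first is None:
            t_first = time.monotonic()   # first chunk -> time to first token
        tokens += count_tokens(chunk)
    t_done = time.monotonic()
    latency = t_first - t_request        # the "Latency" (TTFT) column
    # "Output Speed": tokens per second while generating, i.e. measured
    # only over the span after the first chunk arrives.
    tps = tokens / (t_done - t_first) if t_done > t_first else float("inf")
    return latency, tps
```

Because Output Speed deliberately excludes the wait for the first chunk, a model can have both high latency and high output speed, as the o3-mini row (10.85 s latency, 199 tokens/s) illustrates.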

Models compared: OpenAI: GPT 4o Audio, GPT 4o Realtime, GPT 4o Speech Pipeline, GPT-3.5 Turbo, GPT-3.5 Turbo (0125), GPT-3.5 Turbo (0314), GPT-3.5 Turbo (1106), GPT-3.5 Turbo Instruct, GPT-4, GPT-4 Turbo, GPT-4 Turbo (0125), GPT-4 Turbo (1106), GPT-4 Vision, GPT-4o (Aug '24), GPT-4o (May '24), GPT-4o (Nov '24), GPT-4o Realtime (Dec '24), GPT-4o mini, GPT-4o mini Realtime (Dec '24), o1, o1-mini, o1-preview, and o3-mini; Meta: Code Llama 70B, Llama 2 Chat 13B, Llama 2 Chat 70B, Llama 2 Chat 7B, Llama 3 70B, Llama 3 8B, Llama 3.1 405B, Llama 3.1 70B, Llama 3.1 8B, Llama 3.2 11B (Vision), Llama 3.2 1B, Llama 3.2 3B, Llama 3.2 90B (Vision), and Llama 3.3 70B; Google: Gemini 1.0 Pro, Gemini 1.5 Flash (May), Gemini 1.5 Flash (Sep), Gemini 1.5 Flash-8B, Gemini 1.5 Pro (May), Gemini 1.5 Pro (Sep), Gemini 2.0 Flash, Gemini 2.0 Flash (exp), Gemini 2.0 Flash Thinking exp. (Dec '24), Gemini 2.0 Flash Thinking exp. (Jan '25), Gemini 2.0 Flash-Lite (Preview), Gemini 2.0 Pro Experimental, Gemini Experimental (Nov), Gemma 2 27B, Gemma 2 9B, and Gemma 7B; Anthropic: Claude 2.0, Claude 2.1, Claude 3 Haiku, Claude 3 Opus, Claude 3 Sonnet, Claude 3.5 Haiku, Claude 3.5 Sonnet (June), Claude 3.5 Sonnet (Oct), and Claude Instant; Mistral: Codestral (Jan '25), Codestral (May '24), Codestral-Mamba, Ministral 3B, Ministral 8B, Mistral 7B, Mistral Large (Feb '24), Mistral Large 2 (Jul '24), Mistral Large 2 (Nov '24), Mistral Medium, Mistral NeMo, Mistral Small (Feb '24), Mistral Small (Sep '24), Mistral Small 3, Mixtral 8x22B, Mixtral 8x7B, Pixtral 12B, and Pixtral Large; Cohere: Aya Expanse 32B, Aya Expanse 8B, Command, Command Light, Command R7B, Command-R, Command-R (Mar '24), Command-R+ (Apr '24), and Command-R+; Perplexity: PPLX-70B Online, PPLX-7B-Online, Sonar, Sonar 3.1 Huge, Sonar 3.1 Large, Sonar 3.1 Small, Sonar Large, Sonar Pro, Sonar Reasoning, Sonar Reasoning Pro, and Sonar Small; xAI: Grok 2, Grok Beta, and Grok-1; OpenChat: OpenChat 3.5; Amazon: Nova Lite, Nova Micro, and Nova Pro; Microsoft Azure: Phi-3 Medium 14B, Phi-3 Mini, and Phi-4; Upstage: Solar Mini, Solar Pro, and Solar Pro (Nov '24); Databricks: DBRX; MiniMax: MiniMax-Text-01; NVIDIA: Llama 3.1 Nemotron 70B; IBM: Granite 3.0 2B and Granite 3.0 8B; Reka AI: Reka Core, Reka Edge, Reka Flash (Feb '24), and Reka Flash; Other: LLaVA-v1.5-7B; AI21 Labs: Jamba 1.5 Large, Jamba 1.5 Mini, and Jamba Instruct; DeepSeek: DeepSeek LLM 67B (V1), DeepSeek R1, DeepSeek R1 Distill Llama 70B, DeepSeek R1 Distill Llama 8B, DeepSeek R1 Distill Qwen 1.5B, DeepSeek R1 Distill Qwen 14B, DeepSeek R1 Distill Qwen 32B, DeepSeek V3, DeepSeek-Coder-V2, DeepSeek-V2, DeepSeek-V2.5, DeepSeek-V2.5 (Dec '24), DeepSeek-VL2, and Janus Pro 7B; Snowflake: Arctic; Alibaba: QwQ 32B-Preview, Qwen Chat 72B, Qwen Plus, Qwen Turbo, Qwen1.5 Chat 110B, Qwen1.5 Chat 14B, Qwen1.5 Chat 32B, Qwen1.5 Chat 72B, Qwen1.5 Chat 7B, Qwen2 72B, Qwen2 Instruct 7B, Qwen2 Instruct A14B 57B, Qwen2-VL 72B, Qwen2.5 Coder 32B, Qwen2.5 Instruct 14B, Qwen2.5 Instruct 32B, Qwen2.5 72B, Qwen2.5 Instruct 7B, Qwen2.5 Max, and Qwen2.5 Max 01-29; and 01.AI: Yi-Large.