LLM Leaderboard - Comparison of GPT-4o, Llama 3, Mistral, Gemini and over 30 models

A comparison and ranking of the performance of over 30 AI models (LLMs) across key metrics, including quality, price, speed (output speed in tokens per second, and latency as time to first token, TTFT), and context window. For more detail, including on our methodology, see our FAQs.

For a comparison of the API providers hosting these models, see our separate API provider comparison.

HIGHLIGHTS

- Intelligence: o3-mini (high) and o3-mini are the highest quality models, followed by o1 and DeepSeek R1.
- Output Speed (tokens/s): Nova Micro (338 t/s) and DeepSeek R1 Distill Qwen 1.5B (317 t/s) are the fastest models, followed by o1-mini and Llama 3.2 1B.
- Latency (seconds): Gemini 1.5 Flash (Sep) (0.05s) and Gemini 1.5 Pro (May) (0.07s) are the lowest-latency models, followed by Gemini 1.5 Flash (May) and Gemini 1.5 Pro (Sep).
- Price (USD per M tokens): Qwen2.5 Coder 7B ($0.03) and Llama 3.2 1B ($0.04) are the cheapest models, followed by Ministral 3B and DeepSeek R1 Distill Llama 8B.
- Context Window: MiniMax-Text-01 (4m) and Gemini 2.0 Pro Experimental (2m) have the largest context windows, followed by Gemini 1.5 Pro (Sep) and Gemini 1.5 Pro (May).
| Model | Creator | Context Window | Intelligence | Price (USD/M tokens) | Output Speed (tokens/s) | Latency (s) |
|---|---|---|---|---|---|---|
| o3-mini (high) | OpenAI | 200k | 66 | $1.93 | 137.2 | 55.53 |
| o3-mini | OpenAI | 200k | 63 | $1.93 | 156.4 | 13.94 |
| o1 | OpenAI | 200k | 62 | $26.25 | – | – |
| DeepSeek R1 | DeepSeek | 128k | 60 | $0.96 | 22.4 | 90.57 |
| Claude 3.7 Sonnet Thinking | Anthropic | 200k | 57 | $6.00 | 77.9 | 0.91 |
| o1-mini | OpenAI | 128k | 54 | $1.93 | 208.1 | 11.09 |
| DeepSeek R1 Distill Qwen 32B | DeepSeek | 128k | 51 | $0.30 | 36.3 | 17.74 |
| Gemini 2.0 Pro Experimental | Google | 2m | 49 | $0.00 | 132.0 | 0.61 |
| DeepSeek R1 Distill Qwen 14B | DeepSeek | 128k | 49 | $0.88 | 37.8 | 15.60 |
| DeepSeek R1 Distill Llama 70B | DeepSeek | 128k | 48 | $0.81 | 123.2 | 13.04 |
| Claude 3.7 Sonnet | Anthropic | 200k | 48 | $6.00 | 77.4 | 1.09 |
| Gemini 2.0 Flash | Google | 1m | 48 | $0.17 | 184.3 | 0.37 |
| DeepSeek V3 | DeepSeek | 128k | 46 | $0.48 | 25.8 | 20.73 |
| Qwen2.5 Max | Alibaba | 32k | 45 | $2.80 | 35.6 | 1.17 |
| Gemini 1.5 Pro (Sep) | Google | 2m | 45 | $2.19 | 0.0 | 0.09 |
| Claude 3.5 Sonnet (Oct) | Anthropic | 200k | 44 | $6.00 | – | – |
| QwQ 32B-Preview | Alibaba | 33k | 43 | $0.58 | 62.1 | 1.20 |
| Gemini 2.0 Flash-Lite (Preview) | Google | 1m | 42 | $0.13 | 191.5 | 0.24 |
| GPT-4o (Nov '24) | OpenAI | 128k | 41 | $4.38 | 82.8 | 0.49 |
| Llama 3.3 70B | Meta | 128k | 41 | $0.64 | 117.4 | 0.79 |
| GPT-4o (ChatGPT) | OpenAI | 128k | 41 | $7.50 | 69.7 | 0.55 |
| GPT-4o (Aug '24) | OpenAI | 128k | 41 | $4.38 | 43.8 | 0.51 |
| GPT-4o (May '24) | OpenAI | 128k | 41 | $7.50 | 47.5 | 0.48 |
| Llama 3.1 405B | Meta | 128k | 40 | $3.50 | 18.8 | 1.36 |
| Qwen2.5 72B | Alibaba | 131k | 40 | $0.00 | 40.1 | 1.12 |
| Phi-4 | Microsoft Azure | 16k | 40 | $0.12 | – | – |
| Tulu3 405B | Allen Institute for AI | 128k | 40 | $6.25 | 116.0 | 8.03 |
| MiniMax-Text-01 | MiniMax | 4m | 40 | $0.42 | 32.9 | 1.04 |
| Mistral Large 2 (Nov '24) | Mistral | 128k | 38 | $3.00 | 32.1 | 0.49 |
| Grok Beta | xAI | 128k | 38 | $7.50 | 66.3 | 0.35 |
| Pixtral Large | Mistral | 128k | 37 | $3.00 | 31.1 | 0.49 |
| Qwen2.5 Instruct 32B | Alibaba | 128k | 37 | $0.79 | – | – |
| Llama 3.1 Nemotron 70B | NVIDIA | 128k | 37 | $0.27 | 39.7 | 0.80 |
| Nova Pro | Amazon | 300k | 37 | $1.40 | – | – |
| Mistral Large 2 (Jul '24) | Mistral | 128k | 37 | $3.00 | 29.6 | 0.62 |
| Qwen2.5 Coder 32B | Alibaba | 131k | 36 | $0.80 | 60.2 | 0.70 |
| GPT-4o mini | OpenAI | 128k | 36 | $0.26 | 116.5 | 0.38 |
| Llama 3.1 70B | Meta | 128k | 35 | $0.72 | 63.9 | 0.70 |
| Mistral Small 3 | Mistral | 32k | 35 | $0.15 | 42.7 | 0.46 |
| Claude 3 Opus | Anthropic | 200k | 35 | $30.00 | 26.8 | 130.22 |
| Claude 3.5 Haiku | Anthropic | 200k | 35 | $1.60 | 64.3 | 1.82 |
| DeepSeek R1 Distill Llama 8B | DeepSeek | 128k | 34 | $0.04 | 47.3 | 13.38 |
| Gemini 1.5 Pro (May) | Google | 2m | 34 | $2.19 | 0.0 | 0.07 |
| Qwen Turbo | Alibaba | 1m | 34 | $0.09 | 85.0 | 1.09 |
| Llama 3.2 90B (Vision) | Meta | 128k | 33 | $0.90 | 33.2 | 0.59 |
| Qwen2 72B | Alibaba | 131k | 33 | $0.00 | – | – |
| Mistral Saba | Mistral | 32k | 32 | $0.30 | 42.3 | 0.43 |
| Jamba 1.5 Large | AI21 Labs | 256k | 29 | $3.50 | 43.1 | 1.03 |
| Gemini 1.5 Flash (May) | Google | 1m | 28 | $0.13 | 0.0 | 0.07 |
| Nova Micro | Amazon | 130k | 28 | $0.06 | 337.9 | 0.48 |
| Yi-Large | 01.AI | 32k | 28 | $3.00 | 58.1 | 1.27 |
| Claude 3 Sonnet | Anthropic | 200k | 28 | $6.00 | 53.6 | 0.51 |
| Codestral (Jan '25) | Mistral | 256k | 28 | $0.45 | 41.7 | 0.44 |
| Llama 3 70B | Meta | 8k | 27 | $0.88 | 54.3 | 0.77 |
| Mistral Small (Sep '24) | Mistral | 33k | 27 | $0.30 | 36.2 | 0.45 |
| Mistral Large (Feb '24) | Mistral | 33k | 26 | $6.00 | 33.8 | 0.76 |
| Mixtral 8x22B | Mistral | 65k | 26 | $3.00 | 34.7 | 0.42 |
| Qwen2.5 Coder 7B | Alibaba | 131k | 26 | $0.03 | 169.8 | 0.62 |
| Phi-3 Medium 14B | Microsoft Azure | 128k | 25 | $0.30 | 46.9 | 0.85 |
| Claude 2.1 | Anthropic | 200k | 24 | $12.00 | – | – |
| DeepSeek Coder V2 Lite | DeepSeek | 128k | 24 | $0.09 | 55.8 | 0.96 |
| Mistral Medium | Mistral | 33k | 24 | $4.09 | 33.5 | 0.58 |
| Llama 3.1 8B | Meta | 128k | 24 | $0.10 | 170.3 | 0.36 |
| Pixtral 12B | Mistral | 128k | 23 | $0.15 | 38.7 | 0.43 |
| Mistral Small (Feb '24) | Mistral | 33k | 23 | $1.50 | 42.8 | 0.40 |
| Ministral 8B | Mistral | 128k | 22 | $0.10 | 44.8 | 0.40 |
| Llama 3.2 11B (Vision) | Meta | 128k | 22 | $0.17 | 73.7 | 0.39 |
| Command-R+ | Cohere | 128k | 21 | $4.38 | 58.4 | 0.40 |
| Llama 3 8B | Meta | 8k | 21 | $0.10 | 89.0 | 0.57 |
| Codestral (May '24) | Mistral | 33k | 20 | $0.30 | 33.2 | 0.42 |
| Aya Expanse 32B | Cohere | 128k | 20 | $0.75 | 119.2 | 0.21 |
| Command-R+ (Apr '24) | Cohere | 128k | 20 | $6.00 | 71.8 | 0.26 |
| DBRX | Databricks | 33k | 20 | $1.13 | 41.1 | 0.67 |
| Ministral 3B | Mistral | 128k | 20 | $0.04 | 42.4 | 0.37 |
| Mistral NeMo | Mistral | 128k | 20 | $0.15 | 41.8 | 0.42 |
| Llama 3.2 3B | Meta | 128k | 20 | $0.06 | 115.6 | 0.69 |
| DeepSeek R1 Distill Qwen 1.5B | DeepSeek | 128k | 19 | $0.18 | 316.9 | 7.55 |
| Mixtral 8x7B | Mistral | 33k | 17 | $0.70 | 32.6 | 0.41 |
| OpenChat 3.5 | OpenChat | 8k | 16 | $0.06 | 64.1 | 0.71 |
| Jamba Instruct | AI21 Labs | 256k | 16 | $0.55 | 142.6 | 0.37 |
| Command-R | Cohere | 128k | 15 | $0.26 | 67.2 | 0.32 |
| Command-R (Mar '24) | Cohere | 128k | 15 | $0.75 | 170.3 | 0.17 |
| Codestral-Mamba | Mistral | 256k | 14 | $0.25 | 34.4 | 0.60 |
| Mistral 7B | Mistral | 8k | 10 | $0.25 | 33.0 | 0.37 |
| Llama 3.2 1B | Meta | 128k | 10 | $0.04 | 194.2 | 0.50 |
| Llama 2 Chat 7B | Meta | 4k | 8 | $0.10 | 69.4 | 1.21 |
| o1-preview | OpenAI | 128k | – | $26.25 | 117.8 | 28.59 |
| GPT-4.5 (Preview) | OpenAI | 128k | – | $93.75 | 50.3 | 1.63 |
| o3 | OpenAI | 128k | – | $0.00 | – | – |
| Gemini 2.0 Flash (exp) | Google | 1m | – | $0.00 | 175.2 | 0.29 |
| Gemini 1.5 Flash (Sep) | Google | 1m | – | $0.13 | 0.0 | 0.05 |
| Gemma 2 27B | Google | 8k | – | $0.26 | – | – |
| Gemma 2 9B | Google | 8k | – | $0.12 | – | – |
| Gemini 1.5 Flash-8B | Google | 1m | – | $0.07 | – | – |
| Gemini Experimental (Nov) | Google | 2m | – | $0.00 | – | – |
| Claude 3.5 Sonnet (June) | Anthropic | 200k | – | $6.00 | – | – |
| Claude 3 Haiku | Anthropic | 200k | – | $0.50 | 140.3 | 0.57 |
| DeepSeek-V2.5 (Dec '24) | DeepSeek | 128k | – | $0.17 | – | – |
| DeepSeek-Coder-V2 | DeepSeek | 128k | – | $0.17 | – | – |
| DeepSeek LLM 67B (V1) | DeepSeek | 4k | – | $0.90 | – | – |
| DeepSeek-V2.5 | DeepSeek | 128k | – | $1.09 | – | – |
| DeepSeek-V2 | DeepSeek | 128k | – | $0.17 | – | – |
| Sonar Pro | Perplexity | 200k | – | $6.00 | – | – |
| Sonar | Perplexity | 127k | – | $1.00 | – | – |
| Sonar Reasoning | Perplexity | 127k | – | $2.00 | – | – |
| Grok 3 mini | xAI | 128k | – | $0.00 | – | – |
| Grok 3 Reasoning Beta | xAI | 128k | – | $0.00 | – | – |
| Grok 3 mini Reasoning | xAI | 128k | – | $0.00 | – | – |
| Grok 3 | xAI | 128k | – | $0.00 | – | – |
| Nova Lite | Amazon | 300k | – | $0.10 | – | – |
| Solar Mini | Upstage | 4k | – | $0.15 | – | – |
| Reka Flash | Reka AI | 128k | – | $0.35 | – | – |
| Reka Core | Reka AI | 128k | – | $2.00 | – | – |
| Reka Flash (Feb '24) | Reka AI | 128k | – | $0.35 | – | – |
| Reka Edge | Reka AI | 128k | – | $0.10 | – | – |
| Aya Expanse 8B | Cohere | 8k | – | $0.75 | 147.2 | 0.22 |
| Jamba 1.5 Mini | AI21 Labs | 256k | – | $0.25 | 149.1 | 0.41 |
| Qwen Chat 72B | Alibaba | 34k | – | $1.00 | – | – |
| Qwen1.5 Chat 110B | Alibaba | 32k | – | $0.00 | – | – |
| GPT-4 Turbo | OpenAI | 128k | – | $15.00 | – | – |
| GPT-4 | OpenAI | 8k | – | $37.50 | – | – |
| Gemini 1.0 Pro | Google | 33k | – | $0.75 | – | – |
| Claude 2.0 | Anthropic | 100k | – | $12.00 | – | – |
| Sonar 3.1 Small | Perplexity | 127k | – | $0.20 | – | – |
| Sonar 3.1 Large | Perplexity | 127k | – | $1.00 | – | – |
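The table lends itself to simple programmatic filtering. Below is a minimal Python sketch, using a handful of rows copied from the table above, that selects models meeting an intelligence floor and ranks them by blended price. The `Model` dataclass, the `cheapest_at_least` helper, and the threshold of 40 are illustrative choices, not part of the leaderboard itself.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    intelligence: int   # Artificial Analysis intelligence index
    price: float        # blended USD per million tokens (3:1 input:output)
    speed: float        # output tokens per second

# A few rows copied from the leaderboard above.
MODELS = [
    Model("o3-mini", 63, 1.93, 156.4),
    Model("DeepSeek R1", 60, 0.96, 22.4),
    Model("Gemini 2.0 Flash", 48, 0.17, 184.3),
    Model("GPT-4o (Nov '24)", 41, 4.38, 82.8),
    Model("Llama 3.3 70B", 41, 0.64, 117.4),
    Model("GPT-4o mini", 36, 0.26, 116.5),
]

def cheapest_at_least(models: list[Model], min_intelligence: int) -> list[Model]:
    """Models meeting an intelligence floor, cheapest first."""
    eligible = [m for m in models if m.intelligence >= min_intelligence]
    return sorted(eligible, key=lambda m: m.price)

for m in cheapest_at_least(MODELS, 40):
    print(f"{m.name}: index {m.intelligence}, ${m.price:.2f}/M tokens, {m.speed} t/s")
```

With these rows, the query surfaces Gemini 2.0 Flash, Llama 3.3 70B, and DeepSeek R1 ahead of o3-mini and GPT-4o on price at comparable or lower intelligence, which mirrors the price highlights above.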

Key definitions

Context window: The maximum number of combined input and output tokens. Output tokens commonly have a significantly lower limit, which varies by model.
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API, for models that support streaming).
Latency: Time to first token received, in seconds, after the API request is sent. For models that do not support streaming, this represents the time to receive the completion.
Price: Price per token, represented as USD per million tokens. The price is a blend of input and output token prices at a 3:1 input-to-output ratio; a worked sketch follows this list.
Output Price: Price per token generated by the model (received from the API), represented as USD per million tokens.
Input Price: Price per token included in the request/message sent to the API, represented as USD per million tokens.
Time period: Metrics are 'live' and based on the past 72 hours of measurements. Measurements are taken 8 times per day for single requests and 2 times per day for parallel requests.
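As a concrete reading of these definitions, here is a minimal Python sketch of how the blended price and the two speed metrics can be computed. The GPT-4o per-token prices used ($2.50 input, $10.00 output per million tokens) are OpenAI's published rates and reproduce the $4.38 blended figure in the table; the timestamped-chunk representation is an assumption for illustration, not the leaderboard's actual measurement harness.

```python
def blended_price(input_usd_per_m: float, output_usd_per_m: float) -> float:
    """Blend input and output prices at the 3:1 input:output ratio defined above."""
    return (3 * input_usd_per_m + output_usd_per_m) / 4

# GPT-4o (Nov '24): $2.50 input, $10.00 output per million tokens.
print(f"GPT-4o blended: ${blended_price(2.50, 10.00):.2f}/M")  # $4.38, matching the table

def speed_metrics(request_sent: float, chunk_times: list[float], total_tokens: int):
    """Latency is time to the first streamed chunk; output speed counts tokens
    received over the interval after that first chunk, per the definitions above."""
    latency = chunk_times[0] - request_sent
    generation_time = chunk_times[-1] - chunk_times[0]
    output_speed = total_tokens / generation_time if generation_time > 0 else 0.0
    return latency, output_speed

# Hypothetical stream: request at t=0.0s, chunks at 0.4s..2.4s, 250 tokens total.
latency, speed = speed_metrics(0.0, [0.4, 0.9, 1.4, 1.9, 2.4], 250)
print(f"latency={latency:.2f}s, output speed={speed:.0f} tokens/s")  # 0.40s, 125 t/s
```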

Models compared:

- OpenAI: GPT 4o Audio, GPT 4o Realtime, GPT 4o Speech Pipeline, GPT-3.5 Turbo, GPT-3.5 Turbo (0125), GPT-3.5 Turbo (0314), GPT-3.5 Turbo (1106), GPT-3.5 Turbo Instruct, GPT-4, GPT-4 Turbo, GPT-4 Turbo (0125), GPT-4 Turbo (1106), GPT-4 Vision, GPT-4.5 (Preview), GPT-4o (Aug '24), GPT-4o (ChatGPT), GPT-4o (May '24), GPT-4o (Nov '24), GPT-4o Realtime (Dec '24), GPT-4o mini, GPT-4o mini Realtime (Dec '24), o1, o1-mini, o1-preview, o3, o3-mini, and o3-mini (high)
- Meta: Code Llama 70B, Llama 2 Chat 13B, Llama 2 Chat 70B, Llama 2 Chat 7B, Llama 3 70B, Llama 3 8B, Llama 3.1 405B, Llama 3.1 70B, Llama 3.1 8B, Llama 3.2 11B (Vision), Llama 3.2 1B, Llama 3.2 3B, Llama 3.2 90B (Vision), and Llama 3.3 70B
- Google: Gemini 1.0 Pro, Gemini 1.5 Flash (May), Gemini 1.5 Flash (Sep), Gemini 1.5 Flash-8B, Gemini 1.5 Pro (May), Gemini 1.5 Pro (Sep), Gemini 2.0 Flash, Gemini 2.0 Flash (exp), Gemini 2.0 Flash Thinking exp. (Dec '24), Gemini 2.0 Flash Thinking exp. (Jan '25), Gemini 2.0 Flash-Lite (Feb '25), Gemini 2.0 Flash-Lite (Preview), Gemini 2.0 Pro Experimental, Gemini Experimental (Nov), Gemma 2 27B, Gemma 2 9B, and Gemma 7B
- Anthropic: Claude 2.0, Claude 2.1, Claude 3 Haiku, Claude 3 Opus, Claude 3 Sonnet, Claude 3.5 Haiku, Claude 3.5 Sonnet (June), Claude 3.5 Sonnet (Oct), Claude 3.7 Sonnet Thinking, Claude 3.7 Sonnet, and Claude Instant
- Mistral: Codestral (Jan '25), Codestral (May '24), Codestral-Mamba, Ministral 3B, Ministral 8B, Mistral 7B, Mistral Large (Feb '24), Mistral Large 2 (Jul '24), Mistral Large 2 (Nov '24), Mistral Medium, Mistral NeMo, Mistral Saba, Mistral Small (Feb '24), Mistral Small (Sep '24), Mistral Small 3, Mixtral 8x22B, Mixtral 8x7B, Pixtral 12B, and Pixtral Large
- DeepSeek: DeepSeek Coder V2 Lite, DeepSeek LLM 67B (V1), DeepSeek R1, DeepSeek R1 Distill Llama 70B, DeepSeek R1 Distill Llama 8B, DeepSeek R1 Distill Qwen 1.5B, DeepSeek R1 Distill Qwen 14B, DeepSeek R1 Distill Qwen 32B, DeepSeek V3, DeepSeek-Coder-V2, DeepSeek-V2, DeepSeek-V2.5, DeepSeek-V2.5 (Dec '24), DeepSeek-VL2, and Janus Pro 7B
- Perplexity: PPLX-70B Online, PPLX-7B-Online, Sonar, Sonar 3.1 Huge, Sonar 3.1 Large, Sonar 3.1 Small, Sonar Large, Sonar Pro, Sonar Reasoning, Sonar Reasoning Pro, and Sonar Small
- xAI: Grok 2, Grok 3, Grok 3 Reasoning Beta, Grok 3 mini, Grok 3 mini Reasoning, Grok Beta, and Grok-1
- OpenChat: OpenChat 3.5
- Amazon: Nova Lite, Nova Micro, and Nova Pro
- Microsoft Azure: Phi-3 Medium 14B, Phi-3 Mini, Phi-4, Phi-4 Mini, and Phi-4 Multimodal
- Upstage: Solar Mini, Solar Pro, and Solar Pro (Nov '24)
- Databricks: DBRX
- MiniMax: MiniMax-Text-01
- NVIDIA: Cosmos Nemotron 34B and Llama 3.1 Nemotron 70B
- IBM: Granite 3.0 2B and Granite 3.0 8B
- Inception Labs: Mercury Coder Mini, Mercury Coder Small, and Mercury Instruct
- Reka AI: Reka Core, Reka Edge, Reka Flash (Feb '24), Reka Flash (Feb '25), and Reka Flash
- Other: LLaVA-v1.5-7B
- Cohere: Aya Expanse 32B, Aya Expanse 8B, Command, Command Light, Command R7B, Command-R, Command-R (Mar '24), Command-R+ (Apr '24), and Command-R+
- AI21 Labs: Jamba 1.5 Large, Jamba 1.5 Large (Feb '25), Jamba 1.5 Mini, Jamba 1.5 Mini (Feb 2025), Jamba 1.6 Large, Jamba 1.6 Mini, and Jamba Instruct
- Snowflake: Arctic
- Alibaba: QwQ 32B-Preview, Qwen Chat 72B, Qwen Plus, Qwen Turbo, Qwen1.5 Chat 110B, Qwen1.5 Chat 14B, Qwen1.5 Chat 32B, Qwen1.5 Chat 72B, Qwen1.5 Chat 7B, Qwen2 72B, Qwen2 Instruct 7B, Qwen2 Instruct A14B 57B, Qwen2-VL 72B, Qwen2.5 Coder 32B, Qwen2.5 Coder 7B, Qwen2.5 Instruct 14B, Qwen2.5 Instruct 32B, Qwen2.5 72B, Qwen2.5 Instruct 7B, Qwen2.5 Max, and Qwen2.5 Max 01-29
- 01.AI: Yi-Large