LLM Leaderboard - Comparison of GPT-4o, Llama 3, Mistral, Gemini and over 30 models

Comparison and ranking of the performance of over 30 AI models (LLMs) across key metrics, including quality, price, output speed (tokens per second), latency (time to first token, TTFT), context window, and others. For more details, including our methodology, see our FAQs.

For a comparison of the API providers hosting these models, see our API provider comparison.

HIGHLIGHTS

- Quality: o1-preview and o1-mini are the highest quality models, followed by Claude 3.5 Sonnet (Oct) & Gemini 1.5 Pro (Sep).
- Output Speed (tokens/s): Llama 3.2 1B (554 t/s) and Gemini 1.5 Flash (May) (311 t/s) are the fastest models, followed by Gemini 1.5 Flash-8B & Ministral 3B.
- Latency (seconds): Aya Expanse 32B (0.22s) and Aya Expanse 8B (0.26s) are the lowest latency models, followed by Llama 3.2 11B (Vision) & Gemini 1.5 Flash (May).
- Price ($ per M tokens): Ministral 3B ($0.04) and Llama 3.2 1B ($0.05) are the cheapest models, followed by OpenChat 3.5 & Gemini 1.5 Flash-8B.
- Context Window: Gemini 1.5 Pro (Sep) (2m) and Gemini 1.5 Pro (May) (2m) are the largest context window models, followed by Gemini 1.5 Flash-8B & Gemini 1.5 Flash (Sep).
| Model | Creator | Context Window | Quality Index | Blended Price (USD/M tokens) | Output Speed (tokens/s) | Latency (s) |
|---|---|---|---|---|---|---|
| o1-preview | OpenAI | 128k | 85 | $26.25 | 36.9 | 27.96 |
| o1-mini | OpenAI | 128k | 82 | $5.25 | 77.1 | 13.11 |
| GPT-4o (Aug '24) | OpenAI | 128k | 77 | $4.38 | 87.1 | 0.48 |
| GPT-4o (May '24) | OpenAI | 128k | 77 | $7.50 | 95.0 | 0.50 |
| GPT-4o mini | OpenAI | 128k | 71 | $0.26 | 101.1 | 0.52 |
| Llama 3.1 405B | Meta | 128k | 72 | $5.13 | 82.0 | 0.87 |
| Llama 3.2 90B (Vision) | Meta | 128k | 67 | $0.90 | 41.1 | 0.36 |
| Llama 3.1 70B | Meta | 128k | 65 | $0.75 | 71.3 | 0.43 |
| Llama 3.2 11B (Vision) | Meta | 128k | 53 | $0.18 | 128.4 | 0.29 |
| Llama 3.1 8B | Meta | 128k | 53 | $0.11 | 156.5 | 0.37 |
| Llama 3.2 3B | Meta | 128k | 47 | $0.08 | 201.7 | 0.34 |
| Llama 3.2 1B | Meta | 128k | 27 | $0.05 | 553.6 | 0.34 |
| Gemini 1.5 Pro (Sep) | Google | 2m | 80 | $2.19 | 58.9 | 0.78 |
| Gemini 1.5 Flash (Sep) | Google | 1m | 71 | $0.13 | 189.0 | 0.31 |
| Gemma 2 27B | Google | 8k | 61 | $0.26 | 49.3 | 0.52 |
| Gemma 2 9B | Google | 8k | 46 | $0.13 | 160.4 | 0.36 |
| Gemini 1.5 Pro (May) | Google | 2m |  | $5.25 | 63.2 | 0.76 |
| Gemini 1.5 Flash (May) | Google | 1m |  | $0.13 | 311.1 | 0.30 |
| Gemini 1.5 Flash-8B | Google | 1m |  | $0.07 | 283.8 | 0.35 |
| Claude 3.5 Sonnet (Oct) | Anthropic | 200k | 80 | $6.00 | 55.8 | 0.86 |
| Claude 3.5 Sonnet (June) | Anthropic | 200k | 77 | $6.00 | 55.7 | 0.91 |
| Claude 3 Opus | Anthropic | 200k | 70 | $30.00 | 27.3 | 2.03 |
| Claude 3.5 Haiku | Anthropic | 200k | 69 | $2.00 | 63.5 | 0.88 |
| Claude 3 Haiku | Anthropic | 200k | 54 | $0.50 | 127.7 | 0.48 |
| Mistral Large (Nov '24) | Mistral | 128k | 74 | $3.00 | 34.6 | 0.49 |
| Mistral Large 2 (Jul '24) | Mistral | 128k | 73 | $3.00 | 34.7 | 0.45 |
| Pixtral Large | Mistral | 128k | 73 | $3.00 | 34.0 | 0.58 |
| Mixtral 8x22B | Mistral | 65k | 61 | $1.20 | 78.9 | 0.59 |
| Mistral Small (Sep '24) | Mistral | 128k | 60 | $0.30 | 55.7 | 0.47 |
| Pixtral 12B | Mistral | 128k | 56 | $0.13 | 68.9 | 0.49 |
| Ministral 8B | Mistral | 128k | 53 | $0.10 | 136.0 | 0.42 |
| Mistral NeMo | Mistral | 128k | 53 | $0.13 | 71.5 | 0.50 |
| Ministral 3B | Mistral | 128k | 51 | $0.04 | 210.5 | 0.41 |
| Mixtral 8x7B | Mistral | 33k | 42 | $0.50 | 90.1 | 0.34 |
| Codestral-Mamba | Mistral | 256k | 36 | $0.25 | 94.8 | 0.58 |
| Command-R+ | Cohere | 128k | 56 | $5.19 | 49.9 | 0.48 |
| Command-R | Cohere | 128k | 51 | $0.51 | 110.0 | 0.33 |
| Command-R+ (Apr '24) | Cohere | 128k | 46 | $6.00 | 46.4 | 0.52 |
| Command-R (Mar '24) | Cohere | 128k | 36 | $0.75 | 108.4 | 0.35 |
| Aya Expanse 32B | Cohere | 8k |  | $0.75 | 121.4 | 0.22 |
| Aya Expanse 8B | Cohere | 8k |  | $0.75 | 137.7 | 0.26 |
| Sonar 3.1 Large | Perplexity | 131k |  | $1.00 | 57.9 | 0.36 |
| Sonar 3.1 Small | Perplexity | 131k |  | $0.20 | 143.3 | 0.35 |
| Grok Beta | xAI | 8k | 70 | $7.50 | 57.2 | 0.48 |
| Phi-3 Medium 14B | Microsoft Azure | 128k |  | $0.30 | 45.0 | 0.44 |
| Solar Pro | Upstage | 4k | 61 | $0.25 | 51.6 | 1.21 |
| Solar Mini | Upstage | 4k | 48 | $0.15 | 84.9 | 1.13 |
| DBRX | Databricks | 33k | 49 | $1.16 | 82.9 | 0.35 |
| Llama 3.1 Nemotron 70B | NVIDIA | 128k | 70 | $0.36 | 30.3 | 0.58 |
| Reka Flash | Reka AI | 128k | 58 | $0.35 | 32.8 | 1.23 |
| Reka Core | Reka AI | 128k | 57 | $2.00 | 14.9 | 1.13 |
| Reka Flash (Feb '24) | Reka AI | 128k | 46 | $0.35 | 31.2 | 0.89 |
| Reka Edge | Reka AI | 64k | 30 | $0.10 | 35.2 | 0.92 |
| Jamba 1.5 Large | AI21 Labs | 256k | 64 | $3.50 | 51.0 | 0.70 |
| Jamba 1.5 Mini | AI21 Labs | 256k | 46 | $0.25 | 82.5 | 0.49 |
| DeepSeek-Coder-V2 | DeepSeek | 128k | 67 | $0.17 | 16.4 | 1.06 |
| DeepSeek-V2 | DeepSeek | 128k | 66 | $0.17 | 16.4 | 1.07 |
| DeepSeek-V2.5 | DeepSeek | 128k | 66 | $1.09 | 13.9 | 0.94 |
| Qwen2.5 72B | Alibaba | 131k | 75 | $0.39 | 47.3 | 0.56 |
| Qwen2.5 Coder 32B | Alibaba | 131k | 70 | $0.50 | 53.7 | 0.37 |
| Qwen2 72B | Alibaba | 128k | 69 | $0.63 | 55.1 | 0.41 |
| Yi-Large | 01.AI | 32k | 58 | $3.00 | 67.1 | 0.44 |
| GPT-4 Turbo | OpenAI | 128k | 74 | $15.00 | 37.3 | 0.74 |
| GPT-3.5 Turbo | OpenAI | 16k | 52 | $0.75 | 103.9 | 0.44 |
| GPT-3.5 Turbo Instruct | OpenAI | 4k |  | $1.63 | 108.4 | 0.66 |
| GPT-4 | OpenAI | 8k |  | $37.50 | 23.8 | 0.65 |
| Llama 3 70B | Meta | 8k | 62 | $0.89 | 45.2 | 0.43 |
| Llama 3 8B | Meta | 8k | 46 | $0.15 | 122.1 | 0.32 |
| Llama 2 Chat 13B | Meta | 4k | 25 | $0.56 | 53.1 | 0.48 |
| Llama 2 Chat 7B | Meta | 4k |  | $0.33 | 123.8 | 0.33 |
| Gemini 1.0 Pro | Google | 33k |  | $0.75 | 102.5 | 1.26 |
| Claude 3 Sonnet | Anthropic | 200k | 57 | $6.00 | 62.4 | 0.87 |
| Mistral Large (Feb '24) | Mistral | 33k | 56 | $6.00 | 35.9 | 0.48 |
| Mistral Small (Feb '24) | Mistral | 33k | 50 | $1.50 | 53.6 | 0.44 |
| Mistral 7B | Mistral | 33k | 24 | $0.18 | 96.2 | 0.32 |
| Codestral | Mistral | 33k |  | $0.30 | 80.9 | 0.43 |
| Mistral Medium | Mistral | 33k |  | $4.09 | 44.5 | 0.45 |
| OpenChat 3.5 | OpenChat | 8k | 43 | $0.06 | 74.8 | 0.32 |
| Jamba Instruct | AI21 Labs | 256k | 28 | $0.55 | 75.5 | 0.53 |

Blank Quality Index cells indicate models for which no Quality Index value is reported.

Key definitions

Artificial Analysis Quality Index: Average result across our evaluations covering different dimensions of model intelligence. Currently includes MMLU, GPQA, Math & HumanEval. OpenAI o1 model figures are preliminary and based on figures stated by OpenAI. See our methodology for more details. (An illustrative averaging sketch appears after this list.)
Context window: Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (this varies by model).
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API, for models which support streaming).
Latency: Time to first token received, in seconds, after the API request is sent. For models which do not support streaming, this represents the time to receive the completion. (A measurement sketch for output speed and latency appears after this list.)
Price: Price per token, represented as USD per million tokens. Price is a blend of input & output token prices at a 3:1 input-to-output ratio (a worked example appears after this list).
Output price: Price per token generated by the model (received from the API), represented as USD per million tokens.
Input price: Price per token included in the request/message sent to the API, represented as USD per million tokens.
Time period: Metrics are 'live' and based on measurements from the past 14 days; measurements are taken 8 times per day for single requests and 2 times per day for parallel requests.
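
The Quality Index is described above as an average across evaluations. As a rough sketch only (the exact normalisation and weighting used by Artificial Analysis are not specified here, and the benchmark scores below are made-up inputs), an equal-weight mean looks like this:

```python
# Illustrative only: equal-weight average of benchmark scores on a 0-100 scale.
# The published Quality Index may normalise and weight benchmarks differently.
def quality_index(scores: dict[str, float]) -> float:
    """Average per-benchmark scores into a single 0-100 index."""
    return sum(scores.values()) / len(scores)

# Example with made-up scores:
print(quality_index({"MMLU": 88.0, "GPQA": 70.0, "Math": 90.0, "HumanEval": 92.0}))  # 85.0
```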
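
Output speed and latency can be measured from a streaming response. The sketch below is a minimal illustration, assuming a generic iterator of text chunks and a whitespace-based token count; it is not the Artificial Analysis harness, and a real measurement would use the provider's streaming API and the model's tokenizer.

```python
import time

def measure_stream(chunks):
    """Return (latency_s, output_speed_tps) for an iterable of streamed text chunks.

    Latency is time to first token (TTFT); output speed is tokens per second
    counted after the first chunk has arrived, matching the definitions above.
    """
    start = time.monotonic()
    first_chunk_at = None
    text = ""
    for chunk in chunks:
        if first_chunk_at is None:
            first_chunk_at = time.monotonic()
        text += chunk
    if first_chunk_at is None:
        raise ValueError("no chunks received")
    end = time.monotonic()
    latency = first_chunk_at - start
    tokens = len(text.split())  # crude approximation of token count
    gen_time = end - first_chunk_at
    speed = tokens / gen_time if gen_time > 0 else 0.0
    return latency, speed
```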
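
The blended price follows directly from a model's separate input and output prices at the stated 3:1 ratio. As a worked example, assuming GPT-4o mini's commonly quoted $0.15 (input) and $0.60 (output) per million tokens (an assumption for illustration, not a figure from the table above), the blend reproduces the $0.26 shown:

```python
def blended_price(input_usd_per_m: float, output_usd_per_m: float) -> float:
    """Blend input and output prices at a 3:1 input:output token ratio (USD per million tokens)."""
    return (3 * input_usd_per_m + output_usd_per_m) / 4

print(blended_price(0.15, 0.60))  # 0.2625 -> displayed as $0.26
```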

Models compared:

- OpenAI: GPT 4o Audio, GPT 4o Realtime, GPT 4o Speech Pipeline, GPT-3.5 Turbo, GPT-3.5 Turbo (0125), GPT-3.5 Turbo (1106), GPT-3.5 Turbo Instruct, GPT-4, GPT-4 Turbo, GPT-4 Turbo (0125), GPT-4 Vision, GPT-4o (Aug '24), GPT-4o (May '24), GPT-4o (Nov '24), GPT-4o mini, o1-mini, and o1-preview
- Meta: Code Llama 70B, Llama 2 Chat 13B, Llama 2 Chat 70B, Llama 2 Chat 7B, Llama 3 70B, Llama 3 8B, Llama 3.1 405B, Llama 3.1 70B, Llama 3.1 8B, Llama 3.2 11B (Vision), Llama 3.2 1B, Llama 3.2 3B, and Llama 3.2 90B (Vision)
- Google: Gemini 1.0 Pro, Gemini 1.5 Flash (May), Gemini 1.5 Flash (Sep), Gemini 1.5 Flash-8B, Gemini 1.5 Pro (May), Gemini 1.5 Pro (Sep), Gemini Experimental (Nov), Gemma 2 27B, Gemma 2 9B, and Gemma 7B
- Anthropic: Claude 2.0, Claude 2.1, Claude 3 Haiku, Claude 3 Opus, Claude 3 Sonnet, Claude 3.5 Haiku, Claude 3.5 Sonnet (June), Claude 3.5 Sonnet (Oct), and Claude Instant
- Mistral: Codestral, Codestral-Mamba, Ministral 3B, Ministral 8B, Mistral 7B, Mistral Large (Feb '24), Mistral Large 2 (Jul '24), Mistral Large (Nov '24), Mistral Medium, Mistral NeMo, Mistral Small (Feb '24), Mistral Small (Sep '24), Mixtral 8x22B, Mixtral 8x7B, Pixtral 12B, and Pixtral Large
- Cohere: Aya Expanse 32B, Aya Expanse 8B, Command, Command Light, Command-R, Command-R (Mar '24), Command-R+ (Apr '24), and Command-R+
- Perplexity: PPLX-70B Online, PPLX-7B-Online, Sonar 3.1 Large, Sonar 3.1 Small, Sonar Large, and Sonar Small
- xAI: Grok Beta and Grok-1
- OpenChat: OpenChat 3.5
- Microsoft Azure: Phi-3 Medium 14B and Phi-3 Mini
- Upstage: Solar Mini and Solar Pro
- Databricks: DBRX
- NVIDIA: Llama 3.1 Nemotron 70B
- Olympus Lite, Olympus Micro, and Olympus Pro
- IBM: Granite 3.0 2B and Granite 3.0 8B
- Reka AI: Reka Core, Reka Edge, Reka Flash (Feb '24), and Reka Flash
- AI21 Labs: Jamba 1.5 Large, Jamba 1.5 Mini, and Jamba Instruct
- DeepSeek: DeepSeek-Coder-V2, DeepSeek-V2, and DeepSeek-V2.5
- Snowflake: Arctic
- Alibaba: Qwen2 72B, Qwen2.5 Coder 32B, and Qwen2.5 72B
- 01.AI: Yi-Large
- Other: LLaVA-v1.5-7B