LLM Leaderboard - Comparison of GPT-4o, Llama 3, Mistral, Gemini and over 30 models
A comparison and ranking of over 30 AI models (LLMs) across key metrics, including quality, price, performance and speed (output speed in tokens per second, and latency as time to first token, TTFT), context window, and others. For more details, including our methodology, see our FAQs.
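The speed metrics above are measured client-side: latency (TTFT) is the time from sending a request to receiving the first token, and output speed is tokens generated per second over the full stream. The sketch below is illustrative only; the `fake_stream` generator is a hypothetical stand-in for any token-streaming API.

```python
import time
from typing import Iterator

def fake_stream(n_tokens: int = 50, delay: float = 0.001) -> Iterator[str]:
    """Hypothetical stand-in for a real token-streaming API."""
    for i in range(n_tokens):
        time.sleep(delay)
        yield f"tok{i} "

def measure(stream: Iterator[str]) -> tuple[float, float]:
    """Return (ttft_seconds, tokens_per_second) for a token stream."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in stream:
        if ttft is None:
            # time to first token (TTFT): delay before the first chunk arrives
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return ttft, count / total

ttft, tps = measure(fake_stream())
print(f"TTFT: {ttft:.3f}s, speed: {tps:.1f} tokens/s")
```

In practice, measured values depend heavily on the serving provider, region, and load, which is why the figures in the table below are point-in-time benchmarks rather than fixed properties of each model.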
Model | Context Window | Intelligence Index | Price (USD per 1M tokens) | Output Speed (tokens/s) | Latency (s)
---|---|---|---|---|---
o3-mini (high) | 200k | 66 | $1.93 | 137.2 | 55.53
o3-mini | 200k | 63 | $1.93 | 156.4 | 13.94
o1 | 200k | 62 | $26.25 | |
DeepSeek R1 | 128k | 60 | $0.96 | 22.4 | 90.57
Claude 3.7 Sonnet Thinking | 200k | 57 | $6.00 | 77.9 | 0.91
o1-mini | 128k | 54 | $1.93 | 208.1 | 11.09
DeepSeek R1 Distill Qwen 32B | 128k | 51 | $0.30 | 36.3 | 17.74
Gemini 2.0 Pro Experimental | 2m | 49 | $0.00 | 132.0 | 0.61
DeepSeek R1 Distill Qwen 14B | 128k | 49 | $0.88 | 37.8 | 15.60
DeepSeek R1 Distill Llama 70B | 128k | 48 | $0.81 | 123.2 | 13.04
Claude 3.7 Sonnet | 200k | 48 | $6.00 | 77.4 | 1.09
Gemini 2.0 Flash | 1m | 48 | $0.17 | 184.3 | 0.37
DeepSeek V3 | 128k | 46 | $0.48 | 25.8 | 20.73
Qwen2.5 Max | 32k | 45 | $2.80 | 35.6 | 1.17
Gemini 1.5 Pro (Sep) | 2m | 45 | $2.19 | 0.0 | 0.09
Claude 3.5 Sonnet (Oct) | 200k | 44 | $6.00 | |
QwQ 32B-Preview | 33k | 43 | $0.58 | 62.1 | 1.20
Gemini 2.0 Flash-Lite (Preview) | 1m | 42 | $0.13 | 191.5 | 0.24
GPT-4o (Nov '24) | 128k | 41 | $4.38 | 82.8 | 0.49
Llama 3.3 70B | 128k | 41 | $0.64 | 117.4 | 0.79
GPT-4o (ChatGPT) | 128k | 41 | $7.50 | 69.7 | 0.55
GPT-4o (Aug '24) | 128k | 41 | $4.38 | 43.8 | 0.51
GPT-4o (May '24) | 128k | 41 | $7.50 | 47.5 | 0.48
Llama 3.1 405B | 128k | 40 | $3.50 | 18.8 | 1.36
Qwen2.5 72B | 131k | 40 | $0.00 | 40.1 | 1.12
Phi-4 | 16k | 40 | $0.12 | |
Tulu3 405B | 128k | 40 | $6.25 | 116.0 | 8.03
MiniMax-Text-01 | 4m | 40 | $0.42 | 32.9 | 1.04
Mistral Large 2 (Nov '24) | 128k | 38 | $3.00 | 32.1 | 0.49
Grok Beta | 128k | 38 | $7.50 | 66.3 | 0.35
Pixtral Large | 128k | 37 | $3.00 | 31.1 | 0.49
Qwen2.5 Instruct 32B | 128k | 37 | $0.79 | |
Llama 3.1 Nemotron 70B | 128k | 37 | $0.27 | 39.7 | 0.80
Nova Pro | 300k | 37 | $1.40 | |
Mistral Large 2 (Jul '24) | 128k | 37 | $3.00 | 29.6 | 0.62
Qwen2.5 Coder 32B | 131k | 36 | $0.80 | 60.2 | 0.70
GPT-4o mini | 128k | 36 | $0.26 | 116.5 | 0.38
Llama 3.1 70B | 128k | 35 | $0.72 | 63.9 | 0.70
Mistral Small 3 | 32k | 35 | $0.15 | 42.7 | 0.46
Claude 3 Opus | 200k | 35 | $30.00 | 26.8 | 130.22
Claude 3.5 Haiku | 200k | 35 | $1.60 | 64.3 | 1.82
DeepSeek R1 Distill Llama 8B | 128k | 34 | $0.04 | 47.3 | 13.38
Gemini 1.5 Pro (May) | 2m | 34 | $2.19 | 0.0 | 0.07
Qwen Turbo | 1m | 34 | $0.09 | 85.0 | 1.09
Llama 3.2 90B (Vision) | 128k | 33 | $0.90 | 33.2 | 0.59
Qwen2 72B | 131k | 33 | $0.00 | |
Mistral Saba | 32k | 32 | $0.30 | 42.3 | 0.43
Jamba 1.5 Large | 256k | 29 | $3.50 | 43.1 | 1.03
Gemini 1.5 Flash (May) | 1m | 28 | $0.13 | 0.0 | 0.07
Nova Micro | 130k | 28 | $0.06 | 337.9 | 0.48
Yi-Large | 32k | 28 | $3.00 | 58.1 | 1.27
Claude 3 Sonnet | 200k | 28 | $6.00 | 53.6 | 0.51
Codestral (Jan '25) | 256k | 28 | $0.45 | 41.7 | 0.44
Llama 3 70B | 8k | 27 | $0.88 | 54.3 | 0.77
Mistral Small (Sep '24) | 33k | 27 | $0.30 | 36.2 | 0.45
Mistral Large (Feb '24) | 33k | 26 | $6.00 | 33.8 | 0.76
Mixtral 8x22B | 65k | 26 | $3.00 | 34.7 | 0.42
Qwen2.5 Coder 7B | 131k | 26 | $0.03 | 169.8 | 0.62
Phi-3 Medium 14B | 128k | 25 | $0.30 | 46.9 | 0.85
Claude 2.1 | 200k | 24 | $12.00 | |
DeepSeek Coder V2 Lite | 128k | 24 | $0.09 | 55.8 | 0.96
Mistral Medium | 33k | 24 | $4.09 | 33.5 | 0.58
Llama 3.1 8B | 128k | 24 | $0.10 | 170.3 | 0.36
Pixtral 12B | 128k | 23 | $0.15 | 38.7 | 0.43
Mistral Small (Feb '24) | 33k | 23 | $1.50 | 42.8 | 0.40
Ministral 8B | 128k | 22 | $0.10 | 44.8 | 0.40
Llama 3.2 11B (Vision) | 128k | 22 | $0.17 | 73.7 | 0.39
Command-R+ | 128k | 21 | $4.38 | 58.4 | 0.40
Llama 3 8B | 8k | 21 | $0.10 | 89.0 | 0.57
Codestral (May '24) | 33k | 20 | $0.30 | 33.2 | 0.42
Aya Expanse 32B | 128k | 20 | $0.75 | 119.2 | 0.21
Command-R+ (Apr '24) | 128k | 20 | $6.00 | 71.8 | 0.26
DBRX | 33k | 20 | $1.13 | 41.1 | 0.67
Ministral 3B | 128k | 20 | $0.04 | 42.4 | 0.37
Mistral NeMo | 128k | 20 | $0.15 | 41.8 | 0.42
Llama 3.2 3B | 128k | 20 | $0.06 | 115.6 | 0.69
DeepSeek R1 Distill Qwen 1.5B | 128k | 19 | $0.18 | 316.9 | 7.55
Mixtral 8x7B | 33k | 17 | $0.70 | 32.6 | 0.41
OpenChat 3.5 | 8k | 16 | $0.06 | 64.1 | 0.71
Jamba Instruct | 256k | 16 | $0.55 | 142.6 | 0.37
Command-R | 128k | 15 | $0.26 | 67.2 | 0.32
Command-R (Mar '24) | 128k | 15 | $0.75 | 170.3 | 0.17
Codestral-Mamba | 256k | 14 | $0.25 | 34.4 | 0.60
Mistral 7B | 8k | 10 | $0.25 | 33.0 | 0.37
Llama 3.2 1B | 128k | 10 | $0.04 | 194.2 | 0.50
Llama 2 Chat 7B | 4k | 8 | $0.10 | 69.4 | 1.21
o1-preview | 128k | | $26.25 | 117.8 | 28.59
GPT-4.5 (Preview) | 128k | | $93.75 | 50.3 | 1.63
o3 | 128k | | $0.00 | |
Gemini 2.0 Flash (exp) | 1m | | $0.00 | 175.2 | 0.29
Gemini 1.5 Flash (Sep) | 1m | | $0.13 | 0.0 | 0.05
Gemma 2 27B | 8k | | $0.26 | |
Gemma 2 9B | 8k | | $0.12 | |
Gemini 1.5 Flash-8B | 1m | | $0.07 | |
Gemini Experimental (Nov) | 2m | | $0.00 | |
Claude 3.5 Sonnet (June) | 200k | | $6.00 | |
Claude 3 Haiku | 200k | | $0.50 | 140.3 | 0.57
DeepSeek-V2.5 (Dec '24) | 128k | | $0.17 | |
DeepSeek-Coder-V2 | 128k | | $0.17 | |
DeepSeek LLM 67B (V1) | 4k | | $0.90 | |
DeepSeek-V2.5 | 128k | | $1.09 | |
DeepSeek-V2 | 128k | | $0.17 | |
Sonar Pro | 200k | | $6.00 | |
Sonar | 127k | | $1.00 | |
Sonar Reasoning | 127k | | $2.00 | |
Grok 3 mini | 128k | | $0.00 | |
Grok 3 Reasoning Beta | 128k | | $0.00 | |
Grok 3 mini Reasoning | 128k | | $0.00 | |
Grok 3 | 128k | | $0.00 | |
Nova Lite | 300k | | $0.10 | |
Solar Mini | 4k | | $0.15 | |
Reka Flash | 128k | | $0.35 | |
Reka Core | 128k | | $2.00 | |
Reka Flash (Feb '24) | 128k | | $0.35 | |
Reka Edge | 128k | | $0.10 | |
Aya Expanse 8B | 8k | | $0.75 | 147.2 | 0.22
Jamba 1.5 Mini | 256k | | $0.25 | 149.1 | 0.41
Qwen Chat 72B | 34k | | $1.00 | |
Qwen1.5 Chat 110B | 32k | | $0.00 | |
GPT-4 Turbo | 128k | | $15.00 | |
GPT-4 | 8k | | $37.50 | |
Gemini 1.0 Pro | 33k | | $0.75 | |
Claude 2.0 | 100k | | $12.00 | |
Sonar 3.1 Small | 127k | | $0.20 | |
Sonar 3.1 Large | 127k | | $1.00 | |
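As an illustration of one way the table can be used, the sketch below ranks a few models by dollars per intelligence-index point, a rough cost-effectiveness measure (lower is more intelligence per dollar). The values are copied from sample rows of the table above; this metric is our own illustrative construction, not part of the leaderboard methodology.

```python
# (model, intelligence index, price in $ per 1M tokens) — sample rows from the table
rows = [
    ("o3-mini (high)", 66, 1.93),
    ("DeepSeek R1", 60, 0.96),
    ("Claude 3.7 Sonnet", 48, 6.00),
    ("GPT-4o mini", 36, 0.26),
    ("Llama 3.3 70B", 41, 0.64),
]

# Sort ascending by dollars per intelligence point
ranked = sorted(rows, key=lambda r: r[2] / r[1])
for name, intel, price in ranked:
    print(f"{name}: ${price / intel:.4f} per point")
```

On these sample rows, GPT-4o mini comes out cheapest per point and Claude 3.7 Sonnet most expensive; note the metric ignores speed, latency, and context window, so it is only one axis of comparison.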
Key definitions
Models compared:

- OpenAI: GPT 4o Audio, GPT 4o Realtime, GPT 4o Speech Pipeline, GPT-3.5 Turbo, GPT-3.5 Turbo (0125), GPT-3.5 Turbo (0314), GPT-3.5 Turbo (1106), GPT-3.5 Turbo Instruct, GPT-4, GPT-4 Turbo, GPT-4 Turbo (0125), GPT-4 Turbo (1106), GPT-4 Vision, GPT-4.5 (Preview), GPT-4o (Aug '24), GPT-4o (ChatGPT), GPT-4o (May '24), GPT-4o (Nov '24), GPT-4o Realtime (Dec '24), GPT-4o mini, GPT-4o mini Realtime (Dec '24), o1, o1-mini, o1-preview, o3, o3-mini, and o3-mini (high)
- Meta: Code Llama 70B, Llama 2 Chat 13B, Llama 2 Chat 70B, Llama 2 Chat 7B, Llama 3 70B, Llama 3 8B, Llama 3.1 405B, Llama 3.1 70B, Llama 3.1 8B, Llama 3.2 11B (Vision), Llama 3.2 1B, Llama 3.2 3B, Llama 3.2 90B (Vision), and Llama 3.3 70B
- Google: Gemini 1.0 Pro, Gemini 1.5 Flash (May), Gemini 1.5 Flash (Sep), Gemini 1.5 Flash-8B, Gemini 1.5 Pro (May), Gemini 1.5 Pro (Sep), Gemini 2.0 Flash, Gemini 2.0 Flash (exp), Gemini 2.0 Flash Thinking exp. (Dec '24), Gemini 2.0 Flash Thinking exp. (Jan '25), Gemini 2.0 Flash-Lite (Feb '25), Gemini 2.0 Flash-Lite (Preview), Gemini 2.0 Pro Experimental, Gemini Experimental (Nov), Gemma 2 27B, Gemma 2 9B, and Gemma 7B
- Anthropic: Claude 2.0, Claude 2.1, Claude 3 Haiku, Claude 3 Opus, Claude 3 Sonnet, Claude 3.5 Haiku, Claude 3.5 Sonnet (June), Claude 3.5 Sonnet (Oct), Claude 3.7 Sonnet Thinking, Claude 3.7 Sonnet, and Claude Instant
- Mistral: Codestral (Jan '25), Codestral (May '24), Codestral-Mamba, Ministral 3B, Ministral 8B, Mistral 7B, Mistral Large (Feb '24), Mistral Large 2 (Jul '24), Mistral Large 2 (Nov '24), Mistral Medium, Mistral NeMo, Mistral Saba, Mistral Small (Feb '24), Mistral Small (Sep '24), Mistral Small 3, Mixtral 8x22B, Mixtral 8x7B, Pixtral 12B, and Pixtral Large
- DeepSeek: DeepSeek Coder V2 Lite, DeepSeek LLM 67B (V1), DeepSeek R1, DeepSeek R1 Distill Llama 70B, DeepSeek R1 Distill Llama 8B, DeepSeek R1 Distill Qwen 1.5B, DeepSeek R1 Distill Qwen 14B, DeepSeek R1 Distill Qwen 32B, DeepSeek V3, DeepSeek-Coder-V2, DeepSeek-V2, DeepSeek-V2.5, DeepSeek-V2.5 (Dec '24), DeepSeek-VL2, and Janus Pro 7B
- Perplexity: PPLX-70B Online, PPLX-7B-Online, Sonar, Sonar 3.1 Huge, Sonar 3.1 Large, Sonar 3.1 Small, Sonar Large, Sonar Pro, Sonar Reasoning, Sonar Reasoning Pro, and Sonar Small
- xAI: Grok 2, Grok 3, Grok 3 Reasoning Beta, Grok 3 mini, Grok 3 mini Reasoning, Grok Beta, and Grok-1
- OpenChat: OpenChat 3.5
- Amazon: Nova Lite, Nova Micro, and Nova Pro
- Microsoft Azure: Phi-3 Medium 14B, Phi-3 Mini, Phi-4, Phi-4 Mini, and Phi-4 Multimodal
- Upstage: Solar Mini, Solar Pro, and Solar Pro (Nov '24)
- Databricks: DBRX
- MiniMax: MiniMax-Text-01
- NVIDIA: Cosmos Nemotron 34B and Llama 3.1 Nemotron 70B
- IBM: Granite 3.0 2B and Granite 3.0 8B
- Inceptionlabs: Mercury Coder Mini, Mercury Coder Small, and Mercury Instruct
- Reka AI: Reka Core, Reka Edge, Reka Flash (Feb '24), Reka Flash (Feb '25), and Reka Flash
- Other: LLaVA-v1.5-7B
- Cohere: Aya Expanse 32B, Aya Expanse 8B, Command, Command Light, Command R7B, Command-R, Command-R (Mar '24), Command-R+ (Apr '24), and Command-R+
- AI21 Labs: Jamba 1.5 Large, Jamba 1.5 Large (Feb '25), Jamba 1.5 Mini, Jamba 1.5 Mini (Feb 2025), Jamba 1.6 Large, Jamba 1.6 Mini, and Jamba Instruct
- Snowflake: Arctic
- Alibaba: QwQ 32B-Preview, Qwen Chat 72B, Qwen Plus, Qwen Turbo, Qwen1.5 Chat 110B, Qwen1.5 Chat 14B, Qwen1.5 Chat 32B, Qwen1.5 Chat 72B, Qwen1.5 Chat 7B, Qwen2 72B, Qwen2 Instruct 7B, Qwen2 Instruct A14B 57B, Qwen2-VL 72B, Qwen2.5 Coder 32B, Qwen2.5 Coder 7B, Qwen2.5 Instruct 14B, Qwen2.5 Instruct 32B, Qwen2.5 72B, Qwen2.5 Instruct 7B, Qwen2.5 Max, and Qwen2.5 Max 01-29
- 01.AI: Yi-Large