LLM Leaderboard - Comparison of GPT-4o, Llama 3, Mistral, Gemini and over 30 models
Comparison and ranking of over 30 AI models (LLMs) across key metrics, including quality, price, output speed (tokens per second), latency (time to first token, TTFT), and context window. For more details, including our methodology, see our FAQs.
Model | Context window | Quality | Price (USD per 1M tokens) | Output speed (tokens/s) | Latency (s, TTFT)
---|---|---|---|---|---
o1-preview | 128k | 86 | $27.56 | 143.8 | 20.57
o1-mini | 128k | 84 | $5.25 | 215.5 | 11.00
GPT-4o (Aug '24) | 128k | 78 | $4.38 | 67.6 | 0.73
GPT-4o (May '24) | 128k | 78 | $7.50 | 88.0 | 0.65
GPT-4o mini | 128k | 73 | $0.26 | 86.7 | 0.66
GPT-4o (Nov '24) | 128k | 73 | $4.38 | 105.3 | 0.45
GPT-4o mini Realtime (Dec '24) | 128k | | $0.00 | |
GPT-4o Realtime (Dec '24) | 128k | | $0.00 | |
Llama 3.3 70B | 128k | 74 | $0.69 | 71.6 | 0.47
Llama 3.1 405B | 128k | 74 | $3.50 | 30.0 | 0.72
Llama 3.1 70B | 128k | 68 | $0.72 | 72.8 | 0.42
Llama 3.2 90B (Vision) | 128k | 68 | $0.90 | 46.2 | 0.33
Llama 3.2 11B (Vision) | 128k | 54 | $0.18 | 131.8 | 0.28
Llama 3.1 8B | 128k | 54 | $0.10 | 184.5 | 0.31
Llama 3.2 3B | 128k | 49 | $0.06 | 197.6 | 0.38
Llama 3.2 1B | 128k | 26 | $0.04 | 314.9 | 0.35
Gemini 2.0 Flash (exp) | 1m | 82 | $0.00 | 169.1 | 0.46
Gemini 1.5 Pro (Sep) | 2m | 80 | $2.19 | 60.5 | 0.73
Gemini 1.5 Flash (Sep) | 1m | 74 | $0.13 | 187.2 | 0.26
Gemma 2 27B | 8k | 61 | $0.26 | 59.8 | 0.42
Gemma 2 9B | 8k | 55 | $0.12 | 165.1 | 0.33
Gemini 1.5 Flash-8B | 1m | 47 | $0.07 | 279.4 | 0.37
Gemini Experimental (Nov) | 2m | | $0.00 | 54.5 | 1.24
Gemini 1.5 Flash (May) | 1m | | $0.13 | 307.4 | 0.29
Gemini 1.5 Pro (May) | 2m | | $2.19 | 66.3 | 0.48
Claude 3.5 Sonnet (Oct) | 200k | 80 | $6.00 | 72.0 | 1.03
Claude 3.5 Sonnet (June) | 200k | 76 | $6.00 | 61.3 | 0.85
Claude 3 Opus | 200k | 70 | $30.00 | 26.0 | 2.04
Claude 3.5 Haiku | 200k | 68 | $1.60 | 64.7 | 0.73
Claude 3 Haiku | 200k | 55 | $0.50 | 119.8 | 0.72
Pixtral Large | 128k | 74 | $3.00 | 39.6 | 0.51
Mistral Large 2 (Jul '24) | 128k | 74 | $3.00 | 32.6 | 0.51
Mistral Large 2 (Nov '24) | 128k | 74 | $3.00 | 37.1 | 0.52
Mistral Small (Sep '24) | 33k | 61 | $0.30 | 63.9 | 0.41
Mixtral 8x22B | 65k | 61 | $1.20 | 81.8 | 0.56
Pixtral 12B | 128k | 56 | $0.13 | 69.2 | 0.44
Ministral 8B | 128k | 56 | $0.10 | 137.0 | 0.34
Mistral NeMo | 128k | 54 | $0.09 | 84.8 | 0.48
Ministral 3B | 128k | 53 | $0.04 | 170.5 | 0.32
Mixtral 8x7B | 33k | 41 | $0.50 | 104.2 | 0.33
Codestral-Mamba | 256k | 33 | $0.25 | 94.4 | 0.51
Codestral (Jan '25) | 256k | | $0.00 | |
Command-R+ | 128k | 55 | $5.19 | 50.8 | 0.48
Command-R+ (Apr '24) | 128k | 45 | $6.00 | 49.1 | 0.51
Command-R (Mar '24) | 128k | 36 | $0.75 | 108.2 | 0.36
Aya Expanse 8B | 8k | | $0.75 | 165.8 | 0.15
Command-R | 128k | | $0.51 | 111.9 | 0.32
Aya Expanse 32B | 128k | | $0.75 | 120.8 | 0.17
Sonar 3.1 Small | 127k | | $0.20 | 203.4 | 0.31
Sonar 3.1 Large | 127k | | $1.00 | 56.2 | 0.31
Grok Beta | 128k | 72 | $7.50 | 66.7 | 0.37
Nova Pro | 300k | 75 | $1.40 | 89.3 | 0.37
Nova Lite | 300k | 70 | $0.10 | 145.4 | 0.33
Nova Micro | 130k | 66 | $0.06 | 194.8 | 0.32
Phi-4 | 16k | 77 | $0.09 | 82.4 | 0.22
Phi-3 Mini | 4k | | $0.00 | |
Phi-3 Medium 14B | 128k | | $0.30 | 47.6 | 0.43
Solar Mini | 4k | 47 | $0.15 | |
DBRX | 33k | 46 | $1.16 | 70.4 | 0.41
Llama 3.1 Nemotron 70B | 128k | 72 | $0.27 | 48.0 | 0.57
Reka Flash | 128k | 59 | $0.35 | |
Reka Core | 128k | 58 | $2.00 | |
Reka Flash (Feb '24) | 128k | 46 | $0.35 | |
Reka Edge | 128k | 31 | $0.10 | |
Jamba 1.5 Large | 256k | 64 | $3.50 | 51.0 | 0.69
Jamba 1.5 Mini | 256k | | $0.25 | 84.5 | 0.47
DeepSeek V3 | 128k | 79 | $0.90 | 18.2 | 0.94
DeepSeek-V2.5 (Dec '24) | 128k | 72 | $0.17 | 56.5 | 1.10
DeepSeek-Coder-V2 | 128k | 71 | $0.17 | 55.4 | 1.03
DeepSeek-V2 | 128k | | $0.17 | |
DeepSeek-V2.5 | 128k | | $1.09 | 7.8 | 0.77
Arctic | 4k | 51 | $0.00 | |
Qwen2.5 72B | 131k | 77 | $0.40 | 68.1 | 0.51
Qwen2.5 Coder 32B | 131k | 72 | $0.80 | 81.4 | 0.35
Qwen2 72B | 131k | 69 | $0.90 | 64.1 | 0.34
QwQ 32B-Preview | 33k | 46 | $0.26 | 63.6 | 0.42
Yi-Large | 32k | 61 | $3.00 | 68.5 | 0.45
GPT-4 Turbo | 128k | 75 | $15.00 | 37.9 | 1.25
GPT-4 | 8k | | $37.50 | 27.4 | 0.75
Llama 3 70B | 8k | 47 | $0.89 | 48.3 | 0.38
Llama 3 8B | 8k | 45 | $0.15 | 111.4 | 0.34
Llama 2 Chat 70B | 4k | | $0.00 | |
Llama 2 Chat 7B | 4k | | $0.10 | 123.9 | 0.36
Llama 2 Chat 13B | 4k | | $0.00 | |
Gemini 1.0 Pro | 33k | | $0.75 | 102.7 | 1.24
Claude 3 Sonnet | 200k | 57 | $6.00 | 71.7 | 0.79
Claude 2.1 | 200k | | $12.00 | 19.5 | 1.41
Claude 2.0 | 100k | | $12.00 | 29.7 | 0.81
Mistral Small (Feb '24) | 33k | 59 | $1.50 | 53.0 | 0.39
Mistral Large (Feb '24) | 33k | 56 | $6.00 | 39.0 | 0.45
Mistral 7B | 8k | 28 | $0.16 | 112.5 | 0.26
Mistral Medium | 33k | | $4.09 | 43.7 | 0.44
Codestral (May '24) | 33k | | $0.30 | 84.3 | 0.33
OpenChat 3.5 | 8k | 44 | $0.06 | 71.4 | 0.29
Jamba Instruct | 256k | | $0.55 | 78.0 | 0.50
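For readers who want to slice these figures themselves, here is a minimal sketch of one way to represent the table's rows in code. The field names are illustrative choices of our own, and the sample values are copied from the table above; this is not the data pipeline behind the leaderboard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelEntry:
    model: str
    context_window: str                     # e.g. "128k", "1m"
    quality: Optional[int]                  # aggregate quality index; None where not reported
    price_usd_per_1m_tokens: float          # USD per 1M tokens
    output_tokens_per_sec: Optional[float]  # output speed
    latency_s: Optional[float]              # time to first token (TTFT), seconds


# A few rows copied from the table above, for illustration only.
entries = [
    ModelEntry("o1-preview", "128k", 86, 27.56, 143.8, 20.57),
    ModelEntry("GPT-4o mini", "128k", 73, 0.26, 86.7, 0.66),
    ModelEntry("Llama 3.3 70B", "128k", 74, 0.69, 71.6, 0.47),
    ModelEntry("Gemini 1.5 Flash (Sep)", "1m", 74, 0.13, 187.2, 0.26),
]

# Rank by quality (highest first), breaking ties on price (cheapest first).
for e in sorted(entries, key=lambda e: (-(e.quality or 0), e.price_usd_per_1m_tokens)):
    print(f"{e.model}: quality {e.quality}, ${e.price_usd_per_1m_tokens}/1M tokens, "
          f"{e.output_tokens_per_sec} tokens/s, TTFT {e.latency_s}s")
```

Filtering, for example by context window or a price ceiling, works the same way over the full set of rows.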
Key definitions
Context window: the maximum number of tokens a model can handle in a single request, covering both the prompt and the generated output.
Quality: an aggregate quality index across evaluations; higher is better.
Price: price in USD per 1M tokens.
Output speed (output tokens/s): the number of tokens generated per second while the model is producing output.
Latency (TTFT): time to first token, in seconds, measured from when the request is sent until the first output token is received.
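To make the two speed metrics concrete, here is a rough sketch of how TTFT and output tokens per second could be measured against a streaming, OpenAI-compatible chat API. This is an illustration only, not the methodology behind the figures above; the model name is a placeholder, and a real measurement would count tokens with the model's tokenizer rather than by whitespace splitting.

```python
import time

from openai import OpenAI  # assumes the OpenAI Python SDK (>=1.0) and an OPENAI_API_KEY env var

client = OpenAI()
prompt = "Explain what a context window is in two sentences."

start = time.perf_counter()
first_token_at = None
pieces = []

# Stream the response so the arrival time of the first token can be observed directly.
stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; substitute whichever model you are measuring
    messages=[{"role": "user", "content": prompt}],
    stream=True,
)
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        pieces.append(delta)
end = time.perf_counter()

ttft = first_token_at - start  # latency (time to first token)
# Rough output-speed estimate; whitespace splitting under-counts real tokens.
approx_tokens = len("".join(pieces).split())
tokens_per_sec = approx_tokens / max(end - first_token_at, 1e-9)

print(f"TTFT: {ttft:.2f} s, ~{tokens_per_sec:.1f} tokens/s")
```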
Models compared:
OpenAI: GPT 4o Audio, GPT 4o Realtime, GPT 4o Speech Pipeline, GPT-3.5 Turbo, GPT-3.5 Turbo (0125), GPT-3.5 Turbo (1106), GPT-3.5 Turbo Instruct, GPT-4, GPT-4 Turbo, GPT-4 Turbo (0125), GPT-4 Vision, GPT-4o (Aug '24), GPT-4o (May '24), GPT-4o (Nov '24), GPT-4o Realtime (Dec '24), GPT-4o mini, GPT-4o mini Realtime (Dec '24), o1, o1-mini, and o1-preview
Meta: Code Llama 70B, Llama 2 Chat 13B, Llama 2 Chat 70B, Llama 2 Chat 7B, Llama 3 70B, Llama 3 8B, Llama 3.1 405B, Llama 3.1 70B, Llama 3.1 8B, Llama 3.2 11B (Vision), Llama 3.2 1B, Llama 3.2 3B, Llama 3.2 90B (Vision), and Llama 3.3 70B
Google: Gemini 1.0 Pro, Gemini 1.5 Flash (May), Gemini 1.5 Flash (Sep), Gemini 1.5 Flash-8B, Gemini 1.5 Pro (May), Gemini 1.5 Pro (Sep), Gemini 2.0 Flash (exp), Gemini 2.0 Flash Thinking (exp), Gemini Experimental (Nov), Gemma 2 27B, Gemma 2 9B, and Gemma 7B
Anthropic: Claude 2.0, Claude 2.1, Claude 3 Haiku, Claude 3 Opus, Claude 3 Sonnet, Claude 3.5 Haiku, Claude 3.5 Sonnet (June), Claude 3.5 Sonnet (Oct), and Claude Instant
Mistral: Codestral (Jan '25), Codestral (May '24), Codestral-Mamba, Ministral 3B, Ministral 8B, Mistral 7B, Mistral Large (Feb '24), Mistral Large 2 (Jul '24), Mistral Large 2 (Nov '24), Mistral Medium, Mistral NeMo, Mistral Small (Feb '24), Mistral Small (Sep '24), Mixtral 8x22B, Mixtral 8x7B, Pixtral 12B, and Pixtral Large
Cohere: Aya Expanse 32B, Aya Expanse 8B, Command, Command Light, Command R7B, Command-R, Command-R (Mar '24), Command-R+ (Apr '24), and Command-R+
Perplexity: PPLX-70B Online, PPLX-7B-Online, Sonar 3.1 Huge, Sonar 3.1 Large, Sonar 3.1 Small, Sonar Large, and Sonar Small
xAI: Grok Beta and Grok-1
OpenChat: OpenChat 3.5
Amazon: Nova Lite, Nova Micro, and Nova Pro
Microsoft Azure: Phi-3 Medium 14B, Phi-3 Mini, and Phi-4
Upstage: Solar Mini, Solar Pro, and Solar Pro (Nov '24)
Databricks: DBRX
MiniMax: MiniMax-Text-01
NVIDIA: Llama 3.1 Nemotron 70B
IBM: Granite 3.0 2B and Granite 3.0 8B
Reka AI: Reka Core, Reka Edge, Reka Flash (Feb '24), and Reka Flash
Other: LLaVA-v1.5-7B
AI21 Labs: Jamba 1.5 Large, Jamba 1.5 Mini, and Jamba Instruct
DeepSeek: DeepSeek V3, DeepSeek-Coder-V2, DeepSeek-V2, DeepSeek-V2.5, DeepSeek-V2.5 (Dec '24), and DeepSeek-VL2
Snowflake: Arctic
Alibaba: QwQ 32B-Preview, Qwen Max, Qwen Plus, Qwen Turbo, Qwen1.5 Chat 110B, Qwen1.5 Chat 14B, Qwen1.5 Chat 32B, Qwen1.5 Chat 72B, Qwen1.5 Chat 7B, Qwen2 72B, Qwen2 Instruct 7B, Qwen2 Instruct A14B 57B, Qwen2-VL 72B, Qwen2.5 Coder 32B, Qwen2.5 Instruct 14B, Qwen2.5 Instruct 32B, Qwen2.5 72B, and Qwen2.5 Instruct 7B
01.AI: Yi-Large