LLM Leaderboard - Comparison of GPT-4o, Llama 3, Mistral, Gemini and over 30 models
Comparison and ranking of the performance of over 30 AI models (LLMs) across key metrics, including quality, price, output speed (tokens per second), latency (time to first token, TTFT), and context window. For more details, including our methodology, see our FAQs.
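The two speed metrics can be measured directly against any streaming chat API: latency is the time until the first token arrives (TTFT), and output speed is tokens generated per second after that. Below is a minimal sketch using the OpenAI Python SDK; it is illustrative only, not the leaderboard's own harness, and the model name, prompt, and the 4-characters-per-token estimate are placeholder assumptions:

```python
import time
from openai import OpenAI  # assumes the official openai package is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def measure(model: str, prompt: str) -> dict:
    """Time a single streaming request: TTFT and output tokens/s."""
    start = time.perf_counter()
    first_token_at = None
    chunks = []

    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices:  # some providers send keep-alive chunks
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            if first_token_at is None:
                first_token_at = time.perf_counter()  # latency = time to first token
            chunks.append(delta)
    end = time.perf_counter()

    text = "".join(chunks)
    # Rough token count: ~4 characters per token for English text.
    # A real harness would use the API's usage stats or a proper tokenizer.
    tokens = len(text) / 4
    return {
        "latency_s": first_token_at - start,
        "output_tokens_per_s": tokens / (end - first_token_at),
    }


print(measure("gpt-4o-mini", "Write a 200-word summary of the water cycle."))
```

The same loop works for any provider with a streaming chat endpoint; averaging over repeated runs and varied prompt lengths is needed before results are comparable to the table below.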
| Model | Context Window | Quality | Price (USD per 1M tokens) | Output Speed (tokens/s) | Latency (s) |
|---|---|---|---|---|---|
| o1-preview | 128k | 86 | $27.56 | 146.2 | 22.42 |
| o1-mini | 128k | 84 | $5.25 | 226.7 | 9.84 |
| GPT-4o (Aug '24) | 128k | 78 | $4.38 | 87.4 | 0.64 |
| GPT-4o (May '24) | 128k | 78 | $7.50 | 95.6 | 0.63 |
| GPT-4o (Nov '24) | 128k | 73 | $4.38 | 122.8 | 0.40 |
| GPT-4o mini | 128k | 73 | $0.26 | 113.1 | 0.59 |
| Llama 3.3 70B | 128k | 74 | $0.67 | 106.9 | 0.76 |
| Llama 3.1 405B | 128k | 74 | $3.50 | 29.8 | 1.05 |
| Llama 3.1 70B | 128k | 68 | $0.72 | 80.0 | 0.70 |
| Llama 3.2 90B (Vision) | 128k | 68 | $0.81 | 39.2 | 0.61 |
| Llama 3.2 11B (Vision) | 128k | 54 | $0.18 | 126.4 | 0.47 |
| Llama 3.1 8B | 128k | 54 | $0.10 | 157.5 | 0.44 |
| Llama 3.2 3B | 128k | 49 | $0.06 | 217.4 | 0.57 |
| Llama 3.2 1B | 128k | 26 | $0.04 | 332.6 | 0.48 |
| Gemini 2.0 Flash (exp) | 2m | 82 | $0.00 | 168.7 | 0.55 |
| Gemini 1.5 Pro (Sep) | 2m | 81 | $2.19 | 59.2 | 0.85 |
| Gemini 1.5 Flash (Sep) | 1m | 72 | $0.13 | 179.0 | 0.43 |
| Gemma 2 27B | 8k | 61 | $0.26 | 40.2 | 0.79 |
| Gemma 2 9B | 8k | 55 | $0.12 | 129.9 | 0.59 |
| Gemini 1.5 Flash-8B | 1m | | $0.07 | 281.3 | 0.40 |
| Gemini 1.5 Pro (May) | 2m | | $2.19 | 66.4 | 0.76 |
| Gemini Experimental (Nov) | 33k | | $0.00 | 40.3 | 1.98 |
| Gemini 1.5 Flash (May) | 1m | | $0.13 | 315.1 | 0.31 |
| Claude 3.5 Sonnet (Oct) | 200k | 80 | $6.00 | 62.6 | 1.08 |
| Claude 3.5 Sonnet (June) | 200k | 76 | $6.00 | 55.9 | 1.11 |
| Claude 3 Opus | 200k | 70 | $30.00 | 26.1 | 2.08 |
| Claude 3.5 Haiku | 200k | 68 | $1.60 | 65.6 | 0.90 |
| Claude 3 Haiku | 200k | 55 | $0.50 | 127.5 | 0.61 |
| Pixtral Large | 128k | 74 | $3.00 | 36.3 | 0.55 |
| Mistral Large 2 (Jul '24) | 128k | 74 | $3.00 | 32.8 | 0.74 |
| Mistral Large 2 (Nov '24) | 128k | 74 | $3.00 | 35.3 | 0.75 |
| Mistral Small (Sep '24) | 128k | 61 | $0.30 | 62.8 | 0.52 |
| Mixtral 8x22B | 65k | 61 | $1.20 | 53.2 | 0.86 |
| Pixtral 12B | 128k | 56 | $0.13 | 66.1 | 0.46 |
| Ministral 8B | 128k | 56 | $0.10 | 124.9 | 0.36 |
| Mistral NeMo | 128k | 54 | $0.09 | 128.9 | 0.53 |
| Ministral 3B | 128k | 53 | $0.04 | 162.0 | 0.35 |
| Mixtral 8x7B | 33k | 41 | $0.50 | 74.1 | 0.55 |
| Codestral-Mamba | 256k | 33 | $0.25 | 90.6 | 0.69 |
| Command-R+ | 128k | 55 | $5.19 | 48.9 | 0.64 |
| Command-R+ (Apr '24) | 128k | 45 | $6.00 | 45.7 | 0.98 |
| Command-R (Mar '24) | 128k | 36 | $0.75 | 96.5 | 0.60 |
| Aya Expanse 8B | 8k | | $0.75 | 150.9 | 0.24 |
| Command-R | 128k | | $0.51 | 105.4 | 0.39 |
| Aya Expanse 32B | 128k | | $0.75 | 117.9 | 0.24 |
| Sonar 3.1 Small | 131k | | $0.20 | 168.4 | 0.33 |
| Sonar 3.1 Large | 131k | | $1.00 | 52.5 | 0.34 |
| Grok Beta | 8k | 72 | $7.50 | 61.2 | 0.44 |
| Nova Pro | 300k | 75 | $1.40 | 92.3 | 0.60 |
| Nova Lite | 300k | 70 | $0.10 | 147.5 | 0.56 |
| Nova Micro | 130k | 66 | $0.06 | 197.4 | 0.55 |
| Phi-3 Medium 14B | 128k | | $0.30 | 43.6 | 0.77 |
| Solar Mini | 4k | 47 | $0.15 | 78.9 | 1.35 |
| DBRX | 33k | 46 | $1.16 | 48.6 | 0.62 |
| Llama 3.1 Nemotron 70B | 128k | 72 | $0.27 | 46.1 | 0.81 |
| Reka Flash | 128k | 59 | $0.35 | 28.0 | 1.64 |
| Reka Core | 128k | 58 | $2.00 | 10.7 | 1.66 |
| Reka Flash (Feb '24) | 128k | 46 | $0.35 | 28.2 | 1.20 |
| Reka Edge | 64k | 31 | $0.10 | 26.3 | 1.48 |
| Jamba 1.5 Large | 256k | 64 | $3.50 | 25.6 | 1.80 |
| Jamba 1.5 Mini | 256k | | $0.25 | 37.0 | 1.25 |
| DeepSeek-Coder-V2 | 128k | 71 | $0.17 | 18.2 | 0.96 |
| DeepSeek-V2 | 128k | | $0.17 | 17.8 | 0.91 |
| DeepSeek-V2.5 | 128k | | $1.09 | 16.9 | 1.08 |
| Qwen2.5 72B | 131k | 77 | $0.40 | 59.3 | 0.93 |
| Qwen2.5 Coder 32B | 131k | 72 | $0.80 | 78.8 | 0.62 |
| Qwen2 72B | 128k | 72 | $0.63 | 47.6 | 0.51 |
| QwQ 32B-Preview | 33k | 46 | $0.26 | 61.4 | 0.81 |
| Yi-Large | 32k | 61 | $3.00 | 57.8 | 1.48 |
| GPT-4 Turbo | 128k | 75 | $15.00 | 38.5 | 1.21 |
| GPT-4 | 8k | | $37.50 | 28.1 | 0.70 |
| Llama 3 70B | 8k | 47 | $0.89 | 51.4 | 0.64 |
| Llama 3 8B | 8k | 45 | $0.15 | 110.0 | 0.43 |
| Llama 2 Chat 7B | 4k | | $0.33 | 65.9 | 1.19 |
| Gemini 1.0 Pro | 33k | | $0.75 | 102.7 | 1.29 |
| Claude 3 Sonnet | 200k | 57 | $6.00 | 69.9 | 0.88 |
| Mistral Small (Feb '24) | 33k | 59 | $1.50 | 50.9 | 0.46 |
| Mistral Large (Feb '24) | 33k | 56 | $6.00 | 36.5 | 0.77 |
| Mistral 7B | 33k | 28 | $0.18 | 92.6 | 0.42 |
| Codestral | 33k | | $0.30 | 81.3 | 0.36 |
| Mistral Medium | 33k | | $4.09 | 42.8 | 0.63 |
| OpenChat 3.5 | 8k | 44 | $0.06 | 59.4 | 0.68 |
| Jamba Instruct | 256k | | $0.55 | 32.9 | 1.19 |
Key definitions
- Context window: the maximum number of tokens the model can handle at once (input and output combined).
- Quality: relative quality index across the models compared (higher is better); blank cells indicate models not yet evaluated.
- Price: USD per 1M tokens.
- Output speed: tokens per second generated while the model is streaming its response.
- Latency: time to first token received (TTFT), in seconds.
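Reading the table in practice: the total time for an answer of N output tokens is roughly latency + N / output speed, and its cost is roughly price × N / 1,000,000. A minimal sketch using two rows from the table above (the price column is a blended per-token rate, so the cost figure is a back-of-envelope estimate):

```python
# Back-of-envelope time and cost for a 1,000-token answer, using
# leaderboard figures copied from the table above.
MODELS = {
    # name:        (price USD/1M tokens, output tokens/s, latency s)
    "o1-preview":  (27.56, 146.2, 22.42),
    "GPT-4o mini": (0.26, 113.1, 0.59),
}

N_TOKENS = 1_000
for name, (price, speed, latency) in MODELS.items():
    total_time = latency + N_TOKENS / speed  # seconds until the full answer arrives
    cost = price * N_TOKENS / 1_000_000      # USD, treating the blended price as flat
    print(f"{name}: ~{total_time:.1f} s, ~${cost:.4f}")
```

On these figures, GPT-4o mini returns the full answer in about 9.4 seconds versus about 29.3 seconds for o1-preview, at roughly 1/100th of the cost; whether the quality gap (73 vs 86) justifies that difference is exactly the trade-off the table is meant to expose.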
Models compared:
- OpenAI: GPT 4o Audio, GPT 4o Realtime, GPT 4o Speech Pipeline, GPT-3.5 Turbo, GPT-3.5 Turbo (0125), GPT-3.5 Turbo (1106), GPT-3.5 Turbo Instruct, GPT-4, GPT-4 Turbo, GPT-4 Turbo (0125), GPT-4 Vision, GPT-4o (Aug '24), GPT-4o (May '24), GPT-4o (Nov '24), GPT-4o Realtime (Dec '24), GPT-4o mini, GPT-4o mini Realtime (Dec '24), o1-mini, and o1-preview
- Meta: Code Llama 70B, Llama 2 Chat 13B, Llama 2 Chat 70B, Llama 2 Chat 7B, Llama 3 70B, Llama 3 8B, Llama 3.1 405B, Llama 3.1 70B, Llama 3.1 8B, Llama 3.2 11B (Vision), Llama 3.2 1B, Llama 3.2 3B, Llama 3.2 90B (Vision), and Llama 3.3 70B
- Google: Gemini 1.0 Pro, Gemini 1.5 Flash (May), Gemini 1.5 Flash (Sep), Gemini 1.5 Flash-8B, Gemini 1.5 Pro (May), Gemini 1.5 Pro (Sep), Gemini 2.0 Flash (exp), Gemini 2.0 Flash Thinking (exp), Gemini Experimental (Nov), Gemma 2 27B, Gemma 2 9B, and Gemma 7B
- Anthropic: Claude 2.0, Claude 2.1, Claude 3 Haiku, Claude 3 Opus, Claude 3 Sonnet, Claude 3.5 Haiku, Claude 3.5 Sonnet (June), Claude 3.5 Sonnet (Oct), and Claude Instant
- Mistral: Codestral, Codestral-Mamba, Ministral 3B, Ministral 8B, Mistral 7B, Mistral Large (Feb '24), Mistral Large 2 (Jul '24), Mistral Large 2 (Nov '24), Mistral Medium, Mistral NeMo, Mistral Small (Feb '24), Mistral Small (Sep '24), Mixtral 8x22B, Mixtral 8x7B, Pixtral 12B, and Pixtral Large
- Cohere: Aya Expanse 32B, Aya Expanse 8B, Command, Command Light, Command R7B, Command-R, Command-R (Mar '24), Command-R+ (Apr '24), and Command-R+
- Perplexity: PPLX-70B Online, PPLX-7B-Online, Sonar 3.1 Huge, Sonar 3.1 Large, Sonar 3.1 Small, Sonar Large, and Sonar Small
- xAI: Grok Beta and Grok-1
- OpenChat: OpenChat 3.5
- Amazon: Nova Lite, Nova Micro, and Nova Pro
- Microsoft Azure: Phi-3 Medium 14B, Phi-3 Mini, and Phi-4
- Upstage: Solar Mini, Solar Pro, and Solar Pro (Nov '24)
- Databricks: DBRX
- NVIDIA: Llama 3.1 Nemotron 70B
- IBM: Granite 3.0 2B and Granite 3.0 8B
- Reka AI: Reka Core, Reka Edge, Reka Flash (Feb '24), and Reka Flash
- AI21 Labs: Jamba 1.5 Large, Jamba 1.5 Mini, and Jamba Instruct
- DeepSeek: DeepSeek-Coder-V2, DeepSeek-V2, DeepSeek-V2.5, DeepSeek-V2.5 (Dec '24), and DeepSeek-VL2
- Snowflake: Arctic
- Alibaba: QwQ 32B-Preview, Qwen2 72B, Qwen2-VL 72B, Qwen2.5 Coder 32B, and Qwen2.5 72B
- 01.AI: Yi-Large
- Other: LLaVA-v1.5-7B