LLM Leaderboard - Comparison of GPT-4o, Llama 3, Mistral, Gemini and over 30 models
A comparison and ranking of the performance of over 30 AI models (LLMs) across key metrics, including quality, price, speed (output speed in tokens per second, and latency as time to first token, TTFT), context window, and others. For more details, including our methodology, see our FAQs.
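For intuition on how the two speed metrics relate, the sketch below measures both for a single streamed request. This is a minimal illustration, not our benchmarking methodology (see the FAQs for that): it assumes the official `openai` Python client against an OpenAI-compatible streaming endpoint, approximates token counts by counting streamed chunks, and the model name and prompt are placeholders.

```python
import time

from openai import OpenAI  # official openai-python client, v1+

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def measure_speed(model: str, prompt: str) -> tuple[float, float]:
    """Return (ttft_s, output_tokens_per_s) for one streamed request.

    Latency is measured as time to first token (TTFT); output speed is
    tokens generated per second after the first token arrives.
    """
    start = time.perf_counter()
    first_token_at = None
    n_chunks = 0

    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices or not chunk.choices[0].delta.content:
            continue  # skip role-only / empty deltas
        if first_token_at is None:
            first_token_at = time.perf_counter()
        n_chunks += 1  # rough proxy: one streamed chunk ~ one token
    end = time.perf_counter()

    if first_token_at is None:
        raise RuntimeError("no tokens received")
    ttft = first_token_at - start
    tokens_per_s = (n_chunks - 1) / (end - first_token_at) if n_chunks > 1 else 0.0
    return ttft, tokens_per_s

# Example with a placeholder model name and prompt:
# ttft, tps = measure_speed("gpt-4o", "Write 200 words about context windows.")
```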
HIGHLIGHTS
[Highlighted models: Mistral NeMo, Sonar Small, Mistral 7B, Sonar Large, OpenChat 3.5, Codestral-Mamba, Jamba Instruct]
Model | Context | Quality | Price (USD per 1M tokens) | Output tokens/s | Latency (s)
---|---|---|---|---|---
GPT-4o | 128k | 100 | $7.50 | 82.3 | 0.45
GPT-4 Turbo | 128k | 94 | $15.00 | 29.2 | 0.60
GPT-4o mini | 128k | 88 | $0.26 | 98.9 | 0.55
GPT-4 | 8k | 84 | $37.50 | 25.5 | 0.67
GPT-3.5 Turbo Instruct | 4k | 60 | $1.63 | 116.3 | 0.53
GPT-3.5 Turbo | 16k | 59 | $0.75 | 83.2 | 0.37
Gemini 1.5 Pro | 2m | 95 | $5.25 | 57.7 | 1.07
Gemini 1.5 Flash | 1m | 84 | $0.53 | 166.2 | 1.02
Gemma 2 27B | 8k | 78 | $0.80 | 76.9 | 0.49
Gemma 2 9B | 8k | 71 | $0.20 | 119.9 | 0.29
Gemini 1.0 Pro | 33k | 62 | $0.75 | 87.0 | 2.10
Gemma 7B | 8k | 45 | $0.15 | 147.5 | 0.33
Llama 3.1 405B | 128k | 100 | $6.50 | 26.9 | 0.61
Llama 3.1 70B | 128k | 95 | $0.88 | 57.6 | 0.43
Llama 3 70B | 8k | 83 | $0.90 | 63.5 | 0.45
Llama 3.1 8B | 128k | 66 | $0.18 | 147.4 | 0.30
Llama 3 8B | 8k | 64 | $0.17 | 148.1 | 0.32
Llama 2 Chat 70B | 4k | 57 | $1.00 | 49.7 | 0.80
Mistral Large 2 | 128k | 91 | $4.50 | 30.4 | 0.44
Llama 2 Chat 13B | 4k | 39 | $0.25 | 83.7 | 0.47
Llama 2 Chat 7B | 4k | 29 | $0.20 | 91.7 | 1.04
Codestral | 33k | | $1.50 | 53.2 | 0.31
Codestral-Mamba | 256k | | $0.25 | 95.6 | 0.43
Mistral Large | 33k | 76 | $6.00 | 34.6 | 0.56
Mixtral 8x22B | 65k | 71 | $1.20 | 66.1 | 0.32
Mistral Small | 33k | 71 | $1.50 | 56.4 | 0.95
Mistral Medium | 33k | 70 | $4.05 | 37.9 | 0.66
Mistral NeMo | 128k | 64 | $0.30 | 188.0 | 0.32
Mixtral 8x7B | 33k | 61 | $0.50 | 89.0 | 0.34
Mistral 7B | 33k | 40 | $0.18 | 104.7 | 0.29
Claude 3.5 Sonnet | 200k | 98 | $6.00 | 78.7 | 1.14
Claude 3 Opus | 200k | 93 | $30.00 | 25.6 | 1.95
Claude 3 Sonnet | 200k | 80 | $6.00 | 63.3 | 0.92
Claude 3 Haiku | 200k | 74 | $0.50 | 129.7 | 0.53
Claude 2.0 | 100k | 70 | $12.00 | 39.9 | 1.13
Claude Instant | 100k | 63 | $1.20 | 84.9 | 0.58
Claude 2.1 | 200k | 55 | $12.00 | 38.7 | 1.51
Command Light | 4k | | $0.38 | 37.1 | 0.48
Command | 4k | | $1.44 | 23.9 | 0.45
Command-R+ | 128k | 75 | $6.00 | 60.9 | 0.47
Command-R | 128k | 63 | $0.75 | 124.0 | 0.40
Sonar Large | 33k | | $1.00 | 54.8 | 0.29
Sonar Small | 33k | | $0.20 | 157.2 | 0.23
OpenChat 3.5 | 8k | 50 | $0.14 | 68.8 | 0.35
Phi-3 Medium 14B | 128k | | $0.14 | 74.0 | 0.21
DBRX | 33k | 62 | $1.20 | 82.3 | 0.40
Reka Core | 128k | 90 | $6.00 | 15.8 | 1.34
Reka Flash | 128k | 78 | $1.10 | 31.2 | 0.84
Reka Edge | 64k | 60 | $0.55 | 48.9 | 0.84
Jamba Instruct | 256k | 63 | $0.55 | 66.8 | 0.48
DeepSeek-Coder-V2 | 128k | | $0.17 | 16.5 | 1.24
DeepSeek-V2 | 128k | 82 | $0.17 | 16.8 | 1.15
Arctic | 4k | 55 | $2.40 | 72.0 | 0.62
Qwen2 72B | 128k | 83 | $0.90 | 49.7 | 0.35
Yi-Large | 32k | 81 | $3.00 | 73.9 | 0.36
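A note on the Price column: it appears to be a single blended rate per 1M tokens across input and output. The sketch below shows one way such a blend can be computed; the 3:1 input:output ratio used here is an assumption for illustration, not a statement of our methodology, though it does reproduce the table's GPT-4o and GPT-4 figures from those models' published list prices.

```python
def blended_price(input_per_1m: float, output_per_1m: float,
                  input_ratio: float = 3.0) -> float:
    """Blend per-1M-token input/output prices at an input:output token ratio.

    The 3:1 default is an assumption that happens to reproduce the table's
    figures; it is not taken from the leaderboard's stated methodology.
    """
    return (input_ratio * input_per_1m + output_per_1m) / (input_ratio + 1.0)

# GPT-4o list prices: $5.00 input, $15.00 output per 1M tokens -> $7.50 blended.
assert blended_price(5.00, 15.00) == 7.50
# GPT-4 list prices: $30.00 input, $60.00 output per 1M tokens -> $37.50 blended.
assert blended_price(30.00, 60.00) == 37.50
```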
Models compared
- OpenAI: GPT-3.5 Turbo, GPT-3.5 Turbo (0125), GPT-3.5 Turbo (1106), GPT-3.5 Turbo Instruct, GPT-4, GPT-4 Turbo, GPT-4 Turbo (0125), GPT-4 Vision, GPT-4o, and GPT-4o mini
- Google: Gemini 1.0 Pro, Gemini 1.5 Flash, Gemini 1.5 Pro, Gemma 2 27B, Gemma 2 9B, and Gemma 7B
- Meta: Code Llama 70B, Llama 2 Chat 13B, Llama 2 Chat 70B, Llama 2 Chat 7B, Llama 3 70B, Llama 3 8B, Llama 3.1 405B, Llama 3.1 70B, and Llama 3.1 8B
- Mistral: Codestral, Codestral-Mamba, Mistral 7B, Mistral Large, Mistral Large 2, Mistral Medium, Mistral NeMo, Mistral Small, Mixtral 8x22B, and Mixtral 8x7B
- Anthropic: Claude 2.0, Claude 2.1, Claude 3 Haiku, Claude 3 Opus, Claude 3 Sonnet, Claude 3.5 Sonnet, and Claude Instant
- Cohere: Command, Command Light, Command-R, and Command-R+
- Perplexity: PPLX-70B Online, PPLX-7B-Online, Sonar Large, and Sonar Small
- xAI: Grok-1
- OpenChat: OpenChat 3.5
- Microsoft Azure: Phi-3 Medium 14B and Phi-3 Mini
- Databricks: DBRX
- Reka AI: Reka Core, Reka Edge, and Reka Flash
- AI21 Labs: Jamba Instruct
- DeepSeek: DeepSeek-Coder-V2 and DeepSeek-V2
- Snowflake: Arctic
- Alibaba: Qwen2 72B
- 01.AI: Yi-Large