Models Leaderboard
Comparison and ranking of the performance of AI LLMs across key metrics, including quality, price, speed (throughput and latency), and context window. For more details, including our methodology, see our FAQs.
HIGHLIGHTS
Highest Quality: GPT-4 Turbo, GPT-4 Vision & Claude 3 Opus (100)
Highest Throughput (fastest): Llama 3 (8B) (224.5 tokens/s)
Lowest Latency: Command-R & Command-R+ (0.16s)
Largest Context Window: Gemini 1.5 Pro (1000k tokens)
| Model | Context Window | Quality | Price (USD/1M tokens) | Throughput (tokens/s) | Latency (s) |
|---|---|---|---|---|---|
| GPT-4 | 8k | 90 | $37.50 | 19.2 | 0.56 |
| GPT-4 Turbo | 128k | 100 | $15.00 | 17.4 | 0.64 |
| GPT-4 Vision | 128k | 100 | $15.00 | 33.0 | 0.57 |
| GPT-3.5 Turbo | 16k | 67 | $0.75 | 55.5 | 0.33 |
| GPT-3.5 Turbo Instruct | 4k | 60 | $1.63 | 111.6 | 0.51 |
| Llama 3 (70B) | 8k | 88 | $0.90 | 45.8 | 0.32 |
| Llama 2 Chat (13B) | 4k | 37 | $0.25 | 50.6 | 0.34 |
| Llama 2 Chat (70B) | 4k | 56 | $1.00 | 44.6 | 0.44 |
| Llama 3 (8B) | 8k | 58 | $0.14 | 224.5 | 0.28 |
| Llama 2 Chat (7B) | 4k | 27 | $0.20 | 89.9 | 0.61 |
| Code Llama (70B) | 16k | 58 | $0.90 | 31.6 | 0.30 |
| Mistral Large | 33k | 84 | $12.00 | 26.9 | 0.31 |
| Mistral Medium | 33k | 76 | $4.05 | 21.6 | 0.20 |
| Mixtral 8x22B | 65k | 83 | $1.20 | 59.8 | 0.26 |
| Mixtral 8x7B | 33k | 68 | $0.50 | 102.6 | 0.28 |
| Mistral Small | 33k | 73 | $3.00 | 55.6 | 0.21 |
| Mistral 7B | 33k | 40 | $0.20 | 81.9 | 0.25 |
| Gemini 1.5 Pro | 1000k | 88 | $10.50 | 43.2 | 1.27 |
| Gemini 1.0 Pro | 33k | 66 | $0.75 | 77.8 | 1.45 |
| Gemma 7B | 8k | 59 | $0.15 | 164.9 | 0.28 |
| Claude 3 Opus | 200k | 100 | $30.00 | 26.4 | 1.09 |
| Claude 3 Sonnet | 200k | 85 | $6.00 | 62.7 | 0.60 |
| Claude 3 Haiku | 200k | 78 | $0.50 | 94.0 | 0.39 |
| Claude 2.1 | 200k | 66 | $12.00 | 42.5 | 0.48 |
| Claude 2.0 | 100k | 72 | $12.00 | 39.6 | 0.46 |
| Claude Instant | 100k | 65 | $1.20 | 86.7 | 0.41 |
| Command-R+ | 128k | 80 | $6.00 | 40.3 | 0.16 |
| Command-R | 128k | 67 | $0.75 | 111.4 | 0.16 |
| Command | 4k | | $1.44 | 28.4 | 0.34 |
| Command Light | 4k | | $0.38 | 52.9 | 0.25 |
| DBRX | 33k | 76 | $1.40 | 79.7 | 0.48 |
| OpenChat 3.5 | 8k | 56 | $0.17 | 70.7 | 0.67 |
| PPLX-70B Online | 4k | 45 | $1.00 | 38.3 | 1.17 |
| PPLX-7B-Online | 4k | 35 | $0.20 | 95.1 | 0.96 |
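The Price column shows a single blended figure per 1M tokens rather than separate input and output prices. The values are consistent with a 3:1 input-to-output token blend of each provider's list prices (an assumption inferred from the numbers above, e.g. GPT-4's $30 input / $60 output per 1M tokens blends to $37.50). A minimal sketch of that arithmetic:

```python
def blended_price(input_usd_per_1m: float, output_usd_per_1m: float,
                  input_weight: int = 3, output_weight: int = 1) -> float:
    """Blend per-1M-token input/output prices at an assumed 3:1 usage ratio."""
    total = input_weight + output_weight
    return (input_weight * input_usd_per_1m + output_weight * output_usd_per_1m) / total

# Assumed provider list prices (USD per 1M tokens) at the time of this table:
print(blended_price(30.0, 60.0))  # GPT-4: 37.5 -> matches the $37.50 above
print(blended_price(10.0, 30.0))  # GPT-4 Turbo: 15.0 -> matches $15.00
print(blended_price(15.0, 75.0))  # Claude 3 Opus: 30.0 -> matches $30.00
```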
Key definitions
Models compared: OpenAI: GPT-3.5 Turbo, GPT-3.5 Turbo (0125), GPT-3.5 Turbo (1106), GPT-3.5 Turbo Instruct, GPT-4, GPT-4 Turbo, GPT-4 Turbo (0125), and GPT-4 Vision; Google: Gemini 1.0 Pro, Gemini 1.5 Pro, and Gemma 7B; Meta: Code Llama (70B), Llama 2 Chat (13B), Llama 2 Chat (70B), Llama 2 Chat (7B), Llama 3 (70B), and Llama 3 (8B); Mistral: Mistral 7B, Mistral Large, Mistral Medium, Mistral Small, Mixtral 8x22B, and Mixtral 8x7B; Anthropic: Claude 2.0, Claude 2.1, Claude 3 Haiku, Claude 3 Opus, Claude 3 Sonnet, and Claude Instant; Cohere: Command, Command Light, Command-R, and Command-R+; Perplexity: PPLX-70B Online and PPLX-7B-Online; xAI: Grok-1; OpenChat: OpenChat 3.5; Microsoft Azure: Phi-3-mini; and Databricks: DBRX.
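The highlights at the top of this page are simple superlatives over the table's columns. A minimal sketch of how they can be reproduced, using a hard-coded subset of the rows above (extend with the remaining rows as needed):

```python
# Each row: (model, context window in k tokens, quality index,
#            blended price in USD/1M tokens, throughput in tokens/s, latency in s).
ROWS = [
    ("GPT-4 Turbo",    128,  100, 15.00,  17.4, 0.64),
    ("Claude 3 Opus",  200,  100, 30.00,  26.4, 1.09),
    ("Llama 3 (8B)",     8,   58,  0.14, 224.5, 0.28),
    ("Command-R",      128,   67,  0.75, 111.4, 0.16),
    ("Gemini 1.5 Pro", 1000,  88, 10.50,  43.2, 1.27),
]

# Ties (e.g. the three models scoring 100 on quality) resolve to the first row listed.
print("Highest quality:       ", max(ROWS, key=lambda r: r[2])[0])
print("Highest throughput:    ", max(ROWS, key=lambda r: r[4])[0])
print("Lowest latency:        ", min(ROWS, key=lambda r: r[5])[0])
print("Largest context window:", max(ROWS, key=lambda r: r[1])[0])
```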