LLM Leaderboard - Comparison of GPT-4o, Llama 3, Mistral, Gemini and over 30 models
Comparison and ranking of the performance of over 30 AI models (LLMs) across key metrics, including quality, price, speed (output speed in tokens per second and latency as time to first token, TTFT), and context window. For more details, including on our methodology, see our FAQs.
| Model | Context window | Quality index | Blended price (USD/1M tokens) | Output speed (tokens/s) | Latency (TTFT, s) |
|---|---|---|---|---|---|
| o1-preview | 128k | 85 | $26.25 | 19.0 | 52.67 |
| o1-mini | 128k | 82 | $5.25 | 82.3 | 12.44 |
| GPT-4o (Aug 6) | 128k | 77 | $4.38 | 104.4 | 0.38 |
| GPT-4o | 128k | 77 | $7.50 | 118.2 | 0.35 |
| GPT-4o mini | 128k | 71 | $0.26 | 152.3 | 0.38 |
| Llama 3.1 405B | 128k | 72 | $5.00 | 21.3 | 1.00 |
| Llama 3.1 70B | 128k | 65 | $0.88 | 49.3 | 0.62 |
| Llama 3.1 8B | 128k | 53 | $0.14 | 156.5 | 0.38 |
| Gemini 1.5 Pro | 2m | 72 | $5.25 | 64.4 | 0.85 |
| Gemini 1.5 Flash | 1m | 60 | $0.13 | 220.2 | 0.40 |
| Gemma 2 27B | 8k | 49 | $0.80 | 48.1 | 0.46 |
| Gemma 2 9B | 8k | 47 | $0.20 | 108.0 | 0.52 |
| Claude 3.5 Sonnet | 200k | 77 | $6.00 | 85.1 | 1.01 |
| Claude 3 Opus | 200k | 70 | $30.00 | 25.6 | 1.91 |
| Claude 3 Haiku | 200k | 54 | $0.50 | 137.8 | 0.57 |
| Mistral Large 2 | 128k | 73 | $4.50 | 39.2 | 0.51 |
| Mixtral 8x22B | 65k | 61 | $1.20 | 53.2 | 0.47 |
| Mistral NeMo | 128k | 52 | $0.20 | 133.0 | 0.33 |
| Mistral Small | 33k | 50 | $1.50 | 49.1 | 0.35 |
| Mixtral 8x7B | 33k | 42 | $0.50 | 68.9 | 0.42 |
| Codestral-Mamba | 256k | 36 | $0.25 | 92.3 | 0.60 |
| Command-R+ (08-2024) | 128k | 56 | $5.19 | 46.9 | 0.54 |
| Command-R (08-2024) | 128k | 51 | $0.51 | 106.9 | 0.35 |
| Command-R+ (04-2024) | 128k | 46 | $6.00 | 47.4 | 0.58 |
| Command-R (03-2024) | 128k | 36 | $0.75 | 102.9 | 0.39 |
| Sonar Large | 33k | 62 | $1.00 | 35.6 | 0.55 |
| Sonar Small | 33k | 41 | $0.20 | 141.8 | 0.21 |
| Sonar 3.1 Small | 131k | | $0.20 | 149.7 | 0.18 |
| Sonar 3.1 Large | 131k | | $1.00 | 63.9 | 0.22 |
| Phi-3 Medium 14B | 128k | | $0.45 | 52.3 | 0.46 |
| DBRX | 33k | 50 | $1.16 | 46.2 | 0.74 |
| Reka Core | 128k | 57 | $6.00 | 13.2 | 1.32 |
| Reka Flash | 128k | 46 | $1.10 | 25.3 | 1.11 |
| Reka Edge | 64k | 30 | $0.55 | | |
| Jamba 1.5 Large | 256k | 64 | $3.50 | 41.4 | 1.47 |
| Jamba 1.5 Mini | 256k | 46 | $0.25 | 100.3 | 0.91 |
| DeepSeek-Coder-V2 | 128k | 67 | $0.17 | 17.5 | 1.21 |
| DeepSeek-V2.5 | 128k | 66 | $0.17 | 17.2 | 1.08 |
| DeepSeek-V2 | 128k | 66 | $0.17 | 17.2 | 1.20 |
| Qwen2 72B | 128k | 69 | $0.63 | 44.1 | 0.54 |
| Yi-Large | 32k | 58 | $3.00 | 58.0 | 0.95 |
| GPT-4 Turbo | 128k | 74 | $15.00 | 36.5 | 0.65 |
| GPT-3.5 Turbo | 16k | 53 | $0.75 | 86.5 | 0.37 |
| GPT-3.5 Turbo Instruct | 4k | | $1.63 | 104.1 | 0.48 |
| GPT-4 | 8k | | $37.50 | 27.7 | 0.67 |
| Llama 3 70B | 8k | 62 | $0.90 | 53.4 | 0.53 |
| Llama 3 8B | 8k | 46 | $0.15 | 97.6 | 0.38 |
| Llama 2 Chat 70B | 4k | 34 | $1.39 | 35.1 | 0.41 |
| Llama 2 Chat 13B | 4k | 25 | $0.30 | 50.6 | 0.57 |
| Llama 2 Chat 7B | 4k | 10 | $0.33 | 69.0 | 0.23 |
| Gemma 7B | 8k | 28 | $0.07 | 1,021.0 | 3.78 |
| Gemini 1.0 Pro | 33k | | $0.75 | 98.1 | 1.13 |
| Claude 3 Sonnet | 200k | 57 | $6.00 | 57.6 | 1.02 |
| Claude 2.1 | 200k | | $12.00 | 29.4 | 1.67 |
| Claude 2.0 | 100k | | $12.00 | 30.4 | 1.24 |
| Claude Instant | 100k | | $1.20 | 66.0 | 0.71 |
| Mistral Large | 33k | 56 | $6.00 | 26.3 | 0.41 |
| Mistral 7B | 33k | 24 | $0.16 | 99.7 | 0.33 |
| Codestral | 33k | | $1.50 | 50.4 | 0.45 |
| Mistral Medium | 33k | | $4.09 | 38.1 | 0.78 |
| Command | 4k | 26 | $1.44 | 20.1 | 1.31 |
| Command Light | 4k | 14 | $0.38 | 28.1 | 1.80 |
| OpenChat 3.5 | 8k | 43 | $0.06 | 59.2 | 0.60 |
| Jamba Instruct | 256k | 28 | $0.55 | 51.4 | 0.69 |
Key definitions
- Context window: the maximum number of combined input and output tokens the model can handle in a single request.
- Quality index: a relative measure of model quality; higher is better. Blank cells indicate models we have not yet evaluated.
- Blended price: USD per 1M tokens, blended across input and output tokens at a 3:1 input:output ratio.
- Output speed: tokens generated per second while the model is streaming output.
- Latency: time to first token (TTFT) received, in seconds.
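To make these metrics concrete, here is a minimal sketch (our own illustration, not part of the leaderboard) of how the columns combine: a response takes roughly TTFT plus output tokens divided by output speed, and costs roughly total tokens times the blended per-token price. The `ModelRow` helper class and the sample request sizes are assumptions for illustration; the figures are copied from the table above.

```python
from dataclasses import dataclass

# Illustrative helper only: `ModelRow` is our own construct, not a leaderboard API.
@dataclass
class ModelRow:
    name: str
    price_per_1m: float   # blended USD per 1M tokens (3:1 input:output)
    output_speed: float   # tokens per second
    ttft: float           # time to first token, seconds

    def response_time(self, output_tokens: int) -> float:
        """Rough end-to-end time: time to first token plus generation time."""
        return self.ttft + output_tokens / self.output_speed

    def request_cost(self, input_tokens: int, output_tokens: int) -> float:
        """Rough cost, treating the blended rate as a flat per-token price."""
        return (input_tokens + output_tokens) / 1_000_000 * self.price_per_1m

# Figures copied from the table above.
models = [
    ModelRow("GPT-4o mini", 0.26, 152.3, 0.38),
    ModelRow("Claude 3.5 Sonnet", 6.00, 85.1, 1.01),
]

for m in models:
    t = m.response_time(output_tokens=500)
    c = m.request_cost(input_tokens=1_000, output_tokens=500)
    print(f"{m.name}: ~{t:.1f} s for 500 output tokens, ~${c:.4f} per request")
```

Because the listed price is blended at a 3:1 input:output ratio, the cost estimate is only exact for requests near that ratio; for heavily skewed workloads, consult the providers' separate input and output prices.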
Models compared:
- OpenAI: GPT-3.5 Turbo, GPT-3.5 Turbo (0125), GPT-3.5 Turbo (1106), GPT-3.5 Turbo Instruct, GPT-4, GPT-4 Turbo, GPT-4 Turbo (0125), GPT-4 Vision, GPT-4o, GPT-4o (Aug 6), GPT-4o mini, o1-mini, and o1-preview
- Meta: Code Llama 70B, Llama 2 Chat 13B, Llama 2 Chat 70B, Llama 2 Chat 7B, Llama 3 70B, Llama 3 8B, Llama 3.1 405B, Llama 3.1 70B, and Llama 3.1 8B
- Google: Gemini 1.0 Pro, Gemini 1.5 Flash, Gemini 1.5 Pro, Gemma 2 27B, Gemma 2 9B, and Gemma 7B
- Anthropic: Claude 2.0, Claude 2.1, Claude 3 Haiku, Claude 3 Opus, Claude 3 Sonnet, Claude 3.5 Sonnet, and Claude Instant
- Mistral: Codestral, Codestral-Mamba, Mistral 7B, Mistral Large, Mistral Large 2, Mistral Medium, Mistral NeMo, Mistral Small, Mixtral 8x22B, Mixtral 8x7B, and Pixtral 12B
- Cohere: Command, Command Light, Command-R (03-2024), Command-R (08-2024), Command-R+ (04-2024), and Command-R+ (08-2024)
- Perplexity: PPLX-70B Online, PPLX-7B-Online, Sonar 3.1 Large, Sonar 3.1 Small, Sonar Large, and Sonar Small
- xAI: Grok-1
- OpenChat: OpenChat 3.5
- Microsoft Azure: Phi-3 Medium 14B and Phi-3 Mini
- Databricks: DBRX
- Reka AI: Reka Core, Reka Edge, and Reka Flash
- Glaive: Reflection Llama 3.1 - 70B and Reflection Llama 3.1 70B v2
- AI21 Labs: Jamba 1.5 Large, Jamba 1.5 Mini, and Jamba Instruct
- DeepSeek: DeepSeek-Coder-V2, DeepSeek-V2, and DeepSeek-V2.5
- Snowflake: Arctic
- Alibaba: Qwen2 72B
- 01.AI: Yi-Large
- Other: LLaVA-v1.5-7B