LLM API Providers Leaderboard
Comparison and ranking of API provider performance for AI LLM models across key metrics including price, performance/speed (throughput & latency), and context window, among others. For more details, including our methodology, see our FAQs.
API providers compared: OpenAI, Mistral, Microsoft Azure, Amazon Bedrock, Groq, Together.ai, Anthropic, Perplexity, Google, Fireworks, Baseten, Cohere, Lepton AI, Speechmatics, Deepinfra, Replicate, NVIDIA NGC (Demo), Runpod, Rev AI, fal.ai, AssemblyAI, Deepgram, Gladia, Stability.ai, Midjourney, Databricks, and OctoAI. Models appearing more than once in the table below are served by multiple providers.
Model | Context | Quality | Price (USD/1M tokens) | Throughput (tokens/s) | Latency (s)
---|---|---|---|---|---
GPT-4 | 8k | 90 | $37.50 | 19.6 | 0.76
GPT-4 | 8k | 90 | $37.50 | 16.3 | 0.43
GPT-4 Turbo | 128k | 100 | $15.00 | 21.0 | 0.79
GPT-4 Turbo | 128k | 100 | $15.00 | 12.8 | 0.50
GPT-4 Vision | 128k | 100 | $15.00 | 32.6 | 0.55
GPT-3.5 Turbo | 16k | 67 | $0.75 | 56.2 | 0.43
GPT-3.5 Turbo | 16k | 67 | $0.75 | 53.1 | 0.30
GPT-3.5 Turbo Instruct | 4k | 60 | $1.63 | 69.3 | 0.33
GPT-3.5 Turbo Instruct | 4k | 60 | $1.63 | 145.4 | 0.60
Llama 3 (70B) | 8k | 88 | $1.18 | 43.9 | 2.33
Llama 3 (70B) | 8k | 88 | $0.93 | 36.0 | 0.28
Llama 3 (70B) | 8k | 88 | $0.90 | 142.9 | 0.23
Llama 3 (70B) | 8k | 88 | $0.64 | 32.1 | 0.70
Llama 3 (70B) | 8k | 88 | $0.64 | 303.3 | 0.28
Llama 3 (70B) | 8k | 88 | $1.00 | 40.0 | 0.26
Llama 3 (70B) | 8k | 88 | $0.90 | 98.2 | 0.52
Llama 2 Chat (70B) | 4k | 56 | $1.18 | 56.4 | 1.95
Llama 2 Chat (13B) | 4k | 37 | $0.20 | 119.8 | 1.97
Llama 3 (8B) | 8k | 58 | $0.10 | 245.0 | 1.40
Llama 2 Chat (70B) | 4k | 56 | $2.10 | 40.5 | 0.42
Llama 2 Chat (13B) | 4k | 37 | $0.81 | 52.5 | 0.30
Llama 2 Chat (70B) | 4k | 56 | $0.93 | 24.9 | 0.28
Llama 2 Chat (13B) | 4k | 37 | $0.28 | 50.1 | 0.26
Llama 3 (8B) | 8k | 58 | $0.14 | 102.2 | 0.32
Llama 2 Chat (70B) | 4k | 56 | $1.60 | 16.8 | 2.75
Llama 2 Chat (13B) | 4k | 37 | $0.84 | 42.6 | 1.21
Llama 2 Chat (70B) | 4k | 56 | $0.90 | 73.0 | 0.31
Llama 2 Chat (13B) | 4k | 37 | $0.20 | 140.4 | 0.26
Llama 3 (8B) | 8k | 58 | $0.20 | 303.7 | 0.25
Llama 2 Chat (70B) | 4k | 56 | $0.76 | 74.1 | 0.56
Llama 2 Chat (13B) | 4k | 37 | $0.35 | 41.5 | 0.64
Llama 3 (8B) | 8k | 58 | $0.10 | 114.8 | 0.68
Llama 2 Chat (70B) | 4k | 56 | $0.68 | 253.5 | 0.37
Llama 3 (8B) | 8k | 58 | $0.06 | 893.9 | 0.31
Llama 2 Chat (70B) | 4k | 56 | $1.00 | |
Llama 3 (8B) | 8k | 58 | $0.20 | 121.5 | 0.21
Llama 2 Chat (70B) | 4k | 56 | $0.90 | 42.7 | 0.51
Llama 2 Chat (13B) | 4k | 37 | $0.23 | 48.8 | 0.30
Llama 3 (8B) | 8k | 58 | $0.20 | 259.7 | 0.29
Llama 2 Chat (7B) | 4k | 27 | $0.10 | 234.5 | 1.62
Llama 2 Chat (7B) | 4k | 27 | $0.56 | 69.2 | 0.84
Llama 2 Chat (7B) | 4k | 27 | $0.20 | 189.1 | 0.26
Llama 2 Chat (7B) | 4k | 27 | $0.20 | 25.2 | 0.68
Llama 2 Chat (7B) | 4k | 27 | $0.20 | 89.8 | 0.29
Code Llama (70B) | 4k | 58 | $0.90 | 31.2 | 0.30
Code Llama (70B) | 4k | 58 | $0.75 | 31.6 | 0.64
Code Llama (70B) | 16k | 58 | $1.00 | 51.9 | 0.24
Code Llama (70B) | 4k | 58 | $0.90 | 29.0 | 0.35
Mistral Large | 33k | 84 | $12.00 | 24.9 | 0.20
Mistral Large | 33k | 84 | $12.00 | 31.1 | 0.36
Mistral Medium | 33k | 76 | $4.05 | 21.2 | 0.21
Mixtral 8x22B | 65k | 83 | $3.00 | 75.9 | 0.21
Mixtral 8x22B | 65k | 83 | $1.20 | 42.8 | 0.26
Mixtral 8x22B | 65k | 83 | $1.20 | 78.7 | 0.24
Mixtral 8x22B | 65k | 83 | $0.65 | 46.6 | 0.68
Mixtral 8x22B | 16k | 83 | $1.00 | 61.0 | 0.23
Mixtral 8x22B | 65k | 83 | $1.20 | 45.6 | 0.82
Mixtral 8x7B | 33k | 68 | $0.70 | 91.4 | 0.21
Mixtral 8x7B | 33k | 68 | $0.47 | 143.9 | 1.48
Mixtral 8x7B | 33k | 68 | $0.51 | 68.8 | 0.31
Mixtral 8x7B | 33k | 68 | $0.35 | 47.0 | 0.38
Mixtral 8x7B | 33k | 68 | $0.50 | 97.7 | 0.30
Mixtral 8x7B | 33k | 68 | $0.50 | 249.7 | 0.21
Mixtral 8x7B | 33k | 68 | $0.27 | 57.9 | 0.66
Mixtral 8x7B | 33k | 68 | $0.27 | 475.0 | 0.26
Mixtral 8x7B | 16k | 68 | $0.60 | 117.8 | 0.21
Mixtral 8x7B | 33k | 68 | $0.60 | 115.9 | 0.40
Mistral Small | 33k | 73 | $3.00 | 55.7 | 0.21
Mistral 7B | 33k | 40 | $0.25 | 64.0 | 0.20
Mistral 7B | 33k | 40 | $0.10 | 101.1 | 1.65
Mistral 7B | 33k | 40 | $0.16 | 94.7 | 0.28
Mistral 7B | 33k | 40 | $0.14 | 75.3 | 0.31
Mistral 7B | 33k | 40 | $0.20 | 242.7 | 0.18
Mistral 7B | 33k | 40 | $0.13 | 59.2 | 0.58
Mistral 7B | 16k | 40 | $0.20 | 96.6 | 0.21
Mistral 7B | 8k | 40 | $0.20 | 77.2 | 0.43
Mistral 7B | 4k | 40 | $0.20 | 230.9 | 0.11
Gemini 1.5 Pro | 1000k | 88 | $10.50 | 43.3 | 1.29
Gemini 1.0 Pro | 33k | 66 | $0.75 | 78.8 | 1.45
Gemma 7B | 8k | 59 | $0.20 | 207.9 | 0.25
Gemma 7B | 8k | 59 | $0.13 | 61.3 | 0.62
Gemma 7B | 8k | 59 | $0.10 | 918.8 | 0.29
Gemma 7B | 8k | 59 | $0.20 | 116.4 | 0.31
Claude 3 Opus | 200k | 100 | $30.00 | 26.1 | 0.93
Claude 3 Opus | 200k | 100 | $30.00 | 25.9 | 1.31
Claude 3 Sonnet | 200k | 85 | $6.00 | 65.4 | 0.58
Claude 3 Sonnet | 200k | 85 | $6.00 | |
Claude 3 Sonnet | 200k | 85 | $6.00 | 62.5 | 0.67
Claude 3 Haiku | 200k | 78 | $0.50 | 85.3 | 0.51
Claude 3 Haiku | 200k | 78 | $0.50 | |
Claude 3 Haiku | 200k | 78 | $0.50 | 98.1 | 0.34
Claude 2.1 | 200k | 66 | $12.00 | 41.1 | 0.52
Claude 2.1 | 200k | 66 | $12.00 | 42.2 | 0.43
Claude 2.0 | 100k | 72 | $12.00 | 39.4 | 0.46
Claude Instant | 100k | 65 | $1.20 | 84.2 | 0.34
Claude Instant | 100k | 65 | $1.20 | 90.5 | 0.48
Command-R+ | 128k | 80 | $6.00 | 40.2 | 0.16
Command-R | 128k | 67 | $0.75 | 110.9 | 0.16
Command | 4k | | $1.63 | 28.4 | 0.32
Command | 4k | | $1.25 | 28.7 | 0.36
Command Light | 4k | | $0.38 | 47.7 | 0.31
Command Light | 4k | | $0.38 | 81.4 | 0.16
DBRX | 33k | 76 | $0.90 | 148.5 | 0.44
DBRX | 33k | 76 | $1.60 | 25.3 | 0.44
DBRX | 33k | 76 | $3.38 | 118.4 | 0.57
DBRX | 33k | 76 | $1.20 | 79.7 | 0.43
OpenChat 3.5 | 8k | 56 | $0.13 | 51.3 | 0.61
OpenChat 3.5 | 8k | 56 | $0.20 | 90.8 | 0.70
PPLX-70B Online | 4k | 45 | $1.00 | 38.4 | 1.19
PPLX-7B-Online | 4k | 35 | $0.20 | 95.3 | 0.94
Key definitions
Quality: Index represents normalized average relative performance across Chatbot Arena, MMLU & MT-Bench (an illustrative sketch of such an index follows these definitions).
Context window: Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (varies by model).
Throughput: Tokens per second received while the model is generating tokens (i.e., after the first chunk has been received from the API).
Latency: Time to first token received, in seconds, after the API request is sent (see the measurement sketch following these definitions).
Price: Price per token, represented as USD per million tokens. Price is a blend of input & output token prices at a 3:1 input-to-output ratio (see the worked example following these definitions).
Output price: Price per token generated by the model (received from the API), represented as USD per million tokens.
Input price: Price per token included in the request/message sent to the API, represented as USD per million tokens.
Time period: Metrics are 'live' and based on the past 14 days of measurements. Measurements are taken 8 times per day for single requests and 2 times per day for parallel requests.
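
The exact normalization behind the quality index is not specified on this page; the following is a hypothetical sketch assuming a simple min-max normalization of each benchmark's scores across models, averaged and scaled to 100. All benchmark figures below are illustrative, not real leaderboard data.

```python
def quality_index(scores: dict[str, float],
                  lo: dict[str, float],
                  hi: dict[str, float]) -> int:
    """Min-max normalize each benchmark score, average, and scale to 100.

    scores/lo/hi map benchmark name -> this model's score and the
    min/max scores observed across all models (hypothetical values).
    """
    normed = [(scores[b] - lo[b]) / (hi[b] - lo[b]) for b in scores]
    return round(100 * sum(normed) / len(normed))

# Illustrative only: made-up Chatbot Arena Elo, MMLU, and MT-Bench figures.
print(quality_index(
    scores={"arena_elo": 1250, "mmlu": 86.4, "mt_bench": 9.3},
    lo={"arena_elo": 900, "mmlu": 25.0, "mt_bench": 2.0},
    hi={"arena_elo": 1260, "mmlu": 88.0, "mt_bench": 9.4},
))  # -> 98
```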
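To make the throughput and latency definitions concrete, here is a minimal measurement sketch against a streaming chat endpoint. It uses the OpenAI Python SDK purely as an example provider, approximates one token per streamed chunk, and is not the measurement harness behind this leaderboard.

```python
import time

from openai import OpenAI  # assumes the official `openai` Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def measure(model: str, prompt: str) -> tuple[float, float]:
    """Return (latency: seconds to first token, throughput: tokens/s after it)."""
    t_start = time.monotonic()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    t_first = None
    n_chunks = 0
    for chunk in stream:
        # Each streamed chunk carries roughly one token of content; chunk
        # count serves as a rough token-count proxy for this sketch.
        if chunk.choices and chunk.choices[0].delta.content:
            if t_first is None:
                t_first = time.monotonic()  # first content chunk = latency mark
            n_chunks += 1
    t_end = time.monotonic()
    if t_first is None:
        raise RuntimeError("no content received from the stream")
    latency = t_first - t_start
    # Throughput counts only tokens generated after the first chunk,
    # matching the definition above.
    throughput = (n_chunks - 1) / (t_end - t_first) if t_end > t_first else 0.0
    return latency, throughput


print(measure("gpt-3.5-turbo", "Write three sentences about rivers."))
```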
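The blended price in the table follows directly from the 3:1 ratio above. A minimal worked example, assuming GPT-4 (8k) list prices of $30 input / $60 output per million tokens (assumed figures, consistent with the $37.50 shown in the table):

```python
def blended_price(input_price: float, output_price: float) -> float:
    """Blend per-million-token prices at a 3:1 input-to-output ratio."""
    return (3 * input_price + output_price) / 4

# Assumed GPT-4 (8k) list prices: $30/1M input, $60/1M output.
print(blended_price(30.00, 60.00))  # 37.5 -> matches the $37.50 in the table
```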