Independent analysis of AI language models and API providers

Understand the AI landscape and choose the best model and API provider for your use-case

Highlights

Quality
Quality Index; Higher is better
Speed
Output Tokens per Second; Higher is better
Price
USD per 1M Tokens; Lower is better

Language Models Comparison Highlights

Quality Comparison by Ability

Varied metrics by ability categorization; Higher is better
General Ability (Chatbot Arena)
Reasoning & Knowledge (MMLU)
Coding (HumanEval)
Different use cases warrant different evaluation tests: Chatbot Arena is a good evaluation of communication ability, while MMLU tests reasoning and knowledge more comprehensively.
Median across providers: Figures represent median (P50) across all providers which support the model.

Quality vs. Output speed

Quality: General reasoning index, Output Speed: Output Tokens per Second, Price: USD per 1M Tokens
Most attractive quadrant
Size represents Price (USD per M Tokens)
There is a trade-off between model quality and output speed, with higher quality models typically having lower output speed.
Quality: Index represents normalized average relative performance across Chatbot Arena, MMLU & MT-Bench.
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Price: Price per token, represented as USD per million tokens. Price is a blend of input and output token prices (3:1 input-to-output ratio).
Median across providers: Figures represent median (P50) across all providers which support the model.
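The 3:1 price blend described in the notes above can be sketched as follows (a minimal illustration; the prices used are hypothetical, not quoted from any provider):

```python
def blended_price(input_price: float, output_price: float) -> float:
    """Blend per-million-token input and output prices at a 3:1
    input-to-output ratio, as described in the chart notes above."""
    return (3 * input_price + 1 * output_price) / 4

# Hypothetical example: $0.50/M input, $1.50/M output tokens
print(blended_price(0.50, 1.50))  # 0.75
```

The 3:1 weighting reflects that typical workloads send more input (prompt) tokens than they receive output tokens.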

Quality vs. Price

While higher quality models are typically more expensive, they do not all follow the same price-quality curve.
Quality: Index represents normalized average relative performance across Chatbot Arena, MMLU & MT-Bench.
Price: Price per token, represented as USD per million tokens. Price is a blend of input and output token prices (3:1 input-to-output ratio).
Median across providers: Figures represent median (P50) across all providers which support the model.

Output Speed

Output Tokens per Second; Higher is better
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Median across providers: Figures represent median (P50) across all providers which support the model.
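The output-speed definition above (tokens per second measured from the first received chunk onward, excluding time-to-first-token) can be sketched as follows; the timestamps are hypothetical:

```python
def output_speed(token_timestamps: list[float]) -> float:
    """Output tokens per second, measured from the first received chunk
    onward (i.e. excluding time-to-first-token), per the definition above."""
    if len(token_timestamps) < 2:
        raise ValueError("need at least two token timestamps")
    elapsed = token_timestamps[-1] - token_timestamps[0]
    # Tokens generated after the first chunk, divided by elapsed time
    return (len(token_timestamps) - 1) / elapsed

# Hypothetical arrival times (seconds) of streamed tokens
print(output_speed([2.0, 2.5, 3.0, 3.5]))  # 2.0 tokens/second
```

Excluding time-to-first-token isolates generation throughput from request-queueing and prompt-processing latency.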

Pricing: Input and Output Prices

USD per 1M Tokens
Input price
Output price
Prices vary considerably, both across models and between input and output tokens, and can differ by more than an order of magnitude (>10X) between the most expensive and cheapest models.
Input price: Price per token included in the request/message sent to the API, represented as USD per million Tokens.
Output price: Price per token generated by the model (received from the API), represented as USD per million Tokens.
Median across providers: Figures represent median (P50) across all providers which support the model.
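Given the separate input and output prices defined above, the cost of a single request can be estimated as follows (a minimal sketch; the token counts and prices are hypothetical):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """USD cost of one request, with prices given in USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1e6

# Hypothetical: 1,000 input tokens at $1/M and 500 output tokens at $2/M
print(request_cost(1000, 500, 1.0, 2.0))  # 0.002
```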

API Provider Highlights: Llama 3 Instruct 70B

Output Speed vs. Price: Llama 3 Instruct 70B

Output Speed: Output Tokens per Second, Price: USD per 1M Tokens
Most attractive quadrant
Microsoft Azure
Groq
Together.ai
Perplexity
Fireworks
Lepton AI
Deepinfra
Replicate
Databricks
OctoAI
Smaller, emerging providers are offering high output speeds at competitive prices.
Price: Price per token, represented as USD per million tokens. Price is a blend of input and output token prices (3:1 input-to-output ratio).
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Median: Figures represent median (P50) measurement over the past 14 days.
Variance data is available among the detailed performance metrics on the model and API provider pages. See 'Compare Models' and 'Compare API Providers' in the navigation menu for further analysis.

Pricing (Input and Output Prices): Llama 3 Instruct 70B

Price: USD per 1M Tokens; Lower is better
Input price
Output price
Providers typically charge different prices for input and output tokens. The ratio of input to output tokens in a given use case may therefore significantly impact overall costs.
Input price: Price per token included in the request/message sent to the API, represented as USD per million Tokens.
Output price: Price per token generated by the model (received from the API), represented as USD per million Tokens.

Output Speed, Over Time: Llama 3 Instruct 70B

Output Tokens per Second; Higher is better
Smaller, emerging providers offer high output speeds, though the speeds delivered vary day to day.
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Over-time measurement: Median measurement per day, based on 8 measurements taken each day at different times. Labels represent the start of each week's measurements.
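The daily median described above can be computed with the standard library; the eight measurements below are hypothetical, not actual benchmark data:

```python
from statistics import median

# Hypothetical output-speed measurements (tokens/s) for one provider,
# taken at 8 different times on one day
daily_measurements = [45.2, 48.1, 44.0, 50.3, 47.7, 46.5, 49.0, 45.9]

# With an even number of samples, the median is the mean of the two
# middle values once sorted
print(median(daily_measurements))  # ~47.1, the value plotted for that day
```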
See more information on any of our supported models
Model Name

OpenAI
GPT-4o
GPT-4 Turbo
GPT-4o mini
GPT-4
GPT-3.5 Turbo Instruct
GPT-3.5 Turbo

Google
Gemini 1.5 Pro
Gemini 1.5 Flash
Gemma 2 27B
Gemma 2 9B
Gemini 1.0 Pro
Gemma 7B Instruct

Meta
Llama 3.1 Instruct 405B
Llama 3.1 Instruct 70B
Llama 3 Instruct 70B
Llama 3.1 Instruct 8B
Llama 3 Instruct 8B
Llama 2 Chat 70B
Llama 2 Chat 13B
Llama 2 Chat 7B

Mistral
Mistral Large 2
Codestral
Codestral-Mamba
Mistral Large
Mixtral 8x22B Instruct
Mistral Small
Mistral Medium
Mistral NeMo
Mixtral 8x7B Instruct
Mistral 7B Instruct

Anthropic
Claude 3.5 Sonnet
Claude 3 Opus
Claude 3 Sonnet
Claude 3 Haiku
Claude 2.0
Claude Instant
Claude 2.1

Cohere
Command Light
Command
Command-R+
Command-R

Perplexity
Sonar Large
Sonar Small

OpenChat
OpenChat 3.5 (1210)

Microsoft Azure
Phi-3 Medium Instruct 14B

Databricks
DBRX Instruct

Reka AI
Reka Core
Reka Flash
Reka Edge

AI21 Labs
Jamba Instruct

DeepSeek
DeepSeek-Coder-V2
DeepSeek-V2-Chat

Snowflake
Arctic Instruct

Alibaba
Qwen2 Instruct 72B

01.AI
Yi-Large