Independent analysis of AI models and hosting providers
Understand the AI landscape and choose the best model and API provider for your use-case
Highlights
Quality: Quality Index; higher is better
Speed: Throughput in Tokens per Second; higher is better
Price: USD per 1M Tokens; lower is better
Model comparison highlights
Quality comparison by ability
Metrics vary by ability category; higher is better
Ability categories: General Ability (Chatbot Arena), Reasoning & Knowledge (MMLU), Reasoning & Knowledge (MT Bench), Coding (HumanEval)
OpenAI's GPT-4 is the clear leader across quality metrics. However, models including Gemini Pro and Mixtral 8x7B have matched GPT-3.5 performance on some measures.
Total Response Time: Time to receive a 100-token response. Estimated from Latency (time to receive the first chunk) and Throughput (tokens per second).
Median across providers: Figures represent median (P50) across all providers which support the model.
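As a minimal sketch of that estimate (latency plus 100 tokens divided by throughput; the function and figures below are illustrative, not taken from this site):

```python
def estimate_total_response_time(latency_s: float, throughput_tps: float,
                                 output_tokens: int = 100) -> float:
    """Estimate seconds to receive a full response.

    latency_s:      time to first chunk, in seconds
    throughput_tps: tokens per second once generation has started
    output_tokens:  response length assumed by the estimate (100 here)
    """
    return latency_s + output_tokens / throughput_tps


# Example: 0.5 s to first chunk at 80 tokens/s -> 1.75 s for 100 tokens
print(estimate_total_response_time(0.5, 80))
```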
Quality vs. Throughput
Quality: General reasoning index, Throughput: Tokens per Second, Price: USD per 1M Tokens
Higher-quality models are typically more expensive. However, quality varies significantly between models, and some open-source models now achieve very high quality.
Quality: Index represents normalized average relative performance across Chatbot Arena, MMLU & MT-Bench.
Price: Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).
Median across providers: Figures represent median (P50) across all providers which support the model.
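As a minimal sketch of the blended price calculation, assuming the 3:1 ratio weights input tokens three times as heavily as output tokens (function name and example prices below are illustrative):

```python
def blended_price(input_price_per_m: float, output_price_per_m: float,
                  input_weight: int = 3, output_weight: int = 1) -> float:
    """Blend input and output prices (USD per 1M tokens) at a 3:1 ratio."""
    return (input_price_per_m * input_weight
            + output_price_per_m * output_weight) / (input_weight + output_weight)


# Example: $0.50/1M input and $1.50/1M output blend to $0.75/1M
print(blended_price(0.50, 1.50))
```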
Throughput
Output Tokens per Second; Higher is better
Throughput: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Median across providers: Figures represent median (P50) across all providers which support the model.
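A minimal sketch of how this throughput definition could be computed from a streamed response, assuming each received chunk is logged as a (timestamp, token count) pair; the helper below is illustrative, not the methodology used by this site:

```python
def throughput_tokens_per_second(chunks: list[tuple[float, int]]) -> float:
    """Tokens per second while the model is generating.

    chunks: (timestamp_seconds, token_count) per received chunk, in
            arrival order. Time before the first chunk (latency) is
            excluded, matching the definition above.
    """
    if len(chunks) < 2:
        raise ValueError("need at least two chunks to measure throughput")
    first_time = chunks[0][0]
    last_time = chunks[-1][0]
    tokens_after_first = sum(count for _, count in chunks[1:])
    return tokens_after_first / (last_time - first_time)


# Example: 80 tokens arrive in the 1.0 s after the first chunk -> 80 tokens/s
print(throughput_tokens_per_second([(0.5, 10), (1.0, 40), (1.5, 40)]))
```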
Pricing: Input and Output prices
USD per 1M Tokens
Input price
Output price
Prices vary considerably, including between input and output tokens. GPT-4 stands out as priced orders of magnitude higher than most other models.
Input price: Price per token included in the request/message sent to the API, represented as USD per million Tokens.
Output price: Price per token generated by the model (received from the API), represented as USD per million Tokens.
Median across providers: Figures represent median (P50) across all providers which support the model.
API Providers highlights
Reference model for comparison: Mixtral 8x7B
Throughput vs. Price
Throughput: Tokens per Second, Price: USD per 1M Tokens; Reference model: Mixtral 8x7B
Most attractive quadrant: high throughput at low price. Providers shown: Mistral, Amazon Bedrock, Groq, Together.ai, Perplexity, Fireworks, Lepton AI, Deepinfra, OctoAI
Smaller, emerging providers are offering high throughput at competitive prices.
Price: Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).
Throughput: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Median: Figures represent median (P50) measurement over the past 14 days.
Variance data is present on the model and API provider pages amongst the detailed performance metrics. See 'Compare Models' and 'Compare API Providers' in the navigation menu for further analysis.
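A minimal sketch of the P50 aggregation, assuming raw benchmark runs are available as (provider, tokens-per-second) records; the provider names and figures below are invented for illustration:

```python
from collections import defaultdict
from statistics import median


def p50_by_provider(records: list[tuple[str, float]]) -> dict[str, float]:
    """Median (P50) throughput per provider across benchmark runs.

    records: (provider_name, tokens_per_second) pairs, e.g. one per
             measurement taken over the past 14 days.
    """
    by_provider: dict[str, list[float]] = defaultdict(list)
    for provider, tps in records:
        by_provider[provider].append(tps)
    return {provider: median(values) for provider, values in by_provider.items()}


# Invented figures for two hypothetical providers
print(p50_by_provider([("ProviderA", 55.0), ("ProviderA", 61.0), ("ProviderA", 58.0),
                       ("ProviderB", 120.0), ("ProviderB", 118.0)]))
```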
Pricing: Input and Output prices
Price: USD per 1M Tokens; Lower is better; Reference model: Mixtral 8x7B
Input price
Output price
Providers typically charge different prices for input and output tokens. Depending on a use case's mix of input and output tokens, this split can significantly impact overall costs (see the sketch below).
Input price: Price per token included in the request/message sent to the API, represented as USD per million Tokens.
Output price: Price per token generated by the model (received from the API), represented as USD per million Tokens.
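To illustrate how the input/output split can dominate costs, here is a hedged sketch comparing two hypothetical price schedules on an input-heavy workload; all prices and token counts are invented:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD of a single request, given prices per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Input-heavy use case (e.g. summarising long documents): 10,000 tokens in, 500 out
print(request_cost(10_000, 500, 0.25, 1.00))  # cheap input, pricier output -> $0.0030
print(request_cost(10_000, 500, 0.60, 0.60))  # flat pricing                -> $0.0063
```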
Throughput, Over Time
Output Tokens per Second; Higher is better
Providers shown: Mistral, Amazon Bedrock, Groq, Together.ai, Perplexity, Fireworks, Lepton AI, Deepinfra, OctoAI
Smaller, emerging providers offer high throughput, though the speeds delivered vary from day to day.
Throughput: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Over-time measurement: Median measurement per day, based on 8 measurements taken at different times each day. Chart labels represent the start of each week's measurements.
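A minimal sketch of that per-day aggregation, assuming each measurement is stamped with its date; the dates and figures below are invented for illustration:

```python
from collections import defaultdict
from datetime import date
from statistics import median


def daily_median(measurements: list[tuple[date, float]]) -> dict[date, float]:
    """Median throughput per day from several measurements taken each day."""
    by_day: dict[date, list[float]] = defaultdict(list)
    for day, tps in measurements:
        by_day[day].append(tps)
    return {day: median(values) for day, values in sorted(by_day.items())}


# Example: two days with two measurements each (in practice, 8 per day)
print(daily_median([(date(2024, 2, 5), 58.0), (date(2024, 2, 5), 62.0),
                    (date(2024, 2, 6), 70.0), (date(2024, 2, 6), 66.0)]))
```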
See more information on any of our supported models