Independent analysis of AI models and API providers
Understand the AI landscape to choose the best model and provider for your use case
Highlights
Quality
Artificial Analysis Quality Index; Higher is better
Speed
Output Tokens per Second; Higher is better
Price
USD per 1M Tokens; Lower is better
Who has the fastest API for Llama 3.1 70B?
Llama 3.1 70B providers
Which model is fastest with 100k token prompts?
Long Context Latency
Which Text to Image model should you be using?
Image Arena
Who has the best Video Generation model?
Video Arena
What's the most accurate transcription model?
Speech to Text
API Provider Highlights: Llama 3.1 Instruct 70B
Output Speed vs. Price: Llama 3.1 Instruct 70B
Output Speed: Output Tokens per Second; Price: USD per 1M Tokens
Most attractive quadrant (higher speed at lower price; see the sketch after these notes)
Price: Price per token, represented as USD per million Tokens. Price is a blend of Input and Output token prices at a 3:1 ratio.
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API, for models that support streaming); a measurement sketch follows these notes.
Median: Figures represent the median (P50) measurement over the past 14 days, chosen to reflect sustained changes in performance rather than momentary fluctuations.
Notes: Cerebras (Llama 3.1 70B): 8k context; Groq (Llama 3.1 70B, speculative decoding): 8k context; SambaNova (Llama 3.1 70B): 64k context.
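The most attractive quadrant pairs above-median speed with below-median price. A minimal sketch of one way to partition providers along these two axes, splitting each axis at its median; the provider names and figures are hypothetical, and the median split is our assumption rather than the chart's exact construction:

```python
from statistics import median

# Hypothetical (output tokens/s, blended price in USD per 1M tokens);
# illustrative values only, not actual measurements.
providers = {
    "provider_a": (450.0, 0.60),
    "provider_b": (250.0, 0.72),
    "provider_c": (70.0, 0.90),
    "provider_d": (120.0, 0.54),
}

speed_cut = median(s for s, _ in providers.values())
price_cut = median(p for _, p in providers.values())

# Most attractive quadrant: faster than the median AND cheaper than the median.
attractive = [
    name for name, (speed, price) in providers.items()
    if speed > speed_cut and price < price_cut
]
print(attractive)  # -> ['provider_a'] with these illustrative numbers
```

With these illustrative numbers, only provider_a clears both cuts.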
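Output speed can be estimated directly from a streaming response by timing from the first content chunk to the last. A minimal sketch, assuming an OpenAI-compatible endpoint reachable through the `openai` Python client; the base URL, model name, and the rough 4-characters-per-token estimate are illustrative assumptions, not Artificial Analysis's harness:

```python
import time
from openai import OpenAI

# Placeholder endpoint, key, and model name; substitute your provider's values.
client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_KEY")

stream = client.chat.completions.create(
    model="llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Explain speculative decoding briefly."}],
    stream=True,
)

first_chunk_at = None
text = ""
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content or ""
    if delta and first_chunk_at is None:
        first_chunk_at = time.monotonic()  # clock starts at the first content chunk
    text += delta
done_at = time.monotonic()

# Rough token estimate (~4 characters per token); a real harness would use the
# model's tokenizer. Time-to-first-chunk is excluded, per the definition above.
if first_chunk_at is not None and done_at > first_chunk_at:
    approx_tokens = len(text) / 4
    print(f"~{approx_tokens / (done_at - first_chunk_at):.0f} output tokens/s")
```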
Pricing (Input and Output Prices): Llama 3.1 Instruct 70B
Price: USD per 1M Tokens; Lower is better
Input price
Output price
Input price: Price per token included in the request/message sent to the API, represented as USD per million Tokens.
Output price: Price per token generated by the model and received from the API, represented as USD per million Tokens. A worked example of the 3:1 blended price follows below.
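As a worked example of how the blended figure relates to the two prices above, assuming the 3:1 ratio weights input tokens three times as heavily as output tokens (the prices used are hypothetical):

```python
def blended_price(input_price: float, output_price: float) -> float:
    """Blend input and output prices (USD per 1M Tokens) at a 3:1 input:output ratio."""
    return (3 * input_price + output_price) / 4

# Hypothetical prices, for illustration only:
print(blended_price(0.60, 0.80))  # (3 * 0.60 + 0.80) / 4 = 0.65
```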
Output Speed: Llama 3.1 Instruct 70B
Output Speed: Output Tokens per Second
Median across providers: Figures represent the median (P50) across all providers that support the model.
Output Speed, Over Time: Llama 3.1 Instruct 70B
Output Tokens per Second; Higher is better
Over time measurement: Median measurement per day, based on 8 measurements taken at different times each day; chart labels mark the start of each week's measurements. A sketch of this aggregation follows below.
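A minimal sketch of this two-stage aggregation with hypothetical readings: the day's 8 measurements collapse to a daily median for the time series, and headline figures take the median over a trailing window of those daily medians:

```python
from statistics import median

# Eight hypothetical speed readings (output tokens/s) from one day,
# taken at different times; illustrative values only.
day_measurements = [242, 250, 238, 255, 247, 251, 244, 249]
daily_median = median(day_measurements)  # the per-day point plotted over time
print(daily_median)  # -> 248.0

# Headline figure: median over a trailing window of daily medians (hypothetical).
recent_daily_medians = [248.0, 246.5, 251.0, 244.0, 249.5]
print(median(recent_daily_medians))  # -> 248.0
```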
See more information on any of our supported models