
LLM API Providers Leaderboard - Comparison of over 500 AI Model endpoints

Comparison of API provider performance across over 500 AI model endpoints, including models from OpenAI, Google, DeepSeek and others, across key performance metrics including price, output speed, latency, context window and more. For more details, including our methodology, see our FAQs.

Key definitions

Context window: Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (which varies by model).
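The combined limit means a request must budget input and output tokens together. A minimal sketch of that check, with hypothetical limits for illustration:

```python
CONTEXT_WINDOW = 128_000   # hypothetical combined input + output limit
MAX_OUTPUT = 4_096         # hypothetical separate (lower) output cap

def fits(input_tokens: int, output_tokens: int) -> bool:
    """Check a request against both the combined and output-only limits."""
    return (input_tokens + output_tokens) <= CONTEXT_WINDOW and output_tokens <= MAX_OUTPUT

print(fits(120_000, 4_000))  # True: within both limits
print(fits(127_000, 2_000))  # False: combined total exceeds the window
```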

Output speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API, for models which support streaming).
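One plausible way to compute this metric, assuming output speed is measured over the generation phase only (total time minus time to first token):

```python
def output_speed(output_tokens: int, total_time_s: float, ttft_s: float) -> float:
    """Tokens per second over the generation phase.

    Excludes the time to first token, so only the streaming portion
    of the request is measured. Exact methodology may differ.
    """
    return output_tokens / (total_time_s - ttft_s)

print(output_speed(500, 10.5, 0.5))  # 50.0 tokens/second
```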

Latency (time to first token): Time to first token received, in seconds, after the API request is sent. For reasoning models which share reasoning tokens, this is the first reasoning token. For models which do not support streaming, this represents the time to receive the completion.
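A minimal sketch of measuring time to first token against a streaming API, where `stream` stands in for any iterator of response chunks (a hypothetical stand-in, not a specific provider's SDK):

```python
import time

def time_to_first_token(stream):
    """Return (seconds until first chunk, first chunk).

    `stream` is any iterator yielding response chunks; timing starts
    when we begin waiting, standing in for the moment the request is sent.
    """
    start = time.monotonic()
    first_chunk = next(stream)
    return time.monotonic() - start, first_chunk

ttft, chunk = time_to_first_token(iter(["Hello"]))
print(chunk)  # Hello
```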

Price (blended): Price per token, represented as USD per million tokens. Price is a blend of input & output token prices (3:1 input:output ratio).

Output price: Price per token generated by the model (received from the API), represented as USD per million tokens.

Input price: Price per token included in the request/message sent to the API, represented as USD per million tokens.

Metrics are 'live', based on the past 72 hours of measurements. Measurements are taken 8 times per day for single requests and 2 times per day for parallel requests.