GPT-4o: Quality, Performance & Price Analysis
Analysis of OpenAI's GPT-4o compared to other AI models across key metrics including quality, price, performance (tokens per second and time to first token), context window and more. Click on any model to compare API providers for that model. For more details, including our methodology, see our FAQs.
Comparison Summary
Quality: GPT-4o is of higher quality than average, with an MMLU score of 0.887 and a Quality Index across evaluations of 100.
Price: GPT-4o is more expensive than average, with a price of $7.50 per 1M Tokens (blended 3:1). GPT-4o input token price: $5.00, output token price: $15.00 per 1M Tokens.
Speed: GPT-4o is faster than average, with an output speed of 86.9 tokens per second.
Latency: GPT-4o has lower latency than average, taking 0.53s to receive the first token (TTFT).
Context Window: GPT-4o has a larger context window than average, with a context window of 130k tokens.
Highlights
Quality
Quality Index; Higher is better
Speed
Output Tokens per Second; Higher is better
Price
USD per 1M Tokens; Lower is better
Quality vs. Output Speed, Price
Quality: General reasoning index; Output Speed: Output Tokens per Second; Price: USD per 1M Tokens
Most attractive quadrant
Size represents Price (USD per M Tokens)
Quality: Index represents normalized average relative performance across Chatbot Arena, MMLU & MT-Bench.
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Price: Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).
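The 3:1 blend described above can be computed directly. A minimal sketch (the function name and signature are illustrative, not part of any API):

```python
def blended_price(input_price: float, output_price: float) -> float:
    """Blended USD per 1M tokens, weighting input and output prices 3:1."""
    return (3 * input_price + 1 * output_price) / 4

# GPT-4o's listed prices reproduce the blended figure quoted above:
# blended_price(5.00, 15.00) == 7.50
```

The 3:1 weighting reflects a workload assumption that requests typically contain about three input tokens for every output token.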
Quality & Context window
Quality comparison by ability
Varied metrics by ability categorization; Higher is better
General Ability (Chatbot Arena)
Reasoning & Knowledge (MMLU)
Coding (HumanEval)
Total Response Time: Time to receive a 100 token response. Estimated based on Latency (time to receive first chunk) and Output Speed (output tokens per second).
Median across providers: Figures represent median (P50) across all providers which support the model.
Quality vs. Context window, Input token price
Quality: General reasoning index; Context window: Tokens limit; Input Price: USD per 1M Input Tokens
Most attractive quadrant
Size represents Input Price (USD per M Input Tokens)
Quality: Index represents normalized average relative performance across Chatbot Arena, MMLU & MT-Bench.
Context window: Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (varied by model).
Input price: Price per token included in the request/message sent to the API, represented as USD per million Tokens.
Context window
Context window: Tokens limit; Higher is better
Context window: Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (varied by model).
Pricing
Quality vs. Price
Quality: General reasoning index; Price: USD per 1M Tokens
Most attractive quadrant
Quality: Index represents normalized average relative performance across Chatbot Arena, MMLU & MT-Bench.
Price: Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).
Median across providers: Figures represent median (P50) across all providers which support the model.
Pricing: Input and Output prices
Price: USD per 1M Tokens
Input price
Output price
Input price: Price per token included in the request/message sent to the API, represented as USD per million Tokens.
Output price: Price per token generated by the model (received from the API), represented as USD per million Tokens.
Median across providers: Figures represent median (P50) across all providers which support the model.
Pricing comparison of GPT-4o API providers
Performance summary
Output Speed vs. Price
Output Speed: Output Tokens per Second; Price: USD per 1M Tokens
Most attractive quadrant
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Price: Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).
Latency vs. Output Speed
Latency: Seconds to First Token Chunk Received; Output Speed: Output Tokens per Second
Most attractive quadrant
Size represents Price (USD per M Tokens)
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Latency: Time until the first token is received, in seconds, after the API request is sent.
Price: Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).
Median across providers: Figures represent median (P50) across all providers which support the model.
Speed
Measured by Output Speed (tokens per second)
Output Speed
Output Tokens per Second; Higher is better
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Median across providers: Figures represent median (P50) across all providers which support the model.
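The output-speed definition above (tokens per second after the first chunk is received) can be sketched as a measurement over any streaming response. This is an illustrative sketch, not the benchmark's actual harness; counting tokens by whitespace is a stand-in for a real tokenizer:

```python
import time

def stream_metrics(chunks):
    """Measure latency (TTFT) and output speed over an iterable of text chunks.

    Latency is the time until the first chunk arrives; output speed divides
    tokens received by the time elapsed after that first chunk, matching the
    definition above.
    """
    start = time.monotonic()
    first_chunk_at = None
    tokens = 0
    for chunk in chunks:
        if first_chunk_at is None:
            first_chunk_at = time.monotonic()
        tokens += len(chunk.split())  # crude whitespace "tokenizer"
    generation_time = time.monotonic() - first_chunk_at
    latency = first_chunk_at - start
    speed = tokens / generation_time if generation_time > 0 else float("inf")
    return latency, speed
```

`time.monotonic()` is used rather than `time.time()` because interval measurements should not be affected by system clock adjustments.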
Output Speed by Input token (context) length
Output Tokens per Second; Higher is better
Short (100 tokens)
Medium (1,000 tokens)
Long (10,000 tokens)
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Input Tokens Length: Number of tokens provided in the request. See the Prompt Options above for benchmarks of different input prompt lengths across other charts.
Median across providers: Figures represent median (P50) across all providers which support the model.
Output Speed Variance
Output Tokens per Second; Results by percentile; Higher is better
The bar represents the median; other points represent the 5th, 25th, 75th and 95th percentiles respectively.
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Boxplot: Shows variance of measurements
![Boxplot diagram](/_next/image?url=%2Fimg%2Fgeneral-frontend%2Fboxplot-diagram.png&w=640&q=75)
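The five percentile points each boxplot shows can be reproduced from raw measurements with the standard library. A minimal sketch, assuming a plain list of tokens-per-second samples (function name illustrative):

```python
from statistics import quantiles

def speed_percentiles(samples):
    """Return the 5th, 25th, 50th, 75th and 95th percentiles of a list of
    output-speed samples (tokens/s) -- the five points each boxplot shows."""
    cuts = quantiles(samples, n=100, method="inclusive")  # 99 cut points
    return {p: cuts[p - 1] for p in (5, 25, 50, 75, 95)}
```

The `method="inclusive"` option treats the samples as the whole population, interpolating between observed values; the default exclusive method would extrapolate slightly wider tails.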
Output Speed, Over Time
Output Tokens per Second; Higher is better
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Over time measurement: Median measurement per day, based on 8 measurements each day at different times. Labels mark the start of each week's measurements.
Median across providers: Figures represent median (P50) across all providers which support the model.
Latency
Measured by Time (seconds) to First Token
Latency
Seconds to First Token Chunk Received; Lower is better
Latency: Time until the first token is received, in seconds, after the API request is sent.
Median across providers: Figures represent median (P50) across all providers which support the model.
Latency by Input token (context) length
Seconds to First Token Chunk Received; Lower is better
Short (100 tokens)
Medium (1,000 tokens)
Long (10,000 tokens)
Input Tokens Length: Number of tokens provided in the request. See the Prompt Options above for benchmarks of different input prompt lengths across other charts.
Latency: Time until the first token is received, in seconds, after the API request is sent.
Median across providers: Figures represent median (P50) across all providers which support the model.
Latency Variance
Seconds to First Token Chunk Received; Results by percentile; Lower is better
The bar represents the median; other points represent the 5th, 25th, 75th and 95th percentiles respectively.
Latency: Time until the first token is received, in seconds, after the API request is sent.
Boxplot: Shows variance of measurements
![Boxplot diagram](/_next/image?url=%2Fimg%2Fgeneral-frontend%2Fboxplot-diagram.png&w=640&q=75)
Latency, Over Time
Seconds to First Token Chunk Received; Lower median is better
Latency: Time until the first token is received, in seconds, after the API request is sent.
Over time measurement: Median measurement per day, based on 8 measurements each day at different times. Labels mark the start of each week's measurements.
Median across providers: Figures represent median (P50) across all providers which support the model.
Total Response Time
Time to receive a 100-token response, estimated from the latency and output speed metrics
Total Response Time
Seconds to Output 100 Tokens; Lower is better
Total Response Time: Time to receive a 100 token response. Estimated based on Latency (time to receive first chunk) and Output Speed (output tokens per second).
Median across providers: Figures represent median (P50) across all providers which support the model.
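The estimate described above is simply latency plus generation time for the remaining tokens. A minimal sketch (function name and signature are illustrative):

```python
def total_response_time(latency_s: float, output_speed_tps: float,
                        output_tokens: int = 100) -> float:
    """Estimated seconds to receive `output_tokens` tokens:
    time to first token plus time to generate the tokens at the
    measured output speed."""
    return latency_s + output_tokens / output_speed_tps

# With GPT-4o's median figures quoted above (0.53 s TTFT, 86.9 tokens/s),
# this gives roughly 0.53 + 100 / 86.9, i.e. about 1.68 seconds.
```

Note this is an estimate composed from two separately measured medians, not a direct end-to-end timing of a single request.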
Total Response Time by Input token (context) length
Seconds to Output 100 Tokens; Lower is better
Short (100 tokens)
Medium (1,000 tokens)
Long (10,000 tokens)
Input Tokens Length: Number of tokens provided in the request. See the Prompt Options above for benchmarks of different input prompt lengths across other charts.
Total Response Time: Time to receive a 100 token response. Estimated based on Latency (time to receive first chunk) and Output Speed (output tokens per second).
Median across providers: Figures represent median (P50) across all providers which support the model.
Total Response Time Variance
Total Response Time: Seconds to Output 100 Tokens; Results by percentile; Lower is better
The bar represents the median; other points represent the 5th, 25th, 75th and 95th percentiles respectively.
Total Response Time: Time to receive a 100 token response. Estimated based on Latency (time to receive first chunk) and Output Speed (output tokens per second).
Boxplot: Shows variance of measurements
![Boxplot diagram](/_next/image?url=%2Fimg%2Fgeneral-frontend%2Fboxplot-diagram.png&w=640&q=75)
Total Response Time, Over Time
Seconds to Output 100 Tokens; Lower is better
Total Response Time: Time to receive a 100 token response. Estimated based on Latency (time to receive first chunk) and Output Speed (output tokens per second).
Over time measurement: Median measurement per day, based on 8 measurements each day at different times. Labels mark the start of each week's measurements.
Median across providers: Figures represent median (P50) across all providers which support the model.
Further details