
Gemini 1.5 Flash: Quality, Performance & Price Analysis

Analysis of Google's Gemini 1.5 Flash and comparison to other AI models across key metrics including quality, price, performance (tokens per second & time to first token), context window & more. Click on any model to compare API providers for that model. For more details, including our methodology, see our FAQs.
Creator:
Google
License:
Proprietary
Context window:
1M tokens

Comparison Summary

Quality:
Gemini 1.5 Flash is higher quality than average, with an MMLU score of 0.789 and a Quality Index across evaluations of 83.
Price:
Gemini 1.5 Flash is cheaper than average, with a blended price of $0.53 per 1M tokens (3:1 input:output).
Gemini 1.5 Flash input token price: $0.35 per 1M tokens; output token price: $1.05 per 1M tokens.
Speed:
Gemini 1.5 Flash is faster than average, with an output speed of 139.5 tokens per second.
Latency:
Gemini 1.5 Flash has higher latency than average, taking 1.34s to receive the first token (TTFT).
Context Window:
Gemini 1.5 Flash has a larger context window than average, with a context window of 1.0M tokens.

Highlights

Quality
Quality Index; Higher is better
Speed
Output Tokens per Second; Higher is better
Price
USD per 1M Tokens; Lower is better
Parallel Queries:
Prompt Length:

Quality vs. Output Speed, Price

Quality: General reasoning index; Output Speed: Output Tokens per Second; Price: USD per 1M Tokens
Most attractive quadrant
Size represents Price (USD per M Tokens)
There is a trade-off between model quality and output speed, with higher quality models typically having lower output speed.
Quality: Index represents normalized average relative performance across Chatbot Arena, MMLU & MT-Bench.
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Price: Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).
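The 3:1 blend described above can be reproduced directly from the per-token prices. A minimal sketch (the function name is mine, not from this page):

```python
def blended_price(input_price: float, output_price: float) -> float:
    """Blend input and output prices (USD per 1M tokens) at a 3:1
    input:output ratio, as used for the blended figures on this page."""
    return (3 * input_price + output_price) / 4

# Gemini 1.5 Flash: $0.35 input, $1.05 output (per 1M tokens)
blended = blended_price(0.35, 1.05)  # ≈ 0.525, shown as $0.53
```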

Quality & Context window

Quality comparison by ability

Varied metrics by ability categorization; Higher is better
General Ability (Chatbot Arena)
Reasoning & Knowledge (MMLU)
Reasoning & Knowledge (MT Bench)
Coding (HumanEval)
Different use-cases warrant considering different evaluation tests. Chatbot Arena is a good evaluation of communication abilities, while MMLU tests reasoning and knowledge more comprehensively.
Median across providers: Figures represent median (P50) across all providers which support the model.

Quality vs. Context window, Input token price

Quality: General reasoning index; Context window: Token limit; Input Price: USD per 1M Input Tokens
Most attractive quadrant
Size represents Input Price (USD per M Input Tokens)
Quality: Index represents normalized average relative performance across Chatbot Arena, MMLU & MT-Bench.
Context window: Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (varies by model).
Input price: Price per token included in the request/message sent to the API, represented as USD per million Tokens.

Context window

Context window: Token limit; Higher is better
Larger context windows are relevant to RAG (Retrieval Augmented Generation) LLM workflows which typically involve reasoning and information retrieval of large amounts of data.
Context window: Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (varies by model).

Quality vs. Price

While higher quality models are typically more expensive, they do not all follow the same price-quality curve.
Quality: Index represents normalized average relative performance across Chatbot Arena, MMLU & MT-Bench.
Price: Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).
Median across providers: Figures represent median (P50) across all providers which support the model.

Pricing: Input and Output prices

Price: USD per 1M Tokens
Input price
Output price
Prices vary considerably, including between input and output tokens. Prices can differ by more than an order of magnitude (>10X) between the most expensive and cheapest models.
Input price: Price per token included in the request/message sent to the API, represented as USD per million Tokens.
Output price: Price per token generated by the model (received from the API), represented as USD per million Tokens.
Median across providers: Figures represent median (P50) across all providers which support the model.

 Pricing comparison of Gemini 1.5 Flash API providers

Performance summary

Output Speed vs. Price

Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Price: Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).

Latency vs. Output Speed

Latency: Seconds to First Token Chunk Received; Output Speed: Output Tokens per Second
Most attractive quadrant
Size represents Price (USD per M Tokens)
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Latency: Time to first token received, in seconds, after the API request is sent.
Price: Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).
Median across providers: Figures represent median (P50) across all providers which support the model.

Speed

Measured by Output Speed (tokens per second)

Output Speed

Output Tokens per Second; Higher is better
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Median across providers: Figures represent median (P50) across all providers which support the model.
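As a concrete illustration of the definition above, output speed can be computed from the arrival timestamps and token counts of streamed chunks, with the first chunk's wait excluded (that is latency). This is a sketch of the metric as described, not the benchmark's actual code:

```python
def output_speed(chunks: list[tuple[float, int]]) -> float:
    """chunks: (timestamp_seconds, token_count) per streamed chunk, in
    arrival order. Returns tokens per second while generating, i.e.
    counting only tokens received after the first chunk."""
    t_first, _ = chunks[0]
    t_last, _ = chunks[-1]
    tokens_after_first = sum(n for _, n in chunks[1:])
    return tokens_after_first / (t_last - t_first)

# e.g. three chunks: 20 tokens arrive in the 1.0 s after the first chunk
output_speed([(0.0, 5), (0.5, 10), (1.0, 10)])  # → 20.0 tokens/s
```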

Output Speed by Input token (context) length

Output Tokens per Second; Higher is better
Short (100 tokens)
Medium (1,000 tokens)
Long (10,000 tokens)
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Input Tokens Length: Number of tokens provided in the request. See Prompt Options above for benchmarks of different input prompt lengths across other charts.
Median across providers: Figures represent median (P50) across all providers which support the model.

Output Speed Variance

Output Tokens per Second; Results by percentile; Higher is better
Median shown; other points represent the 5th, 25th, 75th and 95th percentiles respectively
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Boxplot: Shows variance of measurements
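The percentile points in these boxplots can be derived from raw per-request measurements. A sketch using Python's standard library; the sample values are illustrative, not real benchmark data:

```python
from statistics import quantiles

# Hypothetical per-request output-speed measurements (tokens/s)
speeds = [120.1, 131.4, 135.0, 138.2, 139.5, 140.8, 142.0, 147.3, 151.9, 160.4]

# quantiles(..., n=100) returns 99 cut points; p[k-1] is the k-th percentile
p = quantiles(sorted(speeds), n=100, method="inclusive")
p5, p25, p50, p75, p95 = p[4], p[24], p[49], p[74], p[94]
```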

Output Speed, Over Time

Output Tokens per Second; Higher is better
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Over time measurement: Median measurement per day, based on 8 measurements each day at different times. Labels represent the start of each week's measurements.
Median across providers: Figures represent median (P50) across all providers which support the model.

Latency

Measured by Time (seconds) to First Token

Latency

Seconds to First Token Chunk Received; Lower is better
Latency: Time to first token received, in seconds, after the API request is sent.
Median across providers: Figures represent median (P50) across all providers which support the model.

Latency by Input token (context) length

Seconds to First Token Chunk Received; Lower is better
Short (100 tokens)
Medium (1,000 tokens)
Long (10,000 tokens)
Input Tokens Length: Number of tokens provided in the request. See Prompt Options above for benchmarks of different input prompt lengths across other charts.
Latency: Time to first token received, in seconds, after the API request is sent.
Median across providers: Figures represent median (P50) across all providers which support the model.

Latency Variance

Seconds to First Token Chunk Received; Results by percentile; Lower is better
Median shown; other points represent the 5th, 25th, 75th and 95th percentiles respectively
Latency: Time to first token received, in seconds, after the API request is sent.
Boxplot: Shows variance of measurements

Latency, Over Time

Seconds to First Token Chunk Received; Lower median is better
Latency: Time to first token received, in seconds, after the API request is sent.
Over time measurement: Median measurement per day, based on 8 measurements each day at different times. Labels represent the start of each week's measurements.
Median across providers: Figures represent median (P50) across all providers which support the model.

Total Response Time

Time to receive 100 tokens output, calculated by latency and output speed metrics
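Given the latency and output-speed figures above, the estimate is simple arithmetic: time to first token plus generation time at the measured speed. A sketch (the function name is mine):

```python
def total_response_time(latency_s: float, output_speed_tps: float,
                        output_tokens: int = 100) -> float:
    """Estimated seconds to receive `output_tokens` of output: time to
    first token plus generation time at the measured output speed."""
    return latency_s + output_tokens / output_speed_tps

# Gemini 1.5 Flash medians quoted above: 1.34 s TTFT, 139.5 tokens/s
total_response_time(1.34, 139.5)  # ≈ 2.06 s for a 100-token response
```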

Total Response Time

Seconds to Output 100 Tokens; Lower is better
The speed difference between the fastest and slowest models is >3X. There is not always a correlation between parameter size and speed, or between price and speed.
Total Response Time: Time to receive a 100 token response. Estimated based on Latency (time to receive first chunk) and Output Speed (output tokens per second).
Median across providers: Figures represent median (P50) across all providers which support the model.

Total Response Time by Input token (context) length

Seconds to Output 100 Tokens; Lower is better
Short (100 tokens)
Medium (1,000 tokens)
Long (10,000 tokens)
Input Tokens Length: Number of tokens provided in the request. See Prompt Options above for benchmarks of different input prompt lengths across other charts.
Total Response Time: Time to receive a 100 token response. Estimated based on Latency (time to receive first chunk) and Output Speed (output tokens per second).
Median across providers: Figures represent median (P50) across all providers which support the model.

Total Response Time Variance

Total Response Time: Seconds to Output 100 Tokens; Results by percentile; Lower is better
Median shown; other points represent the 5th, 25th, 75th and 95th percentiles respectively
Total Response Time: Time to receive a 100 token response. Estimated based on Latency (time to receive first chunk) and Output Speed (output tokens per second).
Boxplot: Shows variance of measurements

Total Response Time, Over Time

Seconds to Output 100 Tokens; Lower is better
Total Response Time: Time to receive a 100 token response. Estimated based on Latency (time to receive first chunk) and Output Speed (output tokens per second).
Over time measurement: Median measurement per day, based on 8 measurements each day at different times. Labels represent the start of each week's measurements.
Median across providers: Figures represent median (P50) across all providers which support the model.
Further details
Model Name

OpenAI
GPT-4o
GPT-4 Turbo
GPT-4
GPT-3.5 Turbo
GPT-3.5 Turbo Instruct

Google
Gemini 1.5 Flash
Gemini 1.5 Pro
Gemini 1.0 Pro
Gemma 7B Instruct

Meta
Llama 3 Instruct (70B)
Llama 3 Instruct (8B)
Code Llama Instruct (70B)
Llama 2 Chat (70B)
Llama 2 Chat (13B)
Llama 2 Chat (7B)

AI21 Labs
Jamba Instruct

Mistral
Mixtral 8x22B Instruct
Mistral Large
Mistral Medium
Mistral Small
Mixtral 8x7B Instruct
Mistral 7B Instruct

Anthropic
Claude 3.5 Sonnet
Claude 3 Opus
Claude 3 Sonnet
Claude 3 Haiku
Claude 2.0
Claude 2.1
Claude Instant

Alibaba
Qwen2 Instruct (72B)

Cohere
Command Light
Command
Command-R+
Command-R

OpenChat
OpenChat 3.5 (1210)

Databricks
DBRX Instruct

DeepSeek
DeepSeek-V2-Chat

Snowflake
Arctic Instruct