Google: Models Quality, Performance & Price
Google Model Comparison Summary
Highlights
Quality vs. Output Speed, Price
Function (Tool) Calling & JSON Mode
Models | Function calling | JSON Mode |
---|---|---|
Gemini 2.0 Flash (exp) (AI Studio), Google | ||
Claude 3.5 Sonnet (Oct) Vertex, Google | ||
Gemini 1.5 Pro (Sep) (Vertex), Google | ||
Gemini 1.5 Pro (Sep) (AI Studio), Google | ||
Claude 3.5 Sonnet (June) Vertex, Google | ||
Gemini 1.5 Flash (Sep) (Vertex), Google | ||
Gemini 1.5 Flash (Sep) (AI Studio), Google | ||
Llama 3.1 405B Vertex, Google | ||
Claude 3 Opus Vertex, Google | ||
Claude 3.5 Haiku Vertex, Google | ||
Llama 3.1 70B Vertex, Google | ||
Llama 3.2 90B (Vision) Vertex, Google | ||
Llama 3.1 8B Vertex, Google | ||
Gemini 1.5 Flash-8B AI Studio, Google | ||
Gemini Experimental (Nov) (AI Studio), Google | ||
Gemini 1.5 Flash (May) (Vertex), Google | ||
Gemini 1.5 Flash (May) (AI Studio), Google | ||
Gemini 1.5 Pro (May) (Vertex), Google | ||
Gemini 1.5 Pro (May) (AI Studio), Google | ||
Quality & Context Window
Quality Evaluations
Quality vs. Context Window, Input Token Price
Context Window
Pricing
Quality vs. Price
Pricing: Input and Output Prices
Pricing: Cached Inputs
Models | Cache Pricing Notes |
---|---|
Google (AI Studio) | |
Gemini 1.5 Pro (Sep) (Vertex) | For >128k tokens: |
Gemini 1.5 Pro (Sep) (AI Studio) | For >128k tokens: |
Gemini 1.5 Flash (Sep) (Vertex) | For >128k tokens: |
Gemini 1.5 Flash (Sep) (AI Studio) | For >128k tokens: |
Gemini 1.5 Flash-8B AI Studio | For >128k tokens: |
Gemini 1.5 Flash (May) (Vertex) | For >128k tokens: |
Gemini 1.5 Flash (May) (AI Studio) | For >128k tokens: |
Gemini 1.5 Pro (May) (Vertex) | For >128k tokens: |
Gemini 1.5 Pro (May) (AI Studio) | For >128k tokens: |
Performance Summary
Output Speed vs. Price
Latency vs. Output Speed
Speed
Measured by Output Speed (tokens per second)
Output Speed
Output Speed by Input Token Count (Context Length)
Output Speed Variance
Output Speed, Over Time
Latency
Measured by Time (seconds) to First Token
Latency
Latency by Input Token Count (Context Length)
Latency Variance
Latency, Over Time
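As a rough illustration of the two measurements named above, the sketch below derives latency (seconds to first token) and output speed (tokens per second) from token arrival timestamps. The function name `stream_metrics`, the `StreamMetrics` type, and the convention of measuring speed only over the window after the first token are assumptions made for this example, not the page's documented methodology.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StreamMetrics:
    latency_s: float         # seconds from request to first token
    output_speed_tps: float  # tokens per second after the first token


def stream_metrics(request_time: float, token_times: Sequence[float]) -> StreamMetrics:
    """Compute latency and output speed from token arrival timestamps.

    Assumption: output speed counts only tokens received after the first
    one, divided by the time elapsed since the first token arrived.
    """
    if len(token_times) < 2:
        raise ValueError("need at least two token timestamps")
    first, last = token_times[0], token_times[-1]
    generation_window = last - first
    if generation_window <= 0:
        raise ValueError("token timestamps must be increasing")
    latency = first - request_time
    speed = (len(token_times) - 1) / generation_window
    return StreamMetrics(latency_s=latency, output_speed_tps=speed)


if __name__ == "__main__":
    # Hypothetical timestamps (seconds): request at 0.0, five tokens received.
    print(stream_metrics(0.0, [0.5, 1.0, 1.5, 2.0, 2.5]))
    # StreamMetrics(latency_s=0.5, output_speed_tps=2.0)
```

Separating the two quantities this way keeps a slow first token from depressing the reported generation rate, which is why latency and output speed can rank models differently.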
Total Response Time
Time to receive 100 tokens output, calculated from latency and output speed metrics
Total Response Time
Total Response Time by Input Token Count (Context Length)
Total Response Time Variance
Total Response Time, Over Time
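The Total Response Time definition above can be sketched as latency plus the time needed to generate 100 tokens at the measured output speed. The exact formula used by the page is not shown in the text, so treat `total_response_time` as an assumed reading of "calculated from latency and output speed metrics"; the inputs are the latency and output speed figures listed for two models in the summary table.

```python
def total_response_time(
    latency_s: float,
    output_tokens_per_s: float,
    output_tokens: int = 100,
) -> float:
    """Estimate seconds to receive `output_tokens` tokens.

    Assumed formula: time to first token + tokens / output speed.
    """
    if output_tokens_per_s <= 0:
        raise ValueError("output speed must be positive")
    return latency_s + output_tokens / output_tokens_per_s


if __name__ == "__main__":
    # Gemini 1.5 Flash (Sep) (Vertex): latency 0.21 s, 189.9 tokens/s
    print(f"{total_response_time(0.21, 189.9):.2f}")  # 0.74
    # Claude 3 Opus Vertex: latency 3.15 s, 27.7 tokens/s
    print(f"{total_response_time(3.15, 27.7):.2f}")   # 6.76
```

For short responses the latency term dominates, while for long outputs the output speed term does, so the 100-token figure is only one point on that trade-off.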
Model | Context Window | Quality | Price | Output tokens/s | Latency (s) |
---|---|---|---|---|---|
Llama 3.1 405B Vertex | 128k | 74 | $7.75 | 29.7 | 0.42 |
Llama 3.1 70B Vertex | 128k | 68 | $0.00 | 71.7 | 0.28 |
Llama 3.2 90B (Vision) Vertex | 128k | 68 | $0.00 | 34.1 | 0.20 |
Llama 3.1 8B Vertex | 128k | 54 | $0.00 | 119.7 | 0.18 |
Gemini 2.0 Flash (exp) (AI Studio) | 1m | 82 | $0.00 | 169.0 | 0.47 |
Gemini 1.5 Pro (Sep) (Vertex) | 2m | 80 | $2.19 | 58.2 | 0.40 |
Gemini 1.5 Pro (Sep) (AI Studio) | 2m | 80 | $2.19 | 63.8 | 0.77 |
Gemini 1.5 Flash (Sep) (Vertex) | 1m | 74 | $0.13 | 189.9 | 0.21 |
Gemini 1.5 Flash (Sep) (AI Studio) | 1m | 74 | $0.13 | 182.1 | 0.41 |
Gemini 1.5 Flash-8B AI Studio | 1m | 47 | $0.07 | 279.4 | 0.37 |
Gemini Experimental (Nov) (AI Studio) | 2m | | $0.00 | 54.5 | 1.25 |
Gemini 1.5 Flash (May) (Vertex) | 1m | | $0.13 | 302.2 | 0.29 |
Gemini 1.5 Flash (May) (AI Studio) | 1m | | $0.13 | 313.0 | 0.30 |
Gemini 1.5 Pro (May) (Vertex) | 2m | 72 | $2.19 | 65.7 | 0.42 |
Gemini 1.5 Pro (May) (AI Studio) | 2m | 72 | $2.19 | 67.1 | 0.79 |
Claude 3.5 Sonnet (Oct) Vertex | 200k | 80 | $6.00 | 72.7 | 0.74 |
Claude 3.5 Sonnet (June) Vertex | 200k | 76 | $6.00 | 61.3 | 0.73 |
Claude 3 Opus Vertex | 200k | 70 | $30.00 | 27.7 | 3.15 |
Claude 3.5 Haiku Vertex | 200k | 68 | $1.60 | 65.1 | 0.99 |
Gemini 1.0 Pro (AI Studio) | 33k | | $0.75 | 102.8 | 1.25 |