Prompt Caching: Cost & Performance Analysis Across Providers
Prompt caching is a critical new innovation for language model inference - saving developers up to 90% and making long context inputs suddenly viable. Compare features and pricing across all major AI providers below.
Caching requires exact prompt matches and varies by provider - some like OpenAI and DeepSeek offer automatic caching, while others including Google, Anthropic, and Amazon require manual setup. Learn more about how it works in our introduction to prompt caching below.
Pricing
Pricing: Input, Cached Hit and Output
Prompt Caching API Specifications
| Provider / Model | Input (standard) | Cache write | Cache hit | Cache storage | Output (standard) | Auto-Enabled | Min tokens | Cache TTL | Notes |
|---|---|---|---|---|---|---|---|---|---|
| |||||||||
| GPT-5.5 (xhigh) | $5.00 | - | $0.50 | - | $30.00 | 1024 | 5-10 minutes | ||
| GPT-5.5 (high) | $5.00 | - | $0.50 | - | $30.00 | 1024 | 5-10 minutes | ||
| GPT-5.5 (medium) | $5.00 | - | $0.50 | - | $30.00 | 1024 | 5-10 minutes | ||
| GPT-5.5 (low) | $5.00 | - | $0.50 | - | $30.00 | 1024 | 5-10 minutes | ||
| GPT-5.5 (Non-reasoning) | $5.00 | - | $0.50 | - | $30.00 | 1024 | 5-10 minutes | ||
| GPT-5.4 mini (xhigh) | $0.75 | - | $0.07 | - | $4.50 | - | - | - | |
| GPT-5.4 nano (xhigh) | $0.20 | - | $0.02 | - | $1.25 | - | - | - | |
| GPT-5.4 nano (medium) | $0.20 | - | $0.02 | - | $1.25 | - | - | - | |
| GPT-5.4 mini (medium) | $0.75 | - | $0.07 | - | $4.50 | - | - | - | |
| GPT-5.4 nano (Non-Reasoning) | $0.20 | - | $0.02 | - | $1.25 | - | - | - | |
| GPT-5.4 mini (Non-Reasoning) | $0.75 | - | $0.07 | - | $4.50 | - | - | - | |
| GPT-5.4 (xhigh) | $2.50 | - | $0.25 | - | $15.00 | - | - | - | Tiered pricing:
|
| GPT-5.4 (low) | $2.50 | - | $0.25 | - | $15.00 | 1024 | 5-10 minutes | Tiered pricing:
| |
| GPT-5.4 (Non-reasoning) | $2.50 | - | $0.25 | - | $15.00 | - | - | - | Tiered pricing:
|
| GPT-5.3 Codex (xhigh) | $1.75 | - | $0.17 | - | $14.00 | - | - | - | |
| GPT-5.2 (xhigh) | $1.75 | - | $0.17 | - | $14.00 | - | - | - | |
| GPT-5.2 Codex (xhigh) | $1.75 | - | $0.17 | - | $14.00 | - | - | - | |
| GPT-5.2 (medium) | $1.75 | - | $0.17 | - | $14.00 | - | - | - | |
| GPT-5.2 (Non-reasoning) | $1.75 | - | $0.17 | - | $14.00 | - | - | - | |
| GPT-5.1 (high) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5.1 (Non-reasoning) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 Codex (high) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 (high) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 (medium) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 mini (high) | $0.25 | - | $0.03 | - | $2.00 | 1024 | 5-10 minutes | ||
| GPT-5 (low) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 mini (medium) | $0.25 | - | $0.03 | - | $2.00 | 1024 | 5-10 minutes | ||
| GPT-5 nano (high) | $0.05 | - | $0.01 | - | $0.40 | 1024 | 5-10 minutes | ||
| GPT-5 nano (medium) | $0.05 | - | $0.01 | - | $0.40 | 1024 | 5-10 minutes | ||
| GPT-5 (minimal) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 (ChatGPT) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 mini (minimal) | $0.25 | - | $0.03 | - | $2.00 | 1024 | 5-10 minutes | ||
| GPT-5 nano (minimal) | $0.05 | - | $0.01 | - | $0.40 | 1024 | 5-10 minutes | ||
| o3 | $2.00 | - | $0.50 | - | $8.00 | 1024 | 5-10 minutes | ||
| o4-mini (high) | $1.10 | - | $0.28 | - | $4.40 | 1024 | 5-10 minutes | ||
| GPT-4.1 | $2.00 | - | $0.50 | - | $8.00 | 1024 | 5-10 minutes | ||
| GPT-4.1 mini | $0.40 | - | $0.10 | - | $1.60 | 1024 | 5-10 minutes | ||
| GPT-4.1 nano | $0.10 | - | $0.03 | - | $0.40 | 1024 | 5-10 minutes | ||
| o3-mini | $1.10 | - | $0.55 | - | $4.40 | 1024 | 5-10 minutes | ||
| o3-mini (high) | $1.10 | - | $0.55 | - | $4.40 | 1024 | 5-10 minutes | ||
| o1 | $15.00 | - | $7.50 | - | $60.00 | 1024 | 5-10 minutes | ||
| GPT-4o (Nov '24) | $2.50 | - | $1.50 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-4o (Aug '24) | $2.50 | - | $1.25 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-4o mini | $0.15 | - | $0.07 | - | $0.60 | 1024 | 5-10 minutes | ||
| |||||||||
| Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | $6.25 | $6.25 | $0.50 | - | $25.00 | - | - | - | |
| Claude Opus 4.7 (Non-reasoning, High Effort) | $6.25 | $6.25 | $0.50 | - | $25.00 | - | - | - | |
| Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | $3.75 | $3.75 | $0.30 | - | $15.00 | - | - | - | 1h cache write: $6 |
| Claude Sonnet 4.6 (Non-reasoning, High Effort) | $3.75 | $3.75 | $0.30 | - | $15.00 | - | - | - | 1h cache write: $6 |
| Claude Sonnet 4.6 (Non-reasoning, Low Effort) | $3.75 | $3.75 | $0.30 | - | $15.00 | - | - | - | 1h cache write: $6 |
| Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | $6.25 | $6.25 | $0.50 | - | $25.00 | - | - | - | 1h cache write: $10 |
| Claude Opus 4.6 (Non-reasoning, High Effort) | $6.25 | $6.25 | $0.50 | - | $25.00 | - | - | - | 1h cache write: $10 |
| Claude Opus 4.5 (Reasoning) | $6.25 | $6.25 | $0.50 | - | $25.00 | 1024 | 5 minutes | 1h cache write: $10 | |
| Claude Opus 4.5 (Non-reasoning) | $6.25 | $6.25 | $0.50 | - | $25.00 | 1024 | 5 minutes | 1h cache write: $10 | |
| Claude 4.5 Haiku (Reasoning) | $1.25 | $1.25 | $0.10 | - | $5.00 | 4096 | 5 minutes | 1h cache write: $2 | |
| Claude 4.5 Haiku (Non-reasoning) | $1.25 | $1.25 | $0.10 | - | $5.00 | 4096 | 5 minutes | 1h cache write: $2 | |
| Claude 4.5 Sonnet (Reasoning) | $3.75 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 4.5 Sonnet (Non-reasoning) | $3.75 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 4.1 Opus (Reasoning) | $18.75 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30 | |
| Claude 4.1 Opus (Non-reasoning) | $18.75 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30 | |
| Claude 4 Opus (Reasoning) | $18.75 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30.0 | |
| Claude 4 Sonnet (Reasoning) | $3.75 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 4 Sonnet (Non-reasoning) | $3.75 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 4 Opus (Non-reasoning) | $18.75 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30 | |
| Claude 3 Opus | $18.75 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | ||
| |||||||||
| Gemini 3.1 Flash-Lite Preview | $0.25 | - | $0.03 | $1.00 | $1.50 | - | - | ||
| Gemini 3.1 Pro Preview | $2.00 | - | $0.20 | $4.50 | $12.00 | - | - | Tiered pricing:
| |
| Gemini 3.1 Pro Preview | $2.00 | - | $0.20 | $4.50 | $12.00 | - | - | Tiered pricing:
| |
| Gemini 3 Flash Preview (Reasoning) | $0.50 | - | $0.05 | $1.00 | $3.00 | 2048 | 60 minutes | ||
| Gemini 3 Flash Preview (Non-reasoning) | $0.50 | - | $0.05 | $1.00 | $3.00 | 2048 | 60 minutes | ||
| Gemini 3 Pro Preview (high) | $2.00 | - | $0.20 | - | $12.00 | - | - | Tiered pricing:
| |
| Gemini 3 Pro Preview (high) | $2.00 | - | $0.20 | - | $12.00 | - | - | Tiered pricing:
| |
| Gemini 3 Pro Preview (low) | $2.00 | - | $0.20 | - | $12.00 | - | - | Tiered pricing:
| |
| Gemini 3 Pro Preview (low) | $2.00 | - | $0.20 | - | $12.00 | - | - | Tiered pricing:
| |
| Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning) | $0.10 | - | $0.01 | $1.00 | $0.40 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) | $0.10 | - | $0.01 | $1.00 | $0.40 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash-Lite (Reasoning) | $0.10 | - | $0.01 | $1.00 | $0.40 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash-Lite (Non-reasoning) | $0.10 | - | $0.01 | $1.00 | $0.40 | 2048 | 60 minutes | ||
| Gemini 2.5 Pro | $1.25 | - | $0.13 | $4.50 | $10.00 | 2048 | 60 minutes | ||
| Gemini 2.5 Pro | $1.25 | - | $0.13 | $4.50 | $10.00 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash (Reasoning) | $0.30 | - | $0.03 | $1.00 | $2.50 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash (Reasoning) | $0.30 | - | $0.03 | $1.00 | $2.50 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash (Non-reasoning) | $0.30 | - | $0.03 | $1.00 | $2.50 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash (Non-reasoning) | $0.30 | - | $0.03 | $1.00 | $2.50 | 2048 | 60 minutes | ||
| Gemini 2.5 Pro Preview (May' 25) | $1.25 | - | $0.13 | $4.50 | $10.00 | 2048 | 60 minutes | ||
| Gemini 2.0 Flash (Feb '25) | $0.15 | - | $0.03 | $1.00 | $0.60 | 2048 | 60 minutes | ||
| Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | $6.25 | $6.25 | $0.50 | - | $25.00 | - | - | - | |
| Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | $3.75 | $3.75 | $0.30 | - | $15.00 | - | - | - | 1h cache write: $6 |
| Claude Sonnet 4.6 (Non-reasoning, High Effort) | $3.75 | $3.75 | $0.30 | - | $15.00 | - | - | - | 1h cache write: $6 |
| Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | $6.25 | $6.25 | $0.50 | - | $25.00 | - | - | - | 1h cache write: $10 |
| Claude Opus 4.6 (Non-reasoning, High Effort) | $6.25 | $6.25 | $0.50 | - | $25.00 | - | - | - | 1h cache write: $10 |
| Claude Opus 4.5 (Reasoning) | $6.25 | $6.25 | $0.50 | - | $25.00 | 1024 | 5 minutes | 1h cache write: $10 | |
| Claude Opus 4.5 (Non-reasoning) | $6.25 | $6.25 | $0.50 | - | $25.00 | 1024 | 5 minutes | 1h cache write: $10 | |
| Claude 4.5 Haiku (Reasoning) | $1.25 | $1.25 | $0.10 | - | $5.00 | 4096 | 5 minutes | 1h cache write: $2 | |
| Claude 4.5 Haiku (Non-reasoning) | $1.25 | $1.25 | $0.10 | - | $5.00 | 4096 | 5 minutes | 1h cache write: $2 | |
| Claude 4.5 Sonnet (Reasoning) | $3.75 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 4.5 Sonnet (Non-reasoning) | $3.75 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 4.1 Opus (Reasoning) | $18.75 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30 | |
| Claude 4.1 Opus (Non-reasoning) | $18.75 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30 | |
| Claude 4 Opus (Reasoning) | $18.75 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30 | |
| Claude 4 Sonnet (Reasoning) | $3.75 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 4 Opus (Non-reasoning) | $18.75 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30 | |
| Claude 4 Sonnet (Non-reasoning) | $3.75 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 3.7 Sonnet (Non-reasoning) | $3.75 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | ||
| Claude 3.5 Haiku | $1.00 | $1.00 | $0.08 | - | $4.00 | 2048 | 5 minutes | ||
| GLM-5 (Reasoning) | $1.00 | - | $0.10 | - | $3.20 | - | - | - | |
| DeepSeek V3.2 (Reasoning) | $0.56 | - | $0.06 | - | $1.68 | - | - | - | |
-Prompt caching is not 100% guaranteed. | |||||||||
| Grok 4.3 | $1.25 | $1.25 | $0.20 | - | $2.50 | - | - | - | For requests greater than 200k tokens, pricing is $2.50 per 1M input tokens, $0.40 per 1M cached input tokens, and $5.00 per 1M output tokens |
| Grok 4.20 0309 v2 (Reasoning) | $2.00 | - | $0.20 | - | $6.00 | - | - | - | |
| Grok 4.20 0309 v2 (Non-reasoning) | $2.00 | - | $0.20 | - | $6.00 | - | - | - | |
| Grok 4.20 0309 (Reasoning) | $2.00 | - | $0.20 | - | $6.00 | - | - | - | |
| Grok 4.20 0309 (Non-reasoning) | $2.00 | - | $0.20 | - | $6.00 | - | - | - | |
| Grok 4.1 Fast (Reasoning) | $0.20 | - | $0.05 | - | $0.50 | - | - | - | |
| Grok 4 Fast (Reasoning) | $0.20 | - | $0.05 | - | $0.50 | - | - | ||
| Grok 4 Fast (Non-reasoning) | $0.20 | - | $0.05 | - | $0.50 | - | - | ||
| Grok Code Fast 1 | $0.20 | - | $0.02 | - | $1.50 | - | - | ||
| Grok 4 | $3.00 | - | $0.75 | - | $15.00 | - | - | ||
| Grok 3 mini Reasoning (high) | $0.30 | - | $0.07 | - | $0.50 | - | - | ||
| Grok 3 mini Reasoning (high) | $0.60 | - | $0.07 | - | $4.00 | - | - | ||
| Grok 3 | $3.00 | - | $0.75 | - | $15.00 | - | - | ||
| Grok 3 | $5.00 | - | $0.07 | - | $25.00 | - | - | ||
| |||||||||
| Nova 2.0 Pro Preview (medium) | $1.25 | - | $0.31 | - | $10.00 | - | - | - | |
| Nova Premier | $2.50 | - | $0.62 | - | $12.50 | 1000 | 5 minutes | ||
| Nova Pro | $0.80 | - | $0.20 | - | $3.20 | 1000 | 5 minutes | Only supported on US East (N. Virginia) region | |
| Nova Lite | $0.06 | - | $0.01 | - | $0.24 | 1000 | 5 minutes | Only supported on US East (N. Virginia) region | |
| Nova Micro | $0.04 | - | $0.01 | - | $0.14 | 1000 | 5 minutes | Only supported on US East (N. Virginia) region | |
| Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | $6.25 | $6.25 | $0.50 | - | $25.00 | - | - | - | |
| Claude Opus 4.7 (Non-reasoning, High Effort) | $5.00 | - | $0.50 | - | $25.00 | - | - | - | |
| Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | $3.75 | $3.75 | $0.30 | - | $15.00 | - | - | - | 1h cache write: $6 |
| Claude Sonnet 4.6 (Non-reasoning, High Effort) | $3.75 | $3.75 | $0.30 | - | $15.00 | - | - | - | 1h cache write: $6 |
| Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | $6.25 | $6.25 | $0.50 | - | $25.00 | - | - | - | 1h cache write: $10 |
| Claude Opus 4.6 (Non-reasoning, High Effort) | $6.25 | $6.25 | $0.50 | - | $25.00 | - | - | - | 1h cache write: $10 |
| Claude Opus 4.5 (Reasoning) | $6.25 | $6.25 | $0.50 | - | $25.00 | 1024 | 5 minutes | 1h cache write: $10 | |
| Claude Opus 4.5 (Non-reasoning) | $6.25 | $6.25 | $0.50 | - | $25.00 | 1024 | 5 minutes | 1h cache write: $10 | |
| Claude 4.5 Haiku (Reasoning) | $1.25 | $1.25 | $0.10 | - | $5.00 | 4096 | 5 minutes | ||
| Claude 4.5 Haiku (Non-reasoning) | $1.25 | $1.25 | $0.10 | - | $5.00 | 4096 | 5 minutes | ||
| Claude 4.5 Sonnet (Reasoning) | $4.12 | $4.12 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 4.5 Sonnet (Non-reasoning) | $4.12 | $4.12 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 4.1 Opus (Reasoning) | $18.75 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | ||
| Claude 4.1 Opus (Non-reasoning) | $18.75 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | ||
| Claude 4 Opus (Reasoning) | $18.75 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | ||
| Claude 4 Sonnet (Reasoning) | $3.75 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 4 Opus (Non-reasoning) | $18.75 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | ||
| Claude 4 Sonnet (Non-reasoning) | $3.75 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 3.5 Haiku | $1.00 | $1.00 | $0.08 | - | $4.00 | 2048 | 5 minutes |
| |
| Claude 3.5 Sonnet (Oct '24) | $3.75 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Only supported on US West (Oregon) region | |
| Claude 3.5 Sonnet (June '24) | $3.75 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Only supported on US West (Oregon) region | |
OpenAI models:
| |||||||||
| Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | $6.25 | $6.25 | $0.50 | - | $25.00 | - | - | - | |
| Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | $3.75 | $3.75 | $0.30 | - | $15.00 | - | - | - | 1h cache write: $6 |
| Claude Sonnet 4.6 (Non-reasoning, High Effort) | $3.75 | $3.75 | $0.30 | - | $15.00 | - | - | - | 1h cache write: $6 |
| Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | $6.25 | $6.25 | $0.50 | - | $25.00 | - | - | - | 1h cache write: $10 |
| Claude Opus 4.6 (Non-reasoning, High Effort) | $6.25 | $6.25 | $0.50 | - | $25.00 | - | - | - | 1h cache write: $10 |
| GPT-5.4 mini (xhigh) | $0.75 | - | $0.07 | - | $4.50 | - | - | - | |
| GPT-5.4 (xhigh) | $2.50 | - | $0.25 | - | $15.00 | - | - | - | Tiered pricing:
|
| GPT-5.2 (xhigh) | $1.75 | - | $0.17 | - | $14.00 | - | - | - | |
| GPT-5.2 Codex (xhigh) | $1.75 | - | $0.18 | - | $14.00 | - | - | - | |
| GPT-5.1 (high) | $1.25 | - | $0.13 | - | $10.00 | - | - | - | |
| GPT-5 Codex (high) | $1.25 | - | $0.13 | - | $10.00 | - | - | - | |
| GPT-5 (high) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 (medium) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 mini (high) | $0.25 | - | $0.03 | - | $2.00 | 1024 | 5-10 minutes | ||
| GPT-5 (low) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 mini (medium) | $0.25 | - | $0.03 | - | $2.00 | 1024 | 5-10 minutes | ||
| GPT-5 nano (high) | $0.05 | - | $0.01 | - | $0.40 | 1024 | 5-10 minutes | ||
| GPT-5 nano (medium) | $0.05 | - | $0.01 | - | $0.40 | 1024 | 5-10 minutes | ||
| GPT-5 (minimal) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 mini (minimal) | $0.25 | - | $0.03 | - | $2.00 | 1024 | 5-10 minutes | ||
| GPT-5 nano (minimal) | $0.05 | - | $0.01 | - | $0.40 | 1024 | 5-10 minutes | ||
| o3 | $2.00 | - | $0.50 | - | $8.00 | 1024 | 5-10 minutes | ||
| o4-mini (high) | $1.10 | - | $0.28 | - | $4.40 | 1024 | 5-10 minutes | ||
| GPT-4.1 | $2.00 | - | $0.50 | - | $8.00 | 1024 | 5-10 minutes | ||
| GPT-4.1 mini | $0.40 | - | $0.10 | - | $1.60 | 1024 | 5-10 minutes | ||
| GPT-4.1 nano | $0.10 | - | $0.03 | - | $0.40 | 1024 | 5-10 minutes | ||
| o3-mini | $1.10 | - | $0.55 | - | $4.40 | 1024 | 5-10 minutes | ||
| o3-mini (high) | $1.10 | - | $0.55 | - | $4.40 | 1024 | 5-10 minutes | ||
| o1 | $15.00 | - | $7.50 | - | $60.00 | 1024 | 5-10 minutes | ||
| GPT-4o (Nov '24) | $2.50 | - | $1.25 | - | $10.00 | 1024 | 5-10 minutes | ||
| o1-preview | $16.50 | - | $8.25 | - | $66.00 | 1024 | 5-10 minutes | ||
| GPT-4o (Aug '24) | $2.50 | - | $1.25 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-4o mini | $0.15 | - | $0.07 | - | $0.60 | 1024 | 5-10 minutes | ||
| GPT-4o (May '24) | $5.00 | - | $2.50 | - | $15.00 | 1024 | 5-10 minutes | ||
| |||||||||
| DeepSeek V4 Pro (Reasoning, Max Effort) | $1.74 | - | $0.01 | - | $3.48 | 64 | - | ||
| DeepSeek V4 Pro (Reasoning, High Effort) | $1.74 | - | $0.01 | - | $3.48 | 64 | - | ||
| DeepSeek V4 Flash (Reasoning, Max Effort) | $0.14 | - | $0.00 | - | $0.28 | 64 | - | ||
| DeepSeek V4 Flash (Reasoning, High Effort) | $0.14 | - | $0.00 | - | $0.28 | 64 | - | ||
| DeepSeek V4 Pro (Non-reasoning) | $1.74 | - | $0.01 | - | $3.48 | - | - | ||
| DeepSeek V4 Flash (Non-reasoning) | $0.14 | - | $0.00 | - | $0.28 | - | - | ||
| DeepSeek V3.2 (Reasoning) | $0.28 | - | $0.01 | - | $0.42 | - | - | - | |
| DeepSeek V3.2 Exp (Reasoning) | $0.28 | - | $0.03 | - | $0.42 | 64 | - | ||
| DeepSeek V3.2 Exp (Non-reasoning) | $0.28 | - | $0.03 | - | $0.42 | 64 | - | ||
-Alibaba offers two cache types: implcit and explicit. -Implicit cache is automatically enabled. It is billed at 20% of the standard input token price. -Explicit cache must be activated. It creates a cache for specific content to ensure a deterministic hit within its 5-minute validity period. Tokens used to create the cache are billed at 125% of the standard input token price, while subsequent cache hits are billed at 10% of that price. | |||||||||
| Qwen3.6 Max Preview | $1.30 | $1.63 | $0.13 | - | $7.80 | - | - | - | |
| Qwen3.6 Plus | $0.50 | $0.63 | $0.05 | - | $3.00 | - | - | - | |
| Qwen3 Max Thinking | $1.20 | - | $0.12 | - | $6.00 | - | - | - | |
| NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | $0.30 | - | $0.06 | - | $0.75 | - | - | - | |
| GLM-5 (Reasoning) | $0.95 | - | $0.20 | - | $3.15 | - | - | - | |
| GLM-5 (Non-reasoning) | $0.95 | - | $0.25 | - | $3.15 | - | - | - | |
| GLM-4.7 (Reasoning) | $0.60 | - | $0.12 | - | $2.20 | - | - | - | |
| GLM-4.7 (Non-reasoning) | $0.60 | - | $0.12 | - | $2.20 | - | - | - | |
| Kimi K2.5 (Reasoning) | $0.60 | - | $0.12 | - | $3.00 | - | - | - | |
| Kimi K2.5 (Non-reasoning) | $0.60 | - | $0.12 | - | $3.00 | - | - | - | |
| DeepSeek V3.1 (Non-reasoning) | $0.50 | - | $0.25 | - | $1.50 | - | - | - | |
-Prefix caching is enabled by deafult. To maximize cache hit rates, a header must be send. | |||||||||
| Kimi K2.6 | $0.95 | - | $0.16 | - | $4.00 | - | - | - | |
| Kimi K2.6 | $0.95 | - | $0.16 | - | $4.00 | - | - | - | |
| Kimi K2.5 (Reasoning) | $0.50 | - | $0.10 | - | $2.85 | - | - | - | |
| Gemma 4 31B (Reasoning) | $0.30 | - | $0.13 | - | $1.25 | - | - | - | |
| GPT-5.2 (xhigh) | $1.75 | $1.75 | $0.17 | - | $14.00 | - | - | - | |
| GPT-5.1 (high) | $1.25 | $1.25 | $0.13 | - | $10.00 | - | - | - | |
| GPT-5 (high) | $1.25 | $1.25 | $0.13 | - | $10.00 | - | - | - | |
| Gemini 3 Pro Preview (high) | $2.00 | $2.50 | $0.25 | - | $12.00 | - | - | - | |
-Prompt caching is automatic — no extra parameters required. | |||||||||
| DeepSeek V4 Pro (Reasoning, Max Effort) | $1.74 | - | $0.14 | - | $3.48 | - | - | - | |
| DeepSeek V4 Pro (Reasoning, High Effort) | $1.74 | - | $0.14 | - | $3.48 | - | - | - | |
| DeepSeek V4 Flash (Reasoning, Max Effort) | $0.14 | - | $0.03 | - | $0.28 | - | - | - | |
| DeepSeek V4 Flash (Reasoning, High Effort) | $0.14 | - | $0.03 | - | $0.28 | - | - | - | |
| DeepSeek V3.2 (Reasoning) | $0.26 | - | $0.13 | - | $0.38 | - | - | - | |
| DeepSeek V3.2 (Non-reasoning) | $0.26 | - | $0.13 | - | $0.38 | - | - | - | |
| DeepSeek V3.1 Terminus (Non-reasoning) | $0.21 | - | $0.13 | - | $0.79 | - | - | - | |
| DeepSeek V3.1 (Non-reasoning) | $0.21 | - | $0.13 | - | $0.79 | - | - | - | |
| DeepSeek R1 0528 (May '25) | $0.50 | - | $0.35 | - | $2.15 | - | - | - | |
| DeepSeek V3 0324 | $0.20 | - | $0.13 | - | $0.77 | - | - | - | |
| Kimi K2.6 | $0.75 | - | $0.15 | - | $3.50 | - | - | - | |
| Kimi K2.5 (Reasoning) | $0.45 | - | $0.07 | - | $2.25 | - | - | - | |
| Qwen3.6 35B A3B (Non-reasoning) | $0.19 | - | $0.20 | - | $1.00 | - | - | - | |
| Qwen3.5 122B A10B (Reasoning) | $0.29 | - | $0.14 | - | $2.90 | - | - | - | |
| Qwen3.5 35B A3B (Reasoning) | $0.22 | - | $0.10 | - | $2.20 | - | - | - | |
| Qwen3.5 122B A10B (Non-reasoning) | $0.29 | - | $0.14 | - | $2.40 | - | - | - | |
| Qwen3.5 35B A3B (Non-reasoning) | $0.18 | - | $0.10 | - | $1.00 | - | - | - | |
| Qwen3.5 397B A17B (Reasoning) | $0.54 | - | $0.27 | - | $3.40 | - | - | - | |
| Qwen3.5 397B A17B (Non-reasoning) | $0.54 | - | $0.27 | - | $3.40 | - | - | - | |
| Qwen3 235B A22B 2507 (Reasoning) | $0.23 | - | $0.20 | - | $2.30 | - | - | - | |
| Qwen3 Coder 480B A35B Instruct | $0.30 | - | $0.02 | - | $1.00 | - | - | - | |
| Qwen2.5 Instruct 72B | $0.36 | - | $0.12 | - | $0.40 | - | - | - | |
| GLM-5.1 (Reasoning) | $1.05 | - | $0.26 | - | $3.50 | - | - | - | |
| GLM-5.1 (Non-reasoning) | $1.40 | - | $0.26 | - | $4.40 | - | - | - | |
| GLM-5 (Reasoning) | $0.80 | - | $0.16 | - | $2.56 | - | - | - | |
| GLM-5 (Non-reasoning) | $0.60 | - | $0.12 | - | $2.08 | - | - | - | |
| GLM-4.7-Flash (Reasoning) | $0.06 | - | $0.01 | - | $0.40 | - | - | - | |
| GLM-4.7 (Reasoning) | $0.40 | - | $0.08 | - | $1.75 | - | - | - | |
| GLM-4.7 (Non-reasoning) | $0.40 | - | $0.08 | - | $1.75 | - | - | - | |
| Gemma 4 31B (Reasoning) | $0.13 | - | $0.02 | - | $0.38 | - | - | - | |
| MiniMax-M2.5 | $0.15 | - | $0.03 | - | $1.15 | - | - | - | |
-Prompt caching is enabled by default. -The default discount is 50%, but the exact discount varies by model. | |||||||||
| DeepSeek V4 Pro (Reasoning, Max Effort) | $1.74 | - | $0.14 | - | $3.48 | - | - | - | |
| DeepSeek V4 Pro (Reasoning, High Effort) | $1.74 | - | $0.14 | - | $3.48 | - | - | ||
| DeepSeek V3.2 (Reasoning) | $0.56 | - | $0.28 | - | $1.68 | - | - | - | |
| Kimi K2.6 | $0.95 | - | $0.16 | - | $4.00 | - | - | - | |
| Kimi K2.5 (Reasoning) | $0.45 | - | $0.07 | - | $2.25 | - | - | - | |
| Kimi K2.5 (Non-reasoning) | $0.45 | - | $0.10 | - | $2.25 | - | - | - | |
| GLM-5.1 (Reasoning) | $1.40 | - | $0.26 | - | $4.40 | - | - | - | |
| GLM-5 (Reasoning) | $1.00 | - | $0.20 | - | $3.20 | - | - | - | |
| GLM-5 (Non-reasoning) | $1.00 | - | $0.20 | - | $3.20 | - | - | - | |
| MiniMax-M2.7 | $0.30 | - | $0.06 | - | $1.20 | - | - | - | |
| MiniMax-M2.5 | $0.30 | - | $0.03 | - | $1.20 | - | - | - | |
| GLM-5.1 (Reasoning) | $1.40 | - | $0.26 | - | $4.40 | - | - | - | |
| GLM-5.1 (Non-reasoning) | $1.40 | - | $0.26 | - | $4.40 | - | - | - | |
| GLM-5 (Reasoning) | $1.00 | - | $0.50 | - | $3.20 | - | - | - | |
| MiniMax-M2.5 | $0.30 | - | $0.06 | - | $1.20 | - | - | - | |
| Kimi K2.5 (Reasoning) | $0.60 | - | $0.60 | - | $3.00 | - | - | - | |
-The minimum cacheable prompt length varies by model, ranging from 128 to 1024 tokens depending on the specific model used. -All cached data automatically expires after 2 hours without use. | |||||||||
| gpt-oss-120B (high) | $0.15 | - | $0.07 | - | $0.60 | - | 2 hrs | ||
| Mercury 2 | $0.25 | - | $0.03 | - | $0.75 | - | - | - | |
| Kimi K2.6 | $0.95 | - | $0.16 | - | $4.00 | - | - | - | |
| Kimi K2.6 (Non-reasoning) | $0.95 | - | $0.16 | - | $4.00 | - | - | - | |
| Kimi K2.5 (Reasoning) | $0.60 | - | $0.10 | - | $3.00 | - | - | - | |
| Kimi K2.5 (Non-reasoning) | $0.60 | - | $0.10 | - | $3.00 | - | - | - | |
| Kimi K2 Thinking | $1.15 | - | $0.15 | - | $8.00 | - | - | - | |
| Kimi K2 Thinking | $0.60 | - | $0.15 | - | $2.50 | - | - | - | |
| Kimi K2 | $0.60 | - | $0.15 | - | $2.50 | - | - | - | |
-MiniMax supports Anthropic API compatible caching that is managed through explicit cache_control settings. | |||||||||
| MiniMax-M2.7 | $0.30 | $0.38 | $0.06 | - | $1.20 | - | - | - | |
| MiniMax-M2.5 | $0.30 | $0.38 | $0.03 | - | $1.20 | - | - | - | |
| DeepSeek V4 Pro (Reasoning, Max Effort) | $1.74 | - | $0.14 | - | $3.48 | - | - | - | |
| DeepSeek V4 Pro (Reasoning, High Effort) | $1.74 | - | $0.14 | - | $3.48 | - | - | - | |
| DeepSeek V4 Flash (Reasoning, Max Effort) | $0.14 | - | $0.03 | - | $0.28 | - | - | - | |
| DeepSeek V4 Flash (Reasoning, High Effort) | $0.14 | - | $0.03 | - | $0.28 | - | - | - | |
| DeepSeek V3.2 (Reasoning) | $0.27 | - | $0.13 | - | $0.40 | - | - | - | |
| Ling 2.6 Flash | $0.10 | - | $0.02 | - | $0.30 | - | - | - | |
| Kimi K2.6 | $0.95 | - | $0.16 | - | $4.00 | - | - | - | |
| Kimi K2.5 (Reasoning) | $0.60 | - | $0.10 | - | $3.00 | - | - | - | |
| Kimi K2.5 (Non-reasoning) | $0.60 | - | $0.10 | - | $3.00 | - | - | - | |
| GLM-5.1 (Reasoning) | $1.40 | - | $0.26 | - | $4.40 | - | - | - | |
| GLM-5.1 (Non-reasoning) | $1.40 | - | $0.26 | - | $4.40 | - | - | - | |
| GLM-5 (Reasoning) | $1.00 | - | $0.20 | - | $3.20 | - | - | - | |
| GLM-5 (Non-reasoning) | $1.00 | - | $0.20 | - | $3.20 | - | - | - | |
| MiniMax-M2.7 | $0.30 | - | $0.06 | - | $1.20 | - | - | - | |
| MiniMax-M2.5 | $0.30 | - | $0.03 | - | $1.20 | - | - | - | |
| Qwen3.5 35B A3B (Reasoning) | $0.25 | - | $0.25 | - | $2.00 | - | - | - | |
| KAT-Coder-Pro V1 | $0.30 | - | $0.04 | - | $1.20 | - | - | - | |
| DeepSeek V4 Flash (Reasoning, Max Effort) | $0.14 | - | $0.07 | - | $0.28 | - | - | - | |
| DeepSeek V4 Flash (Reasoning, High Effort) | $0.14 | - | $0.07 | - | $0.28 | - | - | - | |
| DeepSeek V3.2 (Reasoning) | $0.28 | - | $0.13 | - | $0.45 | - | - | - | |
| Kimi K2.6 | $0.60 | - | $0.20 | - | $2.80 | - | - | - | |
| Kimi K2.5 (Reasoning) | $0.50 | - | $0.20 | - | $2.50 | - | - | - | |
| Qwen3.6 35B A3B (Reasoning) | $0.35 | - | $0.20 | - | $2.00 | - | - | - | |
| Qwen3.6 35B A3B (Non-reasoning) | $0.35 | - | $0.20 | - | $2.00 | - | - | - | |
| Qwen3.5 397B A17B (Reasoning) | $0.60 | - | $0.30 | - | $3.60 | - | - | - | |
| Qwen3 Coder Next | $0.15 | - | $0.10 | - | $0.80 | - | - | - | |
| Qwen3 VL 235B A22B Instruct | $0.21 | - | $0.10 | - | $1.90 | - | - | - | |
| Qwen3 Next 80B A3B Instruct | $0.25 | - | $0.07 | - | $1.10 | - | - | - | |
| Qwen3 235B A22B 2507 Instruct | $0.15 | - | $0.05 | - | $0.85 | - | - | - | |
| GLM-5.1 (Reasoning) | $1.40 | - | $0.26 | - | $4.40 | - | - | - | |
| GLM-5.1 (Non-reasoning) | $1.40 | - | $0.26 | - | $4.40 | - | - | - | |
| GLM-5 (Reasoning) | $1.00 | - | $0.20 | - | $3.20 | - | - | - | |
| GLM-4.7 (Reasoning) | $0.45 | - | $0.11 | - | $2.10 | - | - | - | |
| GLM-4.7 (Non-reasoning) | $0.45 | - | $0.11 | - | $2.10 | - | - | - | |
| Gemma 4 31B (Reasoning) | $0.14 | - | $0.07 | - | $0.40 | - | - | - | |
| Gemma 4 26B A4B (Reasoning) | $0.13 | - | $0.05 | - | $0.40 | - | - | - | |
| Gemma 3 27B Instruct | $0.25 | - | $0.04 | - | $0.40 | - | - | - | |
| Trinity Large Thinking | $0.22 | - | $0.06 | - | $0.85 | - | - | - | |
| MiniMax-M2.5 | $0.17 | - | $0.03 | - | $1.20 | - | - | - | |
| gpt-oss-120B (high) | $0.10 | - | $0.06 | - | $0.75 | - | - | - | |
| gpt-oss-120B (low) | $0.15 | - | $0.06 | - | $0.60 | - | - | - | |
| Llama 4 Maverick | $0.35 | - | $0.17 | - | $0.85 | - | - | - | |
| Llama 3.3 Instruct 70B | $0.22 | - | $0.11 | - | $0.50 | - | - | - | |
| DeepSeek V4 Pro (Reasoning, Max Effort) | $1.74 | - | $0.14 | - | $3.48 | - | - | ||
| DeepSeek V4 Pro (Reasoning, High Effort) | $1.74 | - | $0.14 | - | $3.48 | - | - | - | |
| DeepSeek V4 Flash (Reasoning, Max Effort) | $0.14 | - | $0.03 | - | $0.28 | - | - | - | |
| DeepSeek V4 Flash (Reasoning, High Effort) | $0.14 | - | $0.03 | - | $0.28 | - | - | - | |
| DeepSeek V3.2 (Reasoning) | $0.27 | - | $0.14 | - | $0.42 | - | - | - | |
| DeepSeek V3.2 (Non-reasoning) | $0.27 | - | $0.14 | - | $0.42 | - | - | - | |
| Kimi K2.6 | $1.40 | - | $0.26 | - | $4.40 | - | - | - | |
| Kimi K2.5 (Reasoning) | $0.23 | - | $0.07 | - | $3.00 | - | - | - | |
| GLM-5.1 (Reasoning) | $1.40 | - | $0.26 | - | $4.40 | - | - | - | |
| GLM-5.1 (Non-reasoning) | $1.40 | - | $0.26 | - | $4.40 | - | - | - | |
| GLM-5 (Reasoning) | $1.00 | - | $0.20 | - | $3.20 | - | - | - | |
| GLM-5 (Non-reasoning) | $1.00 | - | $0.20 | - | $3.20 | - | - | - | |
| GLM-4.7 (Reasoning) | $0.42 | - | $0.11 | - | $2.20 | - | - | - | |
| GLM-4.7 (Non-reasoning) | $0.42 | - | $0.11 | - | $2.20 | - | - | - | |
| MiniMax-M2.5 | $0.20 | - | $0.03 | - | $1.00 | - | - | - | |
| KAT Coder Pro V2 | $0.30 | - | $0.06 | - | $1.20 | - | - | - | |
| DeepSeek V4 Pro (Reasoning, Max Effort) | $2.10 | - | $0.20 | - | $4.40 | - | - | - | |
| DeepSeek V4 Pro (Reasoning, High Effort) | $2.10 | - | $0.20 | - | $4.40 | - | - | - | |
| MiniMax-M2.7 | $0.30 | - | $0.06 | - | $1.20 | - | - | - | |
| MiniMax-M2.5 | $0.30 | - | $0.06 | - | $1.20 | - | - | - | |
| Qwen3.5 397B A17B (Reasoning) | $0.60 | - | $0.60 | - | $3.60 | - | - | - | |
| MiMo-V2.5-Pro | $1.00 | - | $0.20 | - | $3.00 | - | - | - | |
| MiMo-V2.5 | $0.40 | - | $0.08 | - | $2.00 | - | - | - | |
| MiMo-V2.5-Pro (Non-reasoning) | $1.00 | - | $0.20 | - | $3.00 | - | - | - | |
| MiMo-V2-Omni-0327 | $0.40 | - | $0.08 | - | $2.00 | - | - | - | |
| MiMo-V2-Flash (Feb 2026) | $0.10 | - | $0.01 | - | $0.30 | - | - | - | |
Introduction to Prompt Caching
What is Prompt Caching?
Prompt caching is a critical new innovation for language model inference - saving developers up to 90% and making long context inputs suddenly viable. Getting your approach to caching right can deliver huge cost savings on input tokens and meaningful performance benefits.
When you send a prompt, the system first checks if that exact prompt has been processed before. If found (cache hit), it returns the stored response instead of generating a new one. If not found (cache miss), the prompt is processed normally, and the response is stored for future use.
Key Metrics to Watch
- Input Price: The standard price you pay for input tokens
- Cache Write Price: What you pay to save prompt tokens into the cache; sometimes higher than standard input pricing
- Cache Hit Price: Discounted rate for prompt tokens that hit the cache
- Cache Storage Price: Hourly cost per million cached tokens (currently unique to Google)
- Cache TTL: The time cached tokens remain available, ranging from hours to days
- Cache Minimum Tokens: Minimum matching token count required before a cache hit is served
How Does Prompt Caching Work?
When you send a prompt to a transformer-based language model, the attention layers process each input token into key (K) and value (V) vectors that are stored in the KV cache. By keeping these values in memory, processing on input tokens can be avoided when identical input tokens are sent into the model again.
Until recently, leveraging the speed and cost benefits of caching was only available for dedicated deployments. Now, serverless API providers—including the frontier labs—have begun passing on some of the cost benefits of caching to developers.
Optimal Use Cases
- System instructions: Large system prompts that must be included across many interactions
- Chat history: Conversation context that accompanies each new user turn
- Per-user personalized context: Extensive user memories or profiles that enable deep personalization
Implementation Considerations
- Activation method varies by provider - some require manual setup while others offer automatic caching
- Cache hit discounts range from 50-90% off standard input token pricing - this really is worth the time to get right
- Caching improves performance for very long prompts (50k+ tokens); benchmarks coming soon