Prompt Caching: Cost & Performance Analysis Across Providers
Prompt caching is an important recent innovation in language model inference: it can cut input token costs by up to 90% and makes long-context inputs far more practical. Compare features and pricing across the major AI providers below.
Caching requires an exact match on the cached portion of the prompt (typically its prefix), and setup varies by provider: some, like OpenAI and DeepSeek, offer automatic caching, while others, including Google, Anthropic, and Amazon, require manual setup. Learn more about how it works in our introduction to prompt caching below.
Pricing
[Charts: Pricing: Cached Input Prompts; Cache Hit Discount]
Prompt Caching API Specifications
Prices are in USD per 1M tokens; cache storage is priced per 1M tokens per hour.
| Provider / Model | Input (standard) | Cache write | Cache hit | Cache storage | Output (standard) | Auto-Enabled | Min tokens | Cache TTL | Notes |
|---|---|---|---|---|---|---|---|---|---|
| |||||||||
| GPT-5.4 (xhigh) | $2.50 | - | $0.25 | - | $15.00 | - | - | - | Tiered pricing |
| GPT-5.4 (Non-reasoning) | $2.50 | - | $0.25 | - | $15.00 | - | - | - | Tiered pricing |
| GPT-5.1 (high) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5.1 (Non-reasoning) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 Codex (high) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 (high) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 (medium) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 mini (high) | $0.25 | - | $0.03 | - | $2.00 | 1024 | 5-10 minutes | ||
| GPT-5 (low) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 mini (medium) | $0.25 | - | $0.03 | - | $2.00 | 1024 | 5-10 minutes | ||
| GPT-5 nano (high) | $0.05 | - | $0.01 | - | $0.40 | 1024 | 5-10 minutes | ||
| GPT-5 nano (medium) | $0.05 | - | $0.01 | - | $0.40 | 1024 | 5-10 minutes | ||
| GPT-5 (minimal) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 mini (minimal) | $0.25 | - | $0.03 | - | $2.00 | 1024 | 5-10 minutes | ||
| GPT-5 nano (minimal) | $0.05 | - | $0.01 | - | $0.40 | 1024 | 5-10 minutes | ||
| GPT-5 (ChatGPT) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| o3 | $2.00 | - | $0.50 | - | $8.00 | 1024 | 5-10 minutes | ||
| o4-mini (high) | $1.10 | - | $0.28 | - | $4.40 | 1024 | 5-10 minutes | ||
| GPT-4.1 | $2.00 | - | $0.50 | - | $8.00 | 1024 | 5-10 minutes | ||
| GPT-4.1 mini | $0.40 | - | $0.10 | - | $1.60 | 1024 | 5-10 minutes | ||
| GPT-4.1 nano | $0.10 | - | $0.03 | - | $0.40 | 1024 | 5-10 minutes | ||
| o3-mini (high) | $1.10 | - | $0.55 | - | $4.40 | 1024 | 5-10 minutes | ||
| o3-mini | $1.10 | - | $0.55 | - | $4.40 | 1024 | 5-10 minutes | ||
| o1 | $15.00 | - | $7.50 | - | $60.00 | 1024 | 5-10 minutes | ||
| GPT-4o (Nov '24) | $2.50 | - | $1.50 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-4o (Aug '24) | $2.50 | - | $1.25 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-4o mini | $0.15 | - | $0.07 | - | $0.60 | 1024 | 5-10 minutes | ||
| |||||||||
| Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | $3.00 | $3.75 | $0.30 | - | $15.00 | - | - | - | 1h cache write: $6 |
| Claude Sonnet 4.6 (Non-reasoning, High Effort) | $3.00 | $3.75 | $0.30 | - | $15.00 | - | - | - | 1h cache write: $6 |
| Claude Sonnet 4.6 (Non-reasoning, Low Effort) | $3.00 | $3.75 | $0.30 | - | $15.00 | - | - | - | 1h cache write: $6 |
| Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | $5.00 | $6.25 | $0.50 | - | $25.00 | - | - | - | 1h cache write: $10 |
| Claude Opus 4.6 (Non-reasoning) | $5.00 | $6.25 | $0.50 | - | $25.00 | - | - | - | 1h cache write: $10 |
| Claude Opus 4.5 (Reasoning) | $5.00 | $6.25 | $0.50 | - | $25.00 | 1024 | 5 minutes | 1h cache write: $10 | |
| Claude Opus 4.5 (Non-reasoning) | $5.00 | $6.25 | $0.50 | - | $25.00 | 1024 | 5 minutes | 1h cache write: $10 | |
| Claude 4.5 Haiku (Reasoning) | $1.00 | $1.25 | $0.10 | - | $5.00 | 4096 | 5 minutes | 1h cache write: $2 | |
| Claude 4.5 Haiku (Non-reasoning) | $1.00 | $1.25 | $0.10 | - | $5.00 | 4096 | 5 minutes | 1h cache write: $2 | |
| Claude 4.5 Sonnet (Reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing | |
| Claude 4.5 Sonnet (Non-reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing | |
| Claude 4.1 Opus (Reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30 | |
| Claude 4.1 Opus (Non-reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30 | |
| Claude 4 Sonnet (Reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing | |
| Claude 4 Sonnet (Non-reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing | |
| Claude 4 Opus (Non-reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30 | |
| Claude 4 Opus (Reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30 | |
| Claude 3 Haiku | $0.25 | $0.30 | $0.03 | - | $1.25 | 2048 | 5 minutes | ||
| Claude 3 Opus | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | ||
| |||||||||
| Gemini 3.1 Flash-Lite Preview | $0.25 | - | $0.03 | $1.00 | $1.50 | - | - | ||
| Gemini 3.1 Pro Preview | $2.00 | - | $0.20 | $4.50 | $12.00 | - | - | Tiered pricing | |
| Gemini 3.1 Pro Preview | $2.00 | - | $0.20 | $4.50 | $12.00 | - | - | Tiered pricing | |
| Gemini 3 Flash Preview (Reasoning) | $0.50 | - | - | $1.00 | $3.00 | 2048 | 60 minutes | ||
| Gemini 3 Flash Preview (Non-reasoning) | $0.50 | - | - | $1.00 | $3.00 | 2048 | 60 minutes | ||
| Gemini 3 Pro Preview (high) | $2.00 | - | $0.20 | - | $12.00 | - | - | Tiered pricing | |
| Gemini 3 Pro Preview (high) | $2.00 | - | $0.20 | - | $12.00 | - | - | Tiered pricing | |
| Gemini 3 Pro Preview (low) | $2.00 | - | $0.20 | - | $12.00 | - | - | Tiered pricing | |
| Gemini 3 Pro Preview (low) | $2.00 | - | $0.20 | - | $12.00 | - | - | Tiered pricing | |
| Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning) | $0.10 | - | $0.01 | $1.00 | $0.40 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) | $0.10 | - | $0.01 | $1.00 | $0.40 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash-Lite (Reasoning) | $0.10 | - | $0.01 | $1.00 | $0.40 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash-Lite (Non-reasoning) | $0.10 | - | $0.01 | $1.00 | $0.40 | 2048 | 60 minutes | ||
| Gemini 2.5 Pro | $1.25 | - | $0.13 | $4.50 | $10.00 | 2048 | 60 minutes | ||
| Gemini 2.5 Pro | $1.25 | - | $0.13 | $4.50 | $10.00 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash (Reasoning) | $0.30 | - | $0.03 | $1.00 | $2.50 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash (Reasoning) | $0.30 | - | $0.03 | $1.00 | $2.50 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash (Non-reasoning) | $0.30 | - | $0.03 | $1.00 | $2.50 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash (Non-reasoning) | $0.30 | - | $0.03 | $1.00 | $2.50 | 2048 | 60 minutes | ||
| Gemini 2.5 Pro Preview (May '25) | $1.25 | - | $0.13 | $4.50 | $10.00 | 2048 | 60 minutes | ||
| Gemini 2.0 Flash (Feb '25) | $0.15 | - | $0.03 | $1.00 | $0.60 | 2048 | 60 minutes | ||
| Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | $3.00 | $3.75 | $0.30 | - | $15.00 | - | - | - | 1h cache write: $6 |
| Claude Sonnet 4.6 (Non-reasoning, High Effort) | $3.00 | $3.75 | $0.30 | - | $15.00 | - | - | - | 1h cache write: $6 |
| Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | $5.00 | $6.25 | $0.50 | - | $25.00 | - | - | - | 1h cache write: $10 |
| Claude Opus 4.6 (Non-reasoning) | $5.00 | $6.25 | $0.50 | - | $25.00 | - | - | - | 1h cache write: $10 |
| Claude Opus 4.5 (Reasoning) | $5.00 | $6.25 | $0.50 | - | $25.00 | 1024 | 5 minutes | 1h cache write: $10 | |
| Claude Opus 4.5 (Non-reasoning) | $5.00 | $6.25 | $0.50 | - | $25.00 | 1024 | 5 minutes | 1h cache write: $10 | |
| Claude 4.5 Haiku (Reasoning) | $1.00 | $1.25 | $0.10 | - | $5.00 | 4096 | 5 minutes | 1h cache write: $2 | |
| Claude 4.5 Haiku (Non-reasoning) | $1.00 | $1.25 | $0.10 | - | $5.00 | 4096 | 5 minutes | 1h cache write: $2 | |
| Claude 4.5 Sonnet (Reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing | |
| Claude 4.5 Sonnet (Non-reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing | |
| Claude 4.1 Opus (Non-reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30 | |
| Claude 4.1 Opus (Reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30 | |
| Claude 4 Sonnet (Reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing | |
| Claude 4 Sonnet (Non-reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing | |
| Claude 4 Opus (Reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30 | |
| Claude 4 Opus (Non-reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30 | |
| Claude 3.7 Sonnet (Non-reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | ||
| Claude 3.5 Haiku | $0.80 | $1.00 | $0.08 | - | $4.00 | 2048 | 5 minutes | ||
| DeepSeek V3.2 (Reasoning) | $0.56 | - | $0.06 | - | $1.68 | - | - | - | |
| Grok 4.1 Fast (Reasoning) | $0.20 | - | $0.05 | - | $0.50 | - | - | - | |
| Grok 4 Fast (Reasoning) | $0.20 | - | $0.05 | - | $0.50 | - | - | ||
| Grok 4 Fast (Non-reasoning) | $0.20 | - | $0.05 | - | $0.50 | - | - | ||
| Grok Code Fast 1 | $0.20 | - | $0.02 | - | $1.50 | - | - | ||
| Grok 4 | $3.00 | - | $0.75 | - | $15.00 | - | - | ||
| Grok 3 mini Reasoning (high) | $0.30 | - | $0.07 | - | $0.50 | - | - | ||
| Grok 3 mini Reasoning (high) | $0.60 | - | $0.07 | - | $4.00 | - | - | ||
| Grok 3 | $3.00 | - | $0.75 | - | $15.00 | - | - | ||
| Grok 3 | $5.00 | - | $0.07 | - | $25.00 | - | - | ||
| |||||||||
| Nova Premier | $2.50 | - | $0.62 | - | $12.50 | 1000 | 5 minutes | ||
| Nova Pro | $0.80 | - | $0.20 | - | $3.20 | 1000 | 5 minutes | Only supported on US East (N. Virginia) region | |
| Nova Lite | $0.06 | - | $0.01 | - | $0.24 | 1000 | 5 minutes | Only supported on US East (N. Virginia) region | |
| Nova Micro | $0.04 | - | $0.01 | - | $0.14 | 1000 | 5 minutes | Only supported on US East (N. Virginia) region | |
| Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | $3.00 | $3.75 | $0.30 | - | $15.00 | - | - | - | 1h cache write: $6 |
| Claude Sonnet 4.6 (Non-reasoning, High Effort) | $3.00 | $3.75 | $0.30 | - | $15.00 | - | - | - | 1h cache write: $6 |
| Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | $5.00 | $6.25 | $0.50 | - | $25.00 | - | - | - | 1h cache write: $10 |
| Claude Opus 4.6 (Non-reasoning) | $5.00 | $6.25 | $0.50 | - | $25.00 | - | - | - | 1h cache write: $10 |
| Claude Opus 4.5 (Reasoning) | $5.00 | $6.25 | $0.50 | - | $25.00 | 1024 | 5 minutes | 1h cache write: $10 | |
| Claude Opus 4.5 (Non-reasoning) | $5.00 | $6.25 | $0.50 | - | $25.00 | 1024 | 5 minutes | 1h cache write: $10 | |
| Claude 4.5 Haiku (Reasoning) | $1.00 | $1.25 | $0.10 | - | $5.00 | 4096 | 5 minutes | ||
| Claude 4.5 Haiku (Non-reasoning) | $1.00 | $1.25 | $0.10 | - | $5.00 | 4096 | 5 minutes | ||
| Claude 4.5 Sonnet (Reasoning) | $3.00 | $4.12 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing | |
| Claude 4.5 Sonnet (Non-reasoning) | $3.00 | $4.12 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing | |
| Claude 4.1 Opus (Reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | ||
| Claude 4.1 Opus (Non-reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | ||
| Claude 4 Sonnet (Reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing | |
| Claude 4 Sonnet (Non-reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing | |
| Claude 4 Opus (Reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | ||
| Claude 4 Opus (Non-reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | ||
| Claude 3.7 Sonnet (Reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | ||
| Claude 3.7 Sonnet (Non-reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | ||
| Claude 3.5 Haiku | $0.80 | $1.00 | $0.08 | - | $4.00 | 2048 | 5 minutes | ||
| Claude 3.5 Sonnet (Oct '24) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Only supported on US West (Oregon) region | |
| Claude 3.5 Sonnet (June '24) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Only supported on US West (Oregon) region | |
OpenAI models:
| |||||||||
| GPT-5.4 (xhigh) | $2.50 | - | $0.25 | - | $15.00 | - | - | - | Tiered pricing |
| GPT-5 (high) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 (medium) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 mini (high) | $0.25 | - | $0.03 | - | $2.00 | 1024 | 5-10 minutes | ||
| GPT-5 (low) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 mini (medium) | $0.25 | - | $0.03 | - | $2.00 | 1024 | 5-10 minutes | ||
| GPT-5 nano (high) | $0.05 | - | $0.01 | - | $0.40 | 1024 | 5-10 minutes | ||
| GPT-5 nano (medium) | $0.05 | - | $0.01 | - | $0.40 | 1024 | 5-10 minutes | ||
| GPT-5 (minimal) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 mini (minimal) | $0.25 | - | $0.03 | - | $2.00 | 1024 | 5-10 minutes | ||
| GPT-5 nano (minimal) | $0.05 | - | $0.01 | - | $0.40 | 1024 | 5-10 minutes | ||
| o3 | $2.00 | - | $0.50 | - | $8.00 | 1024 | 5-10 minutes | ||
| o4-mini (high) | $1.10 | - | $0.28 | - | $4.40 | 1024 | 5-10 minutes | ||
| GPT-4.1 | $2.00 | - | $0.50 | - | $8.00 | 1024 | 5-10 minutes | ||
| GPT-4.1 mini | $0.40 | - | $0.10 | - | $1.60 | 1024 | 5-10 minutes | ||
| GPT-4.1 nano | $0.10 | - | $0.03 | - | $0.40 | 1024 | 5-10 minutes | ||
| o3-mini (high) | $1.10 | - | $0.55 | - | $4.40 | 1024 | 5-10 minutes | ||
| o3-mini | $1.10 | - | $0.55 | - | $4.40 | 1024 | 5-10 minutes | ||
| o1 | $15.00 | - | $7.50 | - | $60.00 | 1024 | 5-10 minutes | ||
| GPT-4o (Nov '24) | $2.50 | - | $1.25 | - | $10.00 | 1024 | 5-10 minutes | ||
| o1-preview | $16.50 | - | $8.25 | - | $66.00 | 1024 | 5-10 minutes | ||
| GPT-4o (Aug '24) | $2.50 | - | $1.25 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-4o mini | $0.15 | - | $0.07 | - | $0.60 | 1024 | 5-10 minutes | ||
| GPT-4o (May '24) | $5.00 | - | $2.50 | - | $15.00 | 1024 | 5-10 minutes | ||
| Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | $3.00 | $3.75 | $0.30 | - | $15.00 | - | - | - | 1h cache write: $6 |
| Claude Sonnet 4.6 (Non-reasoning, High Effort) | $3.00 | $3.75 | $0.30 | - | $15.00 | - | - | - | 1h cache write: $6 |
| Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | $5.00 | $6.25 | $0.50 | - | $25.00 | - | - | - | 1h cache write: $10 |
| Claude Opus 4.6 (Non-reasoning) | $5.00 | $6.25 | $0.50 | - | $25.00 | - | - | - | 1h cache write: $10 |
| |||||||||
| DeepSeek V3.2 (Reasoning) | $0.28 | - | $0.01 | - | $0.42 | - | - | - | |
| DeepSeek V3.2 Exp (Reasoning) | $0.28 | - | $0.03 | - | $0.42 | 64 | - | ||
| DeepSeek V3.2 Exp (Non-reasoning) | $0.28 | - | $0.03 | - | $0.42 | 64 | - | ||
| NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | $0.30 | - | $0.06 | - | $0.75 | - | - | - | |
| GLM-5 (Reasoning) | $0.95 | - | $0.20 | - | $3.15 | - | - | - | |
| GLM-5 (Non-reasoning) | $0.95 | - | $0.25 | - | $3.15 | - | - | - | |
| GLM-4.7 (Reasoning) | $0.60 | - | $0.12 | - | $2.20 | - | - | - | |
| GLM-4.7 (Non-reasoning) | $0.60 | - | $0.12 | - | $2.20 | - | - | - | |
| GLM-4.6 (Reasoning) | $0.60 | - | $0.12 | - | $2.20 | - | - | - | |
| Kimi K2.5 (Reasoning) | $0.60 | - | $0.12 | - | $3.00 | - | - | - | |
| Kimi K2.5 (Non-reasoning) | $0.60 | - | $0.12 | - | $3.00 | - | - | - | |
| DeepSeek V3.1 (Non-reasoning) | $0.50 | - | $0.25 | - | $1.50 | - | - | - | |
| GLM-5.1 (Reasoning) | $1.40 | - | $0.26 | - | $4.40 | - | - | - | |
| Gemma 4 31B (Reasoning) | $0.13 | - | $0.02 | - | $0.38 | - | - | - | |
| Qwen3.5 397B A17B (Reasoning) | $0.54 | - | $0.27 | - | $3.40 | - | - | - | |
| Qwen3.5 397B A17B (Non-reasoning) | $0.54 | - | $0.27 | - | $3.40 | - | - | - | |
| Kimi K2.5 (Reasoning) | $0.45 | - | $0.07 | - | $2.25 | - | - | - | |
| DeepSeek V3.2 (Non-reasoning) | $0.26 | - | $0.13 | - | $0.38 | - | - | - | |
| GLM-5.1 (Reasoning) | $1.40 | - | $0.26 | - | $4.40 | - | - | - | |
| MiniMax-M2.7 | $0.30 | - | $0.06 | - | $1.20 | - | - | - | |
| Kimi K2.5 (Reasoning) | $0.45 | - | $0.07 | - | $2.25 | - | - | - | |
| DeepSeek V3.2 (Reasoning) | $0.56 | - | $0.28 | - | $1.68 | - | - | - | |
| gpt-oss-120B (high) | $0.15 | - | $0.07 | - | $0.60 | - | - | - | |
| Mercury 2 | $0.25 | - | $0.03 | - | $0.75 | - | - | - | |
| Kimi K2 Thinking | $1.15 | - | $0.15 | - | $8.00 | - | - | - | |
| Kimi K2 Thinking | $0.60 | - | $0.15 | - | $2.50 | - | - | - | |
| Kimi K2 | $0.60 | - | $0.15 | - | $2.50 | - | - | - | |
| MiniMax-M2.7 | $0.30 | $0.38 | $0.06 | - | $1.20 | - | - | - | |
| GLM-5.1 (Reasoning) | $1.40 | - | $0.26 | - | $4.40 | - | - | - | |
| MiniMax-M2.7 | $0.30 | - | $0.06 | - | $1.20 | - | - | - | |
| Kimi K2.5 (Reasoning) | $0.60 | - | $0.10 | - | $3.00 | - | - | - | |
| Kimi K2.5 (Non-reasoning) | $0.60 | - | $0.10 | - | $3.00 | - | - | - | |
| DeepSeek V3.2 (Reasoning) | $0.27 | - | $0.13 | - | $0.40 | - | - | - | |
| KAT-Coder-Pro V1 | $0.30 | - | $0.04 | - | $1.20 | - | - | - | |
| GLM-5.1 (Reasoning) | $1.40 | - | $0.26 | - | $4.40 | - | - | - | |
| Gemma 4 31B (Reasoning) | $0.14 | - | $0.07 | - | $0.40 | - | - | - | |
| Qwen3.5 397B A17B (Reasoning) | $0.60 | - | $0.30 | - | $3.60 | - | - | - | |
| Kimi K2.5 (Reasoning) | $0.50 | - | $0.20 | - | $2.50 | - | - | - | |
| DeepSeek V3.2 (Reasoning) | $0.28 | - | $0.13 | - | $0.45 | - | - | - | |
| gpt-oss-120B (high) | $0.10 | - | $0.06 | - | $0.75 | - | - | - | |
| gpt-oss-120B (low) | $0.15 | - | $0.06 | - | $0.60 | - | - | - | |
| GLM-5.1 (Reasoning) | $1.40 | - | $0.26 | - | $4.40 | - | - | - | |
| GLM-5.1 (Non-reasoning) | $1.40 | - | $0.26 | - | $4.40 | - | - | - | |
| Kimi K2.5 (Reasoning) | $0.23 | - | $0.07 | - | $3.00 | - | - | - | |
| DeepSeek V3.2 (Reasoning) | $0.27 | - | $0.14 | - | $0.42 | - | - | - | |
| MiniMax-M2.7 | $0.30 | - | $0.06 | - | $1.20 | - | - | - | |
| Kimi K2.5 (Reasoning) | $0.50 | - | $0.10 | - | $2.85 | - | - | - | |
Introduction to Prompt Caching
What is Prompt Caching?
Prompt caching lets a provider reuse work it has already done on a prompt, cutting input token costs by up to 90% and making long-context inputs far more practical. Getting your approach to caching right can deliver large cost savings on input tokens and meaningful performance benefits.
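When you send a prompt, the system first checks whether the beginning of that prompt (its prefix) exactly matches one it has processed recently. If it does (a cache hit), the provider reuses the stored intermediate computation for the matching tokens and only processes the remainder; the model still generates a fresh response. If it does not (a cache miss), the whole prompt is processed normally and its prefix is stored for future requests.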
Key Metrics to Watch
- Input Price: The standard price you pay for input tokens
- Cache Write Price: What you pay to save prompt tokens into the cache; sometimes higher than standard input pricing
- Cache Hit Price: Discounted rate for prompt tokens that hit the cache
- Cache Storage Price: Hourly cost per million cached tokens (currently unique to Google)
- Cache TTL: How long cached tokens remain available; the values listed in the table above range from 5 to 60 minutes
- Cache Minimum Tokens: Minimum matching token count required before a cache hit is served
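To see how these metrics combine, here is a minimal sketch of the input cost of a single request. It assumes prices are quoted in USD per 1M tokens and that the cacheable part of the prompt is a prefix. `CachePricing` and `input_cost` are hypothetical names, and storage fees and TTL expiry are ignored. The sample figures come from the Claude Sonnet 4.6 row in the table above.

```python
from dataclasses import dataclass
from typing import Optional

TOKENS_PER_PRICE_UNIT = 1_000_000  # assumption: prices are USD per 1M tokens


@dataclass(frozen=True)
class CachePricing:
    input_price: float                         # standard input price
    cache_hit_price: float                     # discounted price for cached tokens
    cache_write_price: Optional[float] = None  # None: writes billed at input_price


def input_cost(pricing: CachePricing, prefix_tokens: int,
               suffix_tokens: int, cache_hit: bool) -> float:
    """USD input cost of one request with a cacheable prefix and a fresh suffix."""
    if cache_hit:
        prefix_rate = pricing.cache_hit_price
    elif pricing.cache_write_price is not None:
        prefix_rate = pricing.cache_write_price
    else:
        prefix_rate = pricing.input_price
    total = prefix_tokens * prefix_rate + suffix_tokens * pricing.input_price
    return total / TOKENS_PER_PRICE_UNIT


# Figures from the Claude Sonnet 4.6 row: input $3.00, write $3.75, hit $0.30.
sonnet = CachePricing(input_price=3.00, cache_hit_price=0.30, cache_write_price=3.75)
first_request = input_cost(sonnet, 100_000, 1_000, cache_hit=False)  # writes the cache
later_request = input_cost(sonnet, 100_000, 1_000, cache_hit=True)   # reads the cache
print(f"first: ${first_request:.3f}, later: ${later_request:.3f}")
```

With a 100,000-token cached prefix and a 1,000-token fresh suffix, the first request costs about $0.378 in input tokens and each later cache hit about $0.033, compared with about $0.303 per request without caching.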
How Does Prompt Caching Work?
When you send a prompt to a transformer-based language model, the attention layers turn each input token into key (K) and value (V) vectors, which are stored in the KV cache. If the provider keeps these vectors in memory, it can skip recomputing them when the same input tokens arrive again at the start of a later prompt.
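The reuse described here can be illustrated with a toy model. This is only a sketch, not how any provider implements caching: a single integer stands in for each token's K and V vectors, and `toy_state` and `ToyPrefixCache` are hypothetical names. Each token's state depends on every token before it, which is why only a shared prefix can be reused.

```python
from typing import List, Tuple


def toy_state(prev_state: int, token: str) -> int:
    """Deterministic stand-in for per-token K/V: depends on the token and all earlier ones."""
    return (prev_state * 31 + sum(map(ord, token))) % 1_000_003


class ToyPrefixCache:
    """Remembers earlier prompts and reuses state for the longest shared prefix."""

    def __init__(self) -> None:
        self._entries: List[Tuple[List[str], List[int]]] = []

    def process(self, tokens: List[str]) -> Tuple[List[int], int]:
        """Return (per-token states, number of tokens served from the cache)."""
        best_states: List[int] = []
        for cached_tokens, cached_states in self._entries:
            shared = 0
            for old, new in zip(cached_tokens, tokens):
                if old != new:
                    break
                shared += 1
            if shared > len(best_states):
                best_states = cached_states[:shared]

        # Start from the reused prefix and compute only the remaining tokens.
        states = list(best_states)
        reused = len(states)
        prev = states[-1] if states else 0
        for token in tokens[reused:]:
            prev = toy_state(prev, token)
            states.append(prev)

        self._entries.append((list(tokens), states))
        return states, reused


cache = ToyPrefixCache()
system_prompt = ["You", "are", "a", "helpful", "assistant", "."]
_, reused_first = cache.process(system_prompt + ["What", "is", "caching", "?"])
_, reused_second = cache.process(system_prompt + ["Summarize", "this", "."])
print(reused_first, reused_second)  # 0 tokens reused, then the 6 shared prefix tokens
```

The second request reuses the state for the six system-prompt tokens and computes only its own three. Changing even the first token of a prompt would leave nothing to reuse.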
Until recently, leveraging the speed and cost benefits of caching was only available for dedicated deployments. Now, serverless API providers—including the frontier labs—have begun passing on some of the cost benefits of caching to developers.
Optimal Use Cases
- System instructions: Large system prompts that must be included across many interactions
- Chat history: Conversation context that accompanies each new user turn
- Per-user personalized context: Extensive user memories or profiles that enable deep personalization
Implementation Considerations
- Activation method varies by provider: some require manual setup while others offer automatic caching
- Cache hit discounts typically range from 50% to 90% off standard input token pricing, so this really is worth the time to get right
- Caching improves performance for very long prompts (50k+ tokens); benchmarks coming soon
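As a rough check on whether caching pays off when cache writes cost more than standard input, the sketch below computes the hit discount and the number of requests sharing a prefix needed to break even. It assumes all requests arrive within the TTL, that the three prices are in the same unit, and that there are no storage fees. The function names are hypothetical.

```python
import math


def hit_discount(input_price: float, cache_hit_price: float) -> float:
    """Fraction saved on a cached token relative to the standard input price."""
    return 1 - cache_hit_price / input_price


def break_even_requests(input_price: float, cache_hit_price: float,
                        cache_write_price: float) -> int:
    """Smallest number of requests n for which caching is strictly cheaper.

    With caching:    write + (n - 1) * hit
    Without caching: n * input
    """
    if cache_hit_price >= input_price:
        raise ValueError("cache hits must be cheaper than standard input")
    threshold = (cache_write_price - cache_hit_price) / (input_price - cache_hit_price)
    return max(1, math.floor(threshold) + 1)


# Claude Sonnet 4.6 row from the table: input $3.00, write $3.75, hit $0.30.
print(f"{hit_discount(3.00, 0.30):.0%}")       # discount on cached tokens
print(break_even_requests(3.00, 0.30, 3.75))   # requests needed to come out ahead
```

For these figures the discount is 90% and caching is already cheaper from the second request onward, so even a 25% write premium is recovered by a single cache hit.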