Prompt Caching: Cost & Performance Analysis Across Providers
Prompt caching is a critical new innovation for language model inference - saving developers up to 90% and making long context inputs suddenly viable. Compare features and pricing across all major AI providers below.
Caching requires exact prompt matches and varies by provider - some like OpenAI and DeepSeek offer automatic caching, while others including Google, Anthropic, and Amazon require manual setup. Learn more about how it works in our introduction to prompt caching below.
Pricing: Cached Input Prompts
Price per token included in the request/message sent to the API, represented as USD per million Tokens.
One-time cost charged when storing a prompt in the cache for future reuse, represented as USD per million tokens.
Price per token for cached prompts (previously processed), typically offering a significant discount compared to regular input price, represented as USD per million tokens.
Cost to maintain tokens in cache storage, charged per million tokens per hour. Currently only applicable to Google's Gemini models.
Price per token generated by the model (received from the API), represented as USD per million Tokens.
Cache Hit Discount
Reduction in input token cost due to cache hit relative to input price. Formula: (Input Token Price - Cache Hit Price per Token) / Input Token Price. Please note that this discount figure does not take into account all costs associated with cache hits, such as cache write and storage costs.
Prompt Caching API Specifications
| Provider / Model | Input (standard) | Cache write | Cache hit | Cache storage | Output (standard) | Auto-Enabled | Min tokens | Cache TTL | Notes |
|---|---|---|---|---|---|---|---|---|---|
| |||||||||
| GPT-5.1 (high) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5.1 (Non-reasoning) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 Codex (high) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 (high) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 (medium) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 mini (high) | $0.25 | - | $0.03 | - | $2.00 | 1024 | 5-10 minutes | ||
| GPT-5 (low) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 mini (medium) | $0.25 | - | $0.03 | - | $2.00 | 1024 | 5-10 minutes | ||
| GPT-5 nano (high) | $0.05 | - | $0.01 | - | $0.40 | 1024 | 5-10 minutes | ||
| GPT-5 nano (medium) | $0.05 | - | $0.01 | - | $0.40 | 1024 | 5-10 minutes | ||
| GPT-5 (minimal) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 mini (minimal) | $0.25 | - | $0.03 | - | $2.00 | 1024 | 5-10 minutes | ||
| GPT-5 nano (minimal) | $0.05 | - | $0.01 | - | $0.40 | 1024 | 5-10 minutes | ||
| GPT-5 (ChatGPT) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| o4-mini (high) | $1.10 | - | $0.28 | - | $4.40 | 1024 | 5-10 minutes | ||
| o3 | $2.00 | - | $0.50 | - | $8.00 | 1024 | 5-10 minutes | ||
| GPT-4.1 | $2.00 | - | $0.50 | - | $8.00 | 1024 | 5-10 minutes | ||
| GPT-4.1 mini | $0.40 | - | $0.10 | - | $1.60 | 1024 | 5-10 minutes | ||
| GPT-4.1 nano | $0.10 | - | $0.03 | - | $0.40 | 1024 | 5-10 minutes | ||
| o3-mini (high) | $1.10 | - | $0.55 | - | $4.40 | 1024 | 5-10 minutes | ||
| o3-mini | $1.10 | - | $0.55 | - | $4.40 | 1024 | 5-10 minutes | ||
| o1 | $15.00 | - | $7.50 | - | $60.00 | 1024 | 5-10 minutes | ||
| GPT-4o (Nov '24) | $2.50 | - | $1.50 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-4o (Aug '24) | $2.50 | - | $1.25 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-4o mini | $0.15 | - | $0.07 | - | $0.60 | 1024 | 5-10 minutes | ||
| |||||||||
| Claude Opus 4.5 (Reasoning) | $5.00 | $6.25 | $0.50 | - | $25.00 | 1024 | 5 minutes | 1h cache write: $10 | |
| Claude Opus 4.5 (Non-reasoning) | $5.00 | $6.25 | $0.50 | - | $25.00 | 1024 | 5 minutes | 1h cache write: $10 | |
| Claude 4.5 Haiku (Reasoning) | $1.00 | $1.25 | $0.10 | - | $5.00 | 4096 | 5 minutes | 1h cache write: $2 | |
| Claude 4.5 Haiku (Non-reasoning) | $1.00 | $1.25 | $0.10 | - | $5.00 | 4096 | 5 minutes | 1h cache write: $2 | |
| Claude 4.5 Sonnet (Reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 4.5 Sonnet (Non-reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 4.1 Opus (Reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30 | |
| Claude 4.1 Opus (Non-reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30 | |
| Claude 4 Sonnet (Reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 4 Sonnet (Non-reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 4 Opus (Non-reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30 | |
| Claude 4 Opus (Reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30.0 | |
| Claude 3.5 Haiku | $0.80 | $1.00 | $0.08 | - | $4.00 | 2048 | 5 minutes | ||
| Claude 3 Haiku | $0.25 | $0.30 | $0.03 | - | $1.25 | 2048 | 5 minutes | ||
| Claude 3 Opus | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | ||
| |||||||||
| Gemini 3 Flash Preview (Reasoning) | $0.50 | - | - | $1.00 | $3.00 | 2048 | 60 minutes | ||
| Gemini 3 Flash Preview (Non-reasoning) | $0.50 | - | - | $1.00 | $3.00 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash Preview (Sep '25) (Reasoning) | $0.30 | - | $0.03 | $1.00 | $2.50 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning) | $0.30 | - | $0.03 | $1.00 | $2.50 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning) | $0.10 | - | $0.01 | $1.00 | $0.40 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) | $0.10 | - | $0.01 | $1.00 | $0.40 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash-Lite (Reasoning) | $0.10 | - | $0.01 | $1.00 | $0.40 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash-Lite (Non-reasoning) | $0.10 | - | $0.01 | $1.00 | $0.40 | 2048 | 60 minutes | ||
| Gemini 2.5 Pro | $1.25 | - | $0.13 | $4.50 | $10.00 | 2048 | 60 minutes | ||
| Gemini 2.5 Pro | $1.25 | - | $0.13 | $4.50 | $10.00 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash (Reasoning) | $0.30 | - | $0.03 | $1.00 | $2.50 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash (Reasoning) | $0.30 | - | $0.03 | $1.00 | $2.50 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash (Non-reasoning) | $0.30 | - | $0.03 | $1.00 | $2.50 | 2048 | 60 minutes | ||
| Gemini 2.5 Flash (Non-reasoning) | $0.30 | - | $0.03 | $1.00 | $2.50 | 2048 | 60 minutes | ||
| Gemini 2.5 Pro Preview (May' 25) | $1.25 | - | $0.13 | $4.50 | $10.00 | 2048 | 60 minutes | ||
| Gemini 2.0 Flash (Feb '25) | $0.15 | - | $0.03 | $1.00 | $0.60 | 2048 | 60 minutes | ||
| Claude Opus 4.5 (Reasoning) | $5.00 | $6.25 | $0.50 | - | $25.00 | 1024 | 5 minutes | 1h cache write: $10 | |
| Claude Opus 4.5 (Non-reasoning) | $5.00 | $6.25 | $0.50 | - | $25.00 | 1024 | 5 minutes | 1h cache write: $10 | |
| Claude 4.5 Haiku (Reasoning) | $1.00 | $1.25 | $0.10 | - | $5.00 | 4096 | 5 minutes | 1h cache write: $2 | |
| Claude 4.5 Haiku (Non-reasoning) | $1.00 | $1.25 | $0.10 | - | $5.00 | 4096 | 5 minutes | 1h cache write: $2 | |
| Claude 4.5 Sonnet (Reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 4.5 Sonnet (Non-reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 4.1 Opus (Non-reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30 | |
| Claude 4.1 Opus (Reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30 | |
| Claude 4 Sonnet (Reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 4 Sonnet (Non-reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 4 Opus (Reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30 | |
| Claude 4 Opus (Non-reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | 1h cache write: $30 | |
| Claude 3.7 Sonnet (Non-reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | ||
| Claude 3.5 Haiku | $0.80 | $1.00 | $0.08 | - | $4.00 | 2048 | 5 minutes | ||
| Claude 3.5 Sonnet (Oct '24) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | ||
| Claude 3.5 Sonnet (June '24) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | ||
| Grok 4.1 Fast (Reasoning) | $0.20 | - | $0.05 | - | $0.50 | - | - | - | |
| Grok 4 Fast (Reasoning) | $0.20 | - | $0.05 | - | $0.50 | - | - | ||
| Grok 4 Fast (Non-reasoning) | $0.20 | - | $0.05 | - | $0.50 | - | - | ||
| Grok Code Fast 1 | $0.20 | - | $0.02 | - | $1.50 | - | - | ||
| Grok 4 | $3.00 | - | $0.75 | - | $15.00 | - | - | ||
| Grok 3 mini Reasoning (high) | $0.30 | - | $0.07 | - | $0.50 | - | - | ||
| Grok 3 mini Reasoning (high) | $0.60 | - | $0.07 | - | $4.00 | - | - | ||
| Grok 3 | $3.00 | - | $0.75 | - | $15.00 | - | - | ||
| Grok 3 | $5.00 | - | $0.07 | - | $25.00 | - | - | ||
| |||||||||
| Nova Premier | $2.50 | - | $0.62 | - | $12.50 | 1000 | 5 minutes | ||
| Nova Pro | $0.80 | - | $0.20 | - | $3.20 | 1000 | 5 minutes | Only supported on US East (N. Virginia) region | |
| Nova Lite | $0.06 | - | $0.01 | - | $0.24 | 1000 | 5 minutes | Only supported on US East (N. Virginia) region | |
| Nova Micro | $0.04 | - | $0.01 | - | $0.14 | 1000 | 5 minutes | Only supported on US East (N. Virginia) region | |
| Claude Opus 4.5 (Reasoning) | $5.00 | $6.25 | $0.50 | - | $25.00 | 1024 | 5 minutes | 1h cache write: $10 | |
| Claude Opus 4.5 (Non-reasoning) | $5.00 | $6.25 | $0.50 | - | $25.00 | 1024 | 5 minutes | 1h cache write: $10 | |
| Claude 4.5 Haiku (Reasoning) | $1.00 | $1.25 | $0.10 | - | $5.00 | 4096 | 5 minutes | ||
| Claude 4.5 Haiku (Non-reasoning) | $1.00 | $1.25 | $0.10 | - | $5.00 | 4096 | 5 minutes | ||
| Claude 4.5 Sonnet (Reasoning) | $3.00 | $4.12 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 4.5 Sonnet (Non-reasoning) | $3.00 | $4.12 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 4.1 Opus (Reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | ||
| Claude 4.1 Opus (Non-reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | ||
| Claude 4 Sonnet (Reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 4 Sonnet (Non-reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Tiered pricing:
| |
| Claude 4 Opus (Reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | ||
| Claude 4 Opus (Non-reasoning) | $15.00 | $18.75 | $1.50 | - | $75.00 | 1024 | 5 minutes | ||
| Claude 3.7 Sonnet (Reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | ||
| Claude 3.7 Sonnet (Non-reasoning) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | ||
| Claude 3.5 Haiku | $0.80 | $1.00 | $0.08 | - | $4.00 | 2048 | 5 minutes |
| |
| Claude 3.5 Sonnet (Oct '24) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Only supported on US West (Oregon) region | |
| Claude 3.5 Sonnet (June '24) | $3.00 | $3.75 | $0.30 | - | $15.00 | 1024 | 5 minutes | Only supported on US West (Oregon) region | |
OpenAI models:
| |||||||||
| GPT-5 (high) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 (medium) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 mini (high) | $0.25 | - | $0.03 | - | $2.00 | 1024 | 5-10 minutes | ||
| GPT-5 (low) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 mini (medium) | $0.25 | - | $0.03 | - | $2.00 | 1024 | 5-10 minutes | ||
| GPT-5 nano (high) | $0.05 | - | $0.01 | - | $0.40 | 1024 | 5-10 minutes | ||
| GPT-5 nano (medium) | $0.05 | - | $0.01 | - | $0.40 | 1024 | 5-10 minutes | ||
| GPT-5 (minimal) | $1.25 | - | $0.13 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-5 mini (minimal) | $0.25 | - | $0.03 | - | $2.00 | 1024 | 5-10 minutes | ||
| GPT-5 nano (minimal) | $0.05 | - | $0.01 | - | $0.40 | 1024 | 5-10 minutes | ||
| o4-mini (high) | $1.10 | - | $0.28 | - | $4.40 | 1024 | 5-10 minutes | ||
| o3 | $2.00 | - | $0.50 | - | $8.00 | 1024 | 5-10 minutes | ||
| GPT-4.1 | $2.00 | - | $0.50 | - | $8.00 | 1024 | 5-10 minutes | ||
| GPT-4.1 mini | $0.40 | - | $0.10 | - | $1.60 | 1024 | 5-10 minutes | ||
| GPT-4.1 nano | $0.10 | - | $0.03 | - | $0.40 | 1024 | 5-10 minutes | ||
| o3-mini (high) | $1.10 | - | $0.55 | - | $4.40 | 1024 | 5-10 minutes | ||
| o3-mini | $1.10 | - | $0.55 | - | $4.40 | 1024 | 5-10 minutes | ||
| o1 | $15.00 | - | $7.50 | - | $60.00 | 1024 | 5-10 minutes | ||
| GPT-4o (Nov '24) | $2.50 | - | $1.25 | - | $10.00 | 1024 | 5-10 minutes | ||
| o1-preview | $16.50 | - | $8.25 | - | $66.00 | 1024 | 5-10 minutes | ||
| GPT-4o (Aug '24) | $2.50 | - | $1.25 | - | $10.00 | 1024 | 5-10 minutes | ||
| GPT-4o mini | $0.15 | - | $0.07 | - | $0.60 | 1024 | 5-10 minutes | ||
| GPT-4o (May '24) | $5.00 | - | $2.50 | - | $15.00 | 1024 | 5-10 minutes | ||
| |||||||||
| DeepSeek V3.2 Exp (Reasoning) | $0.28 | - | $0.03 | - | $0.42 | 64 | - | ||
| DeepSeek V3.2 Exp (Non-reasoning) | $0.28 | - | $0.03 | - | $0.42 | 64 | - | ||
| Kimi K2 Thinking | $1.15 | - | $0.15 | - | $8.00 | - | - | - | |
| Kimi K2 Thinking | $0.60 | - | $0.15 | - | $2.50 | - | - | - | |
| Kimi K2 | $0.60 | - | $0.15 | - | $2.50 | - | - | - | |
| KAT-Coder-Pro V1 | $0.30 | - | $0.04 | - | $1.20 | - | - | - | |
Introduction to Prompt Caching
What is Prompt Caching?
Prompt caching is a critical new innovation for language model inference - saving developers up to 90% and making long context inputs suddenly viable. Getting your approach to caching right can deliver huge cost savings on input tokens and meaningful performance benefits.
When you send a prompt, the system first checks if that exact prompt has been processed before. If found (cache hit), it returns the stored response instead of generating a new one. If not found (cache miss), the prompt is processed normally, and the response is stored for future use.
Key Metrics to Watch
- Input Price: The standard price you pay for input tokens
- Cache Write Price: What you pay to save prompt tokens into the cache; sometimes higher than standard input pricing
- Cache Hit Price: Discounted rate for prompt tokens that hit the cache
- Cache Storage Price: Hourly cost per million cached tokens (currently unique to Google)
- Cache TTL: The time cached tokens remain available, ranging from hours to days
- Cache Minimum Tokens: Minimum matching token count required before a cache hit is served
How Does Prompt Caching Work?
When you send a prompt to a transformer-based language model, the attention layers process each input token into key (K) and value (V) vectors that are stored in the KV cache. By keeping these values in memory, processing on input tokens can be avoided when identical input tokens are sent into the model again.
Until recently, leveraging the speed and cost benefits of caching was only available for dedicated deployments. Now, serverless API providers—including the frontier labs—have begun passing on some of the cost benefits of caching to developers.
Optimal Use Cases
- System instructions: Large system prompts that must be included across many interactions
- Chat history: Conversation context that accompanies each new user turn
- Per-user personalized context: Extensive user memories or profiles that enable deep personalization
Implementation Considerations
- Activation method varies by provider - some require manual setup while others offer automatic caching
- Cache hit discounts range from 50-90% off standard input token pricing - this really is worth the time to get right
- Caching improves performance for very long prompts (50k+ tokens); benchmarks coming soon