Stay connected with us on X, Discord, and LinkedIn to stay up to date with future analysis

Prompt Caching: Cost & Performance Analysis Across Providers

Prompt caching is a critical new innovation for language model inference - saving developers up to 90% and making long context inputs suddenly viable. Compare features and pricing across all major AI providers below.

Caching requires exact prompt matches and varies by provider - some like OpenAI and DeepSeek offer automatic caching, while others including Google, Anthropic, and Amazon require manual setup. Learn more about how it works in our introduction to prompt caching below.

Pricing: Cached Input Prompts

Price: USD per 1M Tokens
Input (standard)
Cache Write
Cache Hit
Cache Storage per Hour
Output (standard)

Price per token included in the request/message sent to the API, represented as USD per million Tokens.

One-time cost charged when storing a prompt in the cache for future reuse, represented as USD per million tokens.

Price per token for cached prompts (previously processed), typically offering a significant discount compared to regular input price, represented as USD per million tokens.

Cost to maintain tokens in cache storage, charged per million tokens per hour. Currently only applicable to Google's Gemini models.

Price per token generated by the model (received from the API), represented as USD per million Tokens.

Cache Hit Discount

Cache Hit Discount: Reduction in input token cost as a result of a cache hit; Higher is better

Reduction in input token cost due to cache hit relative to input price. Formula: (Input Token Price - Cache Hit Price per Token) / Input Token Price. Please note that this discount figure does not take into account all costs associated with cache hits, such as cache write and storage costs.

Prompt Caching API Specifications

Provider / ModelInput (standard)Cache writeCache hitCache storageOutput (standard)Auto-EnabledMin tokensCache TTLNotes
OpenAI
OpenAI
  • Cache read tokens are 50% cheaper than base input tokens
  • Cache persists up to one hour during off-peak periods
GPT-5.1 (high)$1.25-$0.13-$10.00
10245-10 minutes
GPT-5.1 (Non-reasoning)$1.25-$0.13-$10.00
10245-10 minutes
GPT-5 Codex (high)$1.25-$0.13-$10.00
10245-10 minutes
GPT-5 (high)$1.25-$0.13-$10.00
10245-10 minutes
GPT-5 (medium)$1.25-$0.13-$10.00
10245-10 minutes
GPT-5 mini (high)$0.25-$0.03-$2.00
10245-10 minutes
GPT-5 (low)$1.25-$0.13-$10.00
10245-10 minutes
GPT-5 mini (medium)$0.25-$0.03-$2.00
10245-10 minutes
GPT-5 nano (high)$0.05-$0.01-$0.40
10245-10 minutes
GPT-5 nano (medium)$0.05-$0.01-$0.40
10245-10 minutes
GPT-5 (minimal)$1.25-$0.13-$10.00
10245-10 minutes
GPT-5 mini (minimal)$0.25-$0.03-$2.00
10245-10 minutes
GPT-5 nano (minimal)$0.05-$0.01-$0.40
10245-10 minutes
GPT-5 (ChatGPT)$1.25-$0.13-$10.00
10245-10 minutes
o4-mini (high)$1.10-$0.28-$4.40
10245-10 minutes
o3$2.00-$0.50-$8.00
10245-10 minutes
GPT-4.1$2.00-$0.50-$8.00
10245-10 minutes
GPT-4.1 mini$0.40-$0.10-$1.60
10245-10 minutes
GPT-4.1 nano$0.10-$0.03-$0.40
10245-10 minutes
o3-mini (high)$1.10-$0.55-$4.40
10245-10 minutes
o3-mini$1.10-$0.55-$4.40
10245-10 minutes
o1$15.00-$7.50-$60.00
10245-10 minutes
GPT-4o (Nov '24)$2.50-$1.50-$10.00
10245-10 minutes
GPT-4o (Aug '24)$2.50-$1.25-$10.00
10245-10 minutes
GPT-4o mini$0.15-$0.07-$0.60
10245-10 minutes
Anthropic
Anthropic
  • Cache read tokens are 90% cheaper than base input tokens
  • Cache write tokens are 25% more expensive than base input tokens
Claude Opus 4.5 (Reasoning)$5.00$6.25$0.50-$25.00
10245 minutes

1h cache write: $10

Claude Opus 4.5 (Non-reasoning)$5.00$6.25$0.50-$25.00
10245 minutes

1h cache write: $10

Claude 4.5 Haiku (Reasoning)$1.00$1.25$0.10-$5.00
40965 minutes

1h cache write: $2

Claude 4.5 Haiku (Non-reasoning)$1.00$1.25$0.10-$5.00
40965 minutes

1h cache write: $2

Claude 4.5 Sonnet (Reasoning)$3.00$3.75$0.30-$15.00
10245 minutes

Tiered pricing:

  • ≤200K:

    • Cache hit: $0.30
    • 5m cache write: $3.75
    • 1h cache write: $6.00
  • 200K:

    • Cache hit: $0.60
    • 5m cache write: $7.50
    • 1h cache write: $12.00
Claude 4.5 Sonnet (Non-reasoning)$3.00$3.75$0.30-$15.00
10245 minutes

Tiered pricing:

  • ≤200K:

    • Cache hit: $0.30
    • 5m cache write: $3.75
    • 1h cache write: $6.00
  • 200K:

    • Cache hit: $0.60
    • 5m cache write: $7.50
    • 1h cache write: $12.00
Claude 4.1 Opus (Reasoning)$15.00$18.75$1.50-$75.00
10245 minutes

1h cache write: $30

Claude 4.1 Opus (Non-reasoning)$15.00$18.75$1.50-$75.00
10245 minutes

1h cache write: $30

Claude 4 Sonnet (Reasoning)$3.00$3.75$0.30-$15.00
10245 minutes

Tiered pricing:

  • ≤200K:

    • Cache hit: $0.30
    • 5m cache write: $3.75
    • 1h cache write: $6.00
  • 200K:

    • Cache hit: $0.60
    • 5m cache write: $7.50
    • 1h cache write: $12.00
Claude 4 Sonnet (Non-reasoning)$3.00$3.75$0.30-$15.00
10245 minutes

Tiered pricing:

  • ≤200K:

    • Cache hit: $0.30
    • 5m cache write: $3.75
    • 1h cache write: $6.00
  • 200K:

    • Cache hit:$0.60
    • 5m cache write: $7.50
    • 1h cache write: $12.00
Claude 4 Opus (Non-reasoning)$15.00$18.75$1.50-$75.00
10245 minutes

1h cache write: $30

Claude 4 Opus (Reasoning)$15.00$18.75$1.50-$75.00
10245 minutes

1h cache write: $30.0

Claude 3.5 Haiku$0.80$1.00$0.08-$4.00
20485 minutes
Claude 3 Haiku$0.25$0.30$0.03-$1.25
20485 minutes
Claude 3 Opus$15.00$18.75$1.50-$75.00
10245 minutes
Google
Google
  • Google supports caching for Gemini models and Anthropic's Claude models.
  • Pricing and usage differs between model families.
Gemini 3 Flash Preview (Reasoning)$0.50--$1.00$3.00
204860 minutes
Gemini 3 Flash Preview (Non-reasoning)$0.50--$1.00$3.00
204860 minutes
Gemini 2.5 Flash Preview (Sep '25) (Reasoning)$0.30-$0.03$1.00$2.50
204860 minutes
Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning)$0.30-$0.03$1.00$2.50
204860 minutes
Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning)$0.10-$0.01$1.00$0.40
204860 minutes
Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning)$0.10-$0.01$1.00$0.40
204860 minutes
Gemini 2.5 Flash-Lite (Reasoning)$0.10-$0.01$1.00$0.40
204860 minutes
Gemini 2.5 Flash-Lite (Non-reasoning)$0.10-$0.01$1.00$0.40
204860 minutes
Gemini 2.5 Pro$1.25-$0.13$4.50$10.00
204860 minutes
Gemini 2.5 Pro$1.25-$0.13$4.50$10.00
204860 minutes
Gemini 2.5 Flash (Reasoning)$0.30-$0.03$1.00$2.50
204860 minutes
Gemini 2.5 Flash (Reasoning)$0.30-$0.03$1.00$2.50
204860 minutes
Gemini 2.5 Flash (Non-reasoning)$0.30-$0.03$1.00$2.50
204860 minutes
Gemini 2.5 Flash (Non-reasoning)$0.30-$0.03$1.00$2.50
204860 minutes
Gemini 2.5 Pro Preview (May' 25)$1.25-$0.13$4.50$10.00
204860 minutes
Gemini 2.0 Flash (Feb '25)$0.15-$0.03$1.00$0.60
204860 minutes
Claude Opus 4.5 (Reasoning)$5.00$6.25$0.50-$25.00
10245 minutes

1h cache write: $10

Claude Opus 4.5 (Non-reasoning)$5.00$6.25$0.50-$25.00
10245 minutes

1h cache write: $10

Claude 4.5 Haiku (Reasoning)$1.00$1.25$0.10-$5.00
40965 minutes

1h cache write: $2

Claude 4.5 Haiku (Non-reasoning)$1.00$1.25$0.10-$5.00
40965 minutes

1h cache write: $2

Claude 4.5 Sonnet (Reasoning)$3.00$3.75$0.30-$15.00
10245 minutes

Tiered pricing:

  • ≤200K:

    • Cache hit: $0.30
    • 5m cache write: $3.75
    • 1h cache write: $6.00
  • 200K:

    • Cache hit: $0.60
    • 5m cache write: $7.50
    • 1h cache write: $12.00
Claude 4.5 Sonnet (Non-reasoning)$3.00$3.75$0.30-$15.00
10245 minutes

Tiered pricing:

  • ≤200K:

    • Cache hit: $0.30
    • 5m cache write: $3.75
    • 1h cache write: $6.00
  • 200K:

    • Cache hit: $0.60
    • 5m cache write: $7.50
    • 1h cache write: $12.00
Claude 4.1 Opus (Non-reasoning)$15.00$18.75$1.50-$75.00
10245 minutes

1h cache write: $30

Claude 4.1 Opus (Reasoning)$15.00$18.75$1.50-$75.00
10245 minutes

1h cache write: $30

Claude 4 Sonnet (Reasoning)$3.00$3.75$0.30-$15.00
10245 minutes

Tiered pricing:

  • ≤200K:

    • Cache hit: $0.30
    • 5m cache write: $3.75
    • 1h cache write: $6.00
  • 200K:

    • Cache hit: $0.60
    • 5m cache write: $7.50
    • 1h cache write: $12.00
Claude 4 Sonnet (Non-reasoning)$3.00$3.75$0.30-$15.00
10245 minutes

Tiered pricing:

  • ≤200K:

    • Cache hit: $0.30
    • 5m cache write: $3.75
    • 1h cache write: $6.00
  • 200K:

    • Cache hit:$0.60
    • 5m cache write: $7.50
    • 1h cache write: $12.00
Claude 4 Opus (Reasoning)$15.00$18.75$1.50-$75.00
10245 minutes

1h cache write: $30

Claude 4 Opus (Non-reasoning)$15.00$18.75$1.50-$75.00
10245 minutes

1h cache write: $30

Claude 3.7 Sonnet (Non-reasoning)$3.00$3.75$0.30-$15.00
10245 minutes
Claude 3.5 Haiku$0.80$1.00$0.08-$4.00
20485 minutes
Claude 3.5 Sonnet (Oct '24)$3.00$3.75$0.30-$15.00
10245 minutes
Claude 3.5 Sonnet (June '24)$3.00$3.75$0.30-$15.00
10245 minutes
xAI
xAI
Grok 4.1 Fast (Reasoning)$0.20-$0.05-$0.50---
Grok 4 Fast (Reasoning)$0.20-$0.05-$0.50
--
Grok 4 Fast (Non-reasoning)$0.20-$0.05-$0.50
--
Grok Code Fast 1$0.20-$0.02-$1.50
--
Grok 4$3.00-$0.75-$15.00
--
Grok 3 mini Reasoning (high)$0.30-$0.07-$0.50
--
Grok 3 mini Reasoning (high)$0.60-$0.07-$4.00
--
Grok 3$3.00-$0.75-$15.00
--
Grok 3$5.00-$0.07-$25.00
--
Amazon Bedrock
Amazon Bedrock
  • Amazon supports caching for Nova models and Anthropic's Claude models.
  • Pricing and usage differs between model families.
Nova Premier$2.50-$0.62-$12.50
10005 minutes
Nova Pro$0.80-$0.20-$3.20
10005 minutes

Only supported on US East (N. Virginia) region

Nova Lite$0.06-$0.01-$0.24
10005 minutes

Only supported on US East (N. Virginia) region

Nova Micro$0.04-$0.01-$0.14
10005 minutes

Only supported on US East (N. Virginia) region

Claude Opus 4.5 (Reasoning)$5.00$6.25$0.50-$25.00
10245 minutes

1h cache write: $10

Claude Opus 4.5 (Non-reasoning)$5.00$6.25$0.50-$25.00
10245 minutes

1h cache write: $10

Claude 4.5 Haiku (Reasoning)$1.00$1.25$0.10-$5.00
40965 minutes
Claude 4.5 Haiku (Non-reasoning)$1.00$1.25$0.10-$5.00
40965 minutes
Claude 4.5 Sonnet (Reasoning)$3.00$4.12$0.30-$15.00
10245 minutes

Tiered pricing:

  • ≤200K:

    • Cache hit: $0.30
    • 5m cache write: $3.75
    • 1h cache write: $6.00
  • 200K:

    • Cache hit: $0.60
    • 5m cache write: $7.50
    • 1h cache write: $12.00
Claude 4.5 Sonnet (Non-reasoning)$3.00$4.12$0.30-$15.00
10245 minutes

Tiered pricing:

  • ≤200K:

    • Cache hit: $0.30
    • 5m cache write: $3.75
    • 1h cache write: $6.00
  • 200K:

    • Cache hit: $0.60
    • 5m cache write: $7.50
    • 1h cache write: $12.00
Claude 4.1 Opus (Reasoning)$15.00$18.75$1.50-$75.00
10245 minutes
Claude 4.1 Opus (Non-reasoning)$15.00$18.75$1.50-$75.00
10245 minutes
Claude 4 Sonnet (Reasoning)$3.00$3.75$0.30-$15.00
10245 minutes

Tiered pricing:

  • ≤200K:

    • Cache hit: $0.30
    • 5m cache write: $3.75
  • 200K:

    • Cache hit: $0.60
    • 5m cache write: $7.50
Claude 4 Sonnet (Non-reasoning)$3.00$3.75$0.30-$15.00
10245 minutes

Tiered pricing:

  • ≤200K:

    • Cache hit: $0.30
    • 5m cache write: $3.75
  • 200K:

    • Cache hit:$0.60
    • 5m cache write: $7.50
Claude 4 Opus (Reasoning)$15.00$18.75$1.50-$75.00
10245 minutes
Claude 4 Opus (Non-reasoning)$15.00$18.75$1.50-$75.00
10245 minutes
Claude 3.7 Sonnet (Reasoning)$3.00$3.75$0.30-$15.00
10245 minutes
Claude 3.7 Sonnet (Non-reasoning)$3.00$3.75$0.30-$15.00
10245 minutes
Claude 3.5 Haiku$0.80$1.00$0.08-$4.00
20485 minutes
  • 1h cache write: $1.60
  • Only supported on US West (Oregon) region
Claude 3.5 Sonnet (Oct '24)$3.00$3.75$0.30-$15.00
10245 minutes

Only supported on US West (Oregon) region

Claude 3.5 Sonnet (June '24)$3.00$3.75$0.30-$15.00
10245 minutes

Only supported on US West (Oregon) region

Microsoft Azure
Microsoft Azure

OpenAI models:

  • Cache read tokens are 50% cheaper than base input tokens (Standard deployments)
  • Cache persists up to one hour during off-peak periods
GPT-5 (high)$1.25-$0.13-$10.00
10245-10 minutes
GPT-5 (medium)$1.25-$0.13-$10.00
10245-10 minutes
GPT-5 mini (high)$0.25-$0.03-$2.00
10245-10 minutes
GPT-5 (low)$1.25-$0.13-$10.00
10245-10 minutes
GPT-5 mini (medium)$0.25-$0.03-$2.00
10245-10 minutes
GPT-5 nano (high)$0.05-$0.01-$0.40
10245-10 minutes
GPT-5 nano (medium)$0.05-$0.01-$0.40
10245-10 minutes
GPT-5 (minimal)$1.25-$0.13-$10.00
10245-10 minutes
GPT-5 mini (minimal)$0.25-$0.03-$2.00
10245-10 minutes
GPT-5 nano (minimal)$0.05-$0.01-$0.40
10245-10 minutes
o4-mini (high)$1.10-$0.28-$4.40
10245-10 minutes
o3$2.00-$0.50-$8.00
10245-10 minutes
GPT-4.1$2.00-$0.50-$8.00
10245-10 minutes
GPT-4.1 mini$0.40-$0.10-$1.60
10245-10 minutes
GPT-4.1 nano$0.10-$0.03-$0.40
10245-10 minutes
o3-mini (high)$1.10-$0.55-$4.40
10245-10 minutes
o3-mini$1.10-$0.55-$4.40
10245-10 minutes
o1$15.00-$7.50-$60.00
10245-10 minutes
GPT-4o (Nov '24)$2.50-$1.25-$10.00
10245-10 minutes
o1-preview$16.50-$8.25-$66.00
10245-10 minutes
GPT-4o (Aug '24)$2.50-$1.25-$10.00
10245-10 minutes
GPT-4o mini$0.15-$0.07-$0.60
10245-10 minutes
GPT-4o (May '24)$5.00-$2.50-$15.00
10245-10 minutes
DeepSeek
DeepSeek
  • Cache read tokens are 50% cheaper on average (up to 90% with cache optimization)
  • Implements Context Caching on Disk technology
  • No guarantee of 100% cache hits
DeepSeek V3.2 Exp (Reasoning)$0.28-$0.03-$0.42
64-
DeepSeek V3.2 Exp (Non-reasoning)$0.28-$0.03-$0.42
64-
Kimi
Kimi
Kimi K2 Thinking$1.15-$0.15-$8.00---
Kimi K2 Thinking$0.60-$0.15-$2.50---
Kimi K2$0.60-$0.15-$2.50---
Novita
Novita
KAT-Coder-Pro V1$0.30-$0.04-$1.20---

Introduction to Prompt Caching

What is Prompt Caching?

Prompt caching is a critical new innovation for language model inference - saving developers up to 90% and making long context inputs suddenly viable. Getting your approach to caching right can deliver huge cost savings on input tokens and meaningful performance benefits.

When you send a prompt, the system first checks if that exact prompt has been processed before. If found (cache hit), it returns the stored response instead of generating a new one. If not found (cache miss), the prompt is processed normally, and the response is stored for future use.

Key Metrics to Watch

  • Input Price: The standard price you pay for input tokens
  • Cache Write Price: What you pay to save prompt tokens into the cache; sometimes higher than standard input pricing
  • Cache Hit Price: Discounted rate for prompt tokens that hit the cache
  • Cache Storage Price: Hourly cost per million cached tokens (currently unique to Google)
  • Cache TTL: The time cached tokens remain available, ranging from hours to days
  • Cache Minimum Tokens: Minimum matching token count required before a cache hit is served

How Does Prompt Caching Work?

When you send a prompt to a transformer-based language model, the attention layers process each input token into key (K) and value (V) vectors that are stored in the KV cache. By keeping these values in memory, processing on input tokens can be avoided when identical input tokens are sent into the model again.

Until recently, leveraging the speed and cost benefits of caching was only available for dedicated deployments. Now, serverless API providers—including the frontier labs—have begun passing on some of the cost benefits of caching to developers.

Optimal Use Cases

  • System instructions: Large system prompts that must be included across many interactions
  • Chat history: Conversation context that accompanies each new user turn
  • Per-user personalized context: Extensive user memories or profiles that enable deep personalization

Implementation Considerations

  • Activation method varies by provider - some require manual setup while others offer automatic caching
  • Cache hit discounts range from 50-90% off standard input token pricing - this really is worth the time to get right
  • Caching improves performance for very long prompts (50k+ tokens); benchmarks coming soon