Best LLM APIs with Prompt Caching
LLM APIs that support prompt caching, ranked by cached-input price. Cache hits cost 50–99% less than fresh input, which makes caching the biggest lever for cutting cost on repeated system prompts, RAG context, and long conversations.
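To make the size of that discount concrete, here is a minimal sketch of the average input price when part of the input is served from cache. The function name, the 90% discount, and the 80% hit rate are illustrative assumptions, not provider figures. The sketch also assumes the listed $0.140 input price is the uncached rate, and it ignores any cache-write surcharge or storage fee a provider may add.

```python
def effective_input_price(base_price: float, cache_discount: float, hit_rate: float) -> float:
    """Average input price per Mtok when a share of input tokens is read from cache.

    base_price     -- uncached input price in $/Mtok
    cache_discount -- fraction saved on a cache hit (0.50 to 0.99 in the range quoted above)
    hit_rate       -- fraction of input tokens served from cache (hypothetical workload figure)
    """
    if not 0.0 <= cache_discount <= 1.0:
        raise ValueError("cache_discount must be between 0 and 1")
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    cached_price = base_price * (1.0 - cache_discount)
    # Weighted average of cached and uncached tokens.
    return hit_rate * cached_price + (1.0 - hit_rate) * base_price


# Assumed example: $0.140/Mtok input, 90% cache discount, 80% of input tokens cached.
price = effective_input_price(0.140, 0.90, 0.80)
print(f"${price:.4f} per Mtok of input")
```

Under these assumptions the average input price falls from $0.140 to roughly $0.039 per Mtok. The saving scales with the hit rate, so it depends heavily on how much of each request is a repeated prefix.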
Full ranking — top 15 models
| # | Model | Provider | Input $/Mtok | Output $/Mtok | Blended $/Mtok | Context (tokens) |
|---|---|---|---|---|---|---|
| 1 | DeepSeek V4 Flash | DeepSeek | $0.140 | $0.280 | $0.210 | 1M |
| 2 | DeepSeek V4 Pro | DeepSeek | $0.435 | $0.870 | $0.652 | 1M |
| 3 | GLM-4.7-FlashX | Z.AI | $0.070 | $0.400 | $0.235 | 128K |
| 4 | GPT OSS 120B | Fireworks | $0.150 | $0.600 | $0.375 | 128K |
| 5 | DeepSeek V4 Flash | Fireworks | $0.140 | $0.280 | $0.210 | 1M |
| 6 | MiniMax 2.5 | Fireworks | $0.300 | $1.20 | $0.750 | 128K |
| 7 | MiniMax-M2 | MiniMax | $0.300 | $1.20 | $0.750 | 205K |
| 8 | MiniMax-M2.1 | MiniMax | $0.300 | $1.20 | $0.750 | 205K |
| 9 | MiniMax-M2.5 | MiniMax | $0.300 | $1.20 | $0.750 | 205K |
| 10 | GLM-4.5-Air | Z.AI | $0.200 | $1.10 | $0.650 | 128K |
| 11 | GPT OSS 20B | Fireworks | $0.070 | $0.300 | $0.185 | 128K |
| 12 | GLM-4.6V | Z.AI | $0.300 | $0.900 | $0.600 | 128K |
| 13 | MiniMax 2.7 | Fireworks | $0.300 | $1.20 | $0.750 | 128K |
| 14 | MiniMax M3 | Fireworks | $0.300 | $1.20 | $0.750 | 1M |
| 15 | MiniMax-M2.7 | MiniMax | $0.300 | $1.20 | $0.750 | 205K |
How models are selected
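Only models that offer prompt caching are included. They are ranked by cached-input price per million tokens, cheapest cache reads first. The cached-input price is not one of the columns in the table, so the Input column does not appear in sorted order.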
Prices are per million tokens (Mtok), sourced directly from official provider pricing pages and verified by our automated scraper pipeline, which runs twice daily. The "Blended" figure is the simple average of the input and output prices, a quick proxy for workloads that use input and output tokens in equal volume (a 50/50 split).
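The blended figure can be reproduced from the table with a few lines of Python. This is a sketch only: the function names and the 90/10 input-to-output split are illustrative assumptions, not part of the site's pipeline.

```python
def blended_price(input_price: float, output_price: float) -> float:
    """Simple average of input and output price in $/Mtok (the 50/50 proxy)."""
    return (input_price + output_price) / 2.0


def weighted_price(input_price: float, output_price: float, input_share: float) -> float:
    """Average price per Mtok when input tokens make up `input_share` of all tokens."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_price + (1.0 - input_share) * output_price


# DeepSeek V4 Flash row from the table: $0.140 input, $0.280 output.
print(f"blended:  ${blended_price(0.140, 0.280):.3f}")
# Hypothetical prompt-heavy workload: 90% input tokens, 10% output tokens.
print(f"weighted: ${weighted_price(0.140, 0.280, 0.90):.3f}")
```

For the DeepSeek V4 Flash row, the simple average gives the listed $0.210. Prompt-heavy workloads such as RAG often send far more input than output tokens, and in that case a weighted average like the second function may be a closer estimate than the 50/50 blend.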