What is the best LLM API for prompt caching?

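Based on our verified pricing data, the top-ranked qualifying model is DeepSeek V4 Flash by DeepSeek, listed at $0.140/Mtok input. The ranking uses cached-input price, so a model with a lower listed input price can still rank lower. See the full ranking below for more options.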

How often are prices updated?

Prices are verified against official provider pricing pages twice daily (09:20 & 21:20 UTC) by our automated scraper pipeline.

Best LLM APIs with Prompt Caching

LLM APIs that support prompt caching, ranked by cached-input price. Cache hits cost 50–99% less than fresh input — the biggest lever for cutting cost on repeated system prompts, RAG context, and long conversations.
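To see how a cache discount in that 50–99% range changes what you pay, here is a minimal sketch that averages the fresh and cached input prices by cache hit rate. The function name, the 80% hit rate, and the 90% discount are illustrative assumptions, not figures from the ranking, and the sketch ignores any cache-write surcharge or storage fee a provider may charge.

```python
def effective_input_price(fresh_price: float, hit_rate: float,
                          cache_discount: float) -> float:
    """Average input price in $/Mtok when some input tokens are cache hits.

    fresh_price:    listed (uncached) input price in $/Mtok
    hit_rate:       fraction of input tokens served from cache, 0.0 to 1.0
    cache_discount: fractional saving on a cache hit (0.5 means 50% cheaper)
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    if not 0.0 <= cache_discount <= 1.0:
        raise ValueError("cache_discount must be between 0 and 1")
    cached_price = fresh_price * (1.0 - cache_discount)
    return hit_rate * cached_price + (1.0 - hit_rate) * fresh_price


# Assumed scenario: $0.140/Mtok listed input, 80% of input tokens hit
# the cache, and cache hits are 90% cheaper than fresh input.
price = effective_input_price(0.140, hit_rate=0.8, cache_discount=0.9)
print(f"${price:.4f}/Mtok effective input")
```

The higher the share of repeated tokens (system prompts, RAG context, conversation history), the closer the effective price moves toward the cached price.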

45 models qualify. Showing the top 15, sorted by cached-input price.

DeepSeek V4 Flash (DeepSeek): $0.140 in, $0.280 out; $0.210/Mtok blended; 1M ctx

DeepSeek V4 Pro (DeepSeek): $0.435 in, $0.870 out; $0.652/Mtok blended; 1M ctx

GLM-4.7-FlashX (Z.AI): $0.070 in, $0.400 out; $0.235/Mtok blended; 128K ctx

Cost calculator for this use case

Enter tokens per day and days per month to estimate the monthly cost of the top three models (DeepSeek V4 Flash, DeepSeek V4 Pro, GLM-4.7-FlashX) at a 70/30 input/output ratio.
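The calculator's inputs (tokens per day, days per month, and a 70/30 input/output ratio) map onto a short formula. The sketch below is one plausible reading of that calculation, not the page's actual implementation: the function name and the 10M tokens/day workload are assumptions, and it uses the listed input and output prices without applying any cache discount.

```python
def monthly_cost(tokens_per_day: float, days_per_month: int,
                 input_price: float, output_price: float,
                 input_share: float = 0.70) -> float:
    """Estimated monthly cost in dollars; prices are in $/Mtok."""
    total_mtok = tokens_per_day * days_per_month / 1_000_000
    input_mtok = total_mtok * input_share
    output_mtok = total_mtok * (1.0 - input_share)
    return input_mtok * input_price + output_mtok * output_price


# Listed (input, output) prices in $/Mtok for the three highlighted models.
MODELS = {
    "DeepSeek V4 Flash": (0.140, 0.280),
    "DeepSeek V4 Pro": (0.435, 0.870),
    "GLM-4.7-FlashX": (0.070, 0.400),
}

# Hypothetical workload: 10M tokens per day over a 30-day month.
for name, (inp, out) in MODELS.items():
    print(f"{name}: ${monthly_cost(10_000_000, 30, inp, out):.2f}")
```

Because output tokens cost more than input tokens for every model listed, shifting `input_share` toward input lowers the estimate.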

Full ranking — top 15 models

#	Model	Provider	Input $/Mtok	Output $/Mtok	Blended $/Mtok	Context
1	DeepSeek V4 Flash	DeepSeek	$0.140	$0.280	$0.210	1M
2	DeepSeek V4 Pro	DeepSeek	$0.435	$0.870	$0.652	1M
3	GLM-4.7-FlashX	Z.AI	$0.070	$0.400	$0.235	128K
4	GPT OSS 120B	Fireworks	$0.150	$0.600	$0.375	128K
5	DeepSeek V4 Flash	Fireworks	$0.140	$0.280	$0.210	1M
6	MiniMax 2.5	Fireworks	$0.300	$1.20	$0.750	128K
7	MiniMax-M2	MiniMax	$0.300	$1.20	$0.750	205K
8	MiniMax-M2.1	MiniMax	$0.300	$1.20	$0.750	205K
9	MiniMax-M2.5	MiniMax	$0.300	$1.20	$0.750	205K
10	GLM-4.5-Air	Z.AI	$0.200	$1.10	$0.650	128K
11	GPT OSS 20B	Fireworks	$0.070	$0.300	$0.185	128K
12	GLM-4.6V	Z.AI	$0.300	$0.900	$0.600	128K
13	MiniMax 2.7	Fireworks	$0.300	$1.20	$0.750	128K
14	MiniMax M3	Fireworks	$0.300	$1.20	$0.750	1M
15	MiniMax-M2.7	MiniMax	$0.300	$1.20	$0.750	205K

How models are selected

Models offering prompt caching, sorted by cached-input price per million tokens (cheapest cache reads first).

Prices are per million tokens (Mtok), sourced directly from official provider pricing pages and verified by our automated scraper pipeline, which runs twice daily. "Blended cost" is the unweighted average of the input and output prices, a quick proxy for workloads that split tokens 50/50 between input and output.
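The blended figure can be reproduced from the input and output columns. As a small sketch (the function name is an assumption; the prices are copied from the ranking table), the check below recomputes the blended cost for a few rows and compares it with the listed value:

```python
def blended_cost(input_price: float, output_price: float) -> float:
    """Unweighted average of input and output prices, in $/Mtok."""
    return (input_price + output_price) / 2


# (model, input $/Mtok, output $/Mtok, listed blended $/Mtok)
rows = [
    ("DeepSeek V4 Flash", 0.140, 0.280, 0.210),
    ("GLM-4.7-FlashX", 0.070, 0.400, 0.235),
    ("GPT OSS 120B", 0.150, 0.600, 0.375),
    ("GLM-4.5-Air", 0.200, 1.10, 0.650),
]

for name, inp, out, listed in rows:
    computed = blended_cost(inp, out)
    # Allow for rounding to three decimals in the listed figure.
    assert abs(computed - listed) < 0.0005, name
    print(f"{name}: computed ${computed:.3f}, listed ${listed:.3f}")
```

Workloads that are input-heavy, which is common when long prompts are cached, will see a lower per-token average than the blended figure suggests, since input is the cheaper side for every model in the table.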