What is Prompt Caching?
Reusing previously computed attention states for repeated prompt prefixes to reduce latency and cost.
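Prompt caching stores the KV cache (the attention keys and values computed for each token) for a frequently repeated prompt prefix, such as a long system prompt. Later requests that begin with exactly the same prefix, token for token, reuse those stored states instead of recomputing them, and only the new tokens after the prefix are processed from scratch. Anthropic and other providers offer built-in prompt caching that can cut input-token costs on the cached portion of a prompt by up to 90%.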
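The reuse step can be illustrated with a toy, in-memory sketch. It is not how any provider implements caching. `compute_kv`, `PrefixKVCache`, `cache_upto`, and `input_cost` are hypothetical names. The "KV state" is a fake integer standing in for per-layer key/value tensors. The 0.1 read multiplier is an assumption chosen to reproduce the "up to 90%" figure, and cache-write surcharges and cache expiry are ignored.

```python
from typing import Dict, List, Sequence, Tuple


def compute_kv(prev_state: int, token: str) -> int:
    """Hypothetical stand-in for the expensive attention pass: derives a fake
    'KV state' for `token` from the state before it, so each state depends on
    the whole prefix (as real keys/values do under causal attention)."""
    mixed = prev_state * 31
    for ch in token:
        mixed = mixed * 131 + ord(ch)
    return mixed % (2**32)


class PrefixKVCache:
    """Toy prompt cache: maps an exact token prefix to its computed states."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, ...], List[int]] = {}

    def run(self, tokens: Sequence[str], cache_upto: int) -> Tuple[List[int], int]:
        """Process a prompt, then cache its first `cache_upto` tokens.

        Returns (states for every token, number of tokens computed this call)."""
        toks = tuple(tokens)
        reused: List[int] = []
        # Reuse the longest stored prefix; it must match token for token.
        for end in range(len(toks), 0, -1):
            if toks[:end] in self._store:
                reused = self._store[toks[:end]]
                break
        states = list(reused)
        prev = states[-1] if states else 0
        # Only the tokens after the reused prefix are computed.
        for tok in toks[len(reused):]:
            prev = compute_kv(prev, tok)
            states.append(prev)
        # Store the prefix up to the cache point (e.g. the end of the system prompt).
        if cache_upto > 0:
            self._store[toks[:cache_upto]] = states[:cache_upto]
        return states, len(toks) - len(reused)


def input_cost(cached: int, uncached: int, price: float,
               read_multiplier: float = 0.1) -> float:
    """Input-token cost when cached tokens are billed at `read_multiplier`
    times the base price (assumed 0.1; write surcharges are ignored)."""
    return price * (uncached + read_multiplier * cached)


cache = PrefixKVCache()
system = "You are a concise assistant .".split()  # 6 toy "tokens"
states_a, computed_a = cache.run(system + ["Hi"], cache_upto=len(system))
states_b, computed_b = cache.run(system + ["Bye"], cache_upto=len(system))
print(computed_a, computed_b)  # 7 1

# Cost of a fully cached 1,000-token prefix vs. an uncached one (price is arbitrary).
saving = 1 - input_cost(1000, 0, 1.0) / input_cost(0, 1000, 1.0)
print(f"{saving:.0%}")  # 90%
```

The second request computes only its final token, and because the stored states match what a fresh computation would produce, reuse does not change the result. Caching at an explicit cache point rather than at every prefix keeps the toy store small; real systems make a similar trade-off between how many prefixes they keep and how often a lookup hits.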