What is Prompt Caching?

AI Engineering

Last updated: July 30, 2026

Reusing previously computed attention states for repeated prompt prefixes to reduce latency and cost.

Prompt caching stores the key-value (KV) cache computed for a common prompt prefix, such as a system prompt, so that subsequent requests starting with the same prefix skip recomputing it. Anthropic and other providers offer built-in prompt caching, which can cut costs by up to 90% for repeated prefixes.
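The mechanism can be illustrated with a small, self-contained Python sketch. It is a toy model and not any provider's implementation: `ToyPrefixCache`, `_encode`, and the "Acme" prompt are hypothetical names invented for illustration, whitespace-split words stand in for tokens, and a list of integers stands in for the KV tensors a real inference server would keep. It only shows that a request sharing an exact prefix with an earlier one pays to process its new suffix alone.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyPrefixCache:
    """Toy stand-in for a provider-side prompt cache (hypothetical, for illustration)."""

    store: dict = field(default_factory=dict)
    tokens_computed: int = 0  # tokens that had to be processed from scratch

    def _encode(self, tokens):
        # Placeholder for the expensive attention pass over `tokens`.
        self.tokens_computed += len(tokens)
        return [len(t) for t in tokens]

    def run(self, prefix, suffix):
        # The cache key is a hash of the exact prefix: any change is a miss.
        key = hashlib.sha256(" ".join(prefix).encode("utf-8")).hexdigest()
        hit = key in self.store
        if not hit:
            self.store[key] = self._encode(prefix)
        # The suffix is always computed; only the prefix state is reused.
        state = self.store[key] + self._encode(suffix)
        return hit, len(state)


def demo():
    cache = ToyPrefixCache()
    system_prompt = "You are a helpful support assistant for Acme".split()

    first = cache.run(system_prompt, "Where is my order".split())
    after_first = cache.tokens_computed

    second = cache.run(system_prompt, "How do I reset my password".split())
    after_second = cache.tokens_computed

    return first, after_first, second, after_second


if __name__ == "__main__":
    print(demo())
```

In this sketch the first request is a cache miss and processes all 12 tokens (8 in the prefix, 4 in the question). The second request hits the cache and processes only its 6 new tokens, even though its full prompt is 14 tokens long.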

Related Terms

KV Cache Inference Latency Optimization
