ShipSquad

What is Prompt Caching?

AI Engineering


Reusing previously computed attention states for repeated prompt prefixes to reduce latency and cost.

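Prompt caching stores the attention key-value (KV) cache computed for a common prompt prefix, such as a long system prompt, so that later requests starting with the exact same prefix skip recomputing it. Only the request-specific tokens that follow the prefix are processed from scratch. Anthropic and other providers offer built-in prompt caching that can cut costs by up to 90% for repetitive prefixes; the saving applies to the cached input tokens, not to the generated output.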
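To make the idea concrete, here is a minimal sketch in Python. It is a toy simulation, not any provider's implementation: the class `ToyPrefixCache`, its `run` method, and the whitespace "tokenizer" are all hypothetical names invented for illustration, and the stored "states" are placeholder strings rather than real attention tensors. It only shows the accounting: a repeated prefix is computed once, and later requests pay only for their new tokens.

```python
import hashlib


class ToyPrefixCache:
    """Toy stand-in for a provider-side prefix cache (hypothetical, illustration only)."""

    def __init__(self):
        # Maps a hash of the prefix tokens to simulated per-token states.
        self._store = {}

    @staticmethod
    def _key(prefix_tokens):
        # The key covers the whole prefix, so any change to it is a cache miss.
        joined = "\x00".join(prefix_tokens)
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()

    def run(self, prefix_tokens, suffix_tokens):
        """Return (tokens_computed, cache_hit) for one request."""
        key = self._key(prefix_tokens)
        hit = key in self._store
        computed = 0
        if not hit:
            # Cache miss: "compute" a state for every prefix token and store it.
            self._store[key] = [f"state({t})" for t in prefix_tokens]
            computed += len(prefix_tokens)
        # The request-specific suffix is always computed.
        computed += len(suffix_tokens)
        return computed, hit


cache = ToyPrefixCache()
# Assumption: splitting on whitespace stands in for real tokenization.
system_prompt = "You are a helpful assistant for the billing team .".split()

first = cache.run(system_prompt, "What is my balance ?".split())
second = cache.run(system_prompt, "Cancel my plan".split())

print(first)   # (15, False): 10 prefix tokens + 5 new tokens, cache miss
print(second)  # (3, True): only the 3 new tokens, prefix reused
```

Keying on a hash of the entire prefix mirrors the requirement that the prefix match exactly. Real systems add details this sketch omits, such as cache expiry and minimum cacheable lengths.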

