ShipSquad

What is a KV Cache?

A cache of the attention key and value tensors computed for previous tokens, kept in memory to avoid recomputing them at every step of autoregressive generation.

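The KV cache stores the attention key and value projections of already-processed tokens at every layer, so at each decoding step the model computes a query, key, and value only for the new token and attends over the cached entries. Its size grows linearly with sequence length and batch size, so at long contexts or large batches it can become the largest consumer of GPU memory during inference, which makes it a key optimization target for serving systems.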
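To make the idea concrete, here is a minimal sketch of a single attention head decoding with and without a cache, written in plain Python. All names (`KVCache`, `decode_step`, `recompute_step`, `kv_cache_bytes`) and the toy weights are hypothetical and chosen for illustration; real serving systems use batched tensor operations, many heads and layers, and more elaborate memory layouts.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [dot(row, x) for row in W]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over all keys and values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


class KVCache:
    """Holds one key and one value vector per token processed so far."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def decode_step(x, Wq, Wk, Wv, cache):
    """With a cache: project only the new token, then attend over the cache."""
    q = matvec(Wq, x)
    cache.append(matvec(Wk, x), matvec(Wv, x))
    return attend(q, cache.keys, cache.values)


def recompute_step(xs, Wq, Wk, Wv):
    """Without a cache: re-project every token seen so far at each step."""
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    return attend(matvec(Wq, xs[-1]), keys, values)


def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Rough cache size: keys plus values, for every layer, head, and token."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Toy 2-dimensional projections and token embeddings (illustrative values only).
Wq = [[0.5, -0.2], [0.1, 0.3]]
Wk = [[0.4, 0.1], [-0.3, 0.2]]
Wv = [[1.0, 0.0], [0.5, 0.5]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

cache = KVCache()
cached_outputs = [decode_step(x, Wq, Wk, Wv, cache) for x in tokens]
recomputed_outputs = [recompute_step(tokens[:i + 1], Wq, Wk, Wv)
                      for i in range(len(tokens))]
```

Both paths produce the same attention outputs; the cached path simply avoids re-projecting earlier tokens at each step, trading memory for compute. As a rough sizing example under assumed dimensions (32 layers, 32 key/value heads, head size 128, 2-byte elements), `kv_cache_bytes(32, 32, 128, 4096)` comes to 2 GiB for a single 4096-token sequence.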
