What is KV Cache?

AI Infrastructure

Last updated: August 1, 2026

A memory cache storing computed key-value pairs from previous tokens to avoid redundant computation during autoregressive generation.

The KV cache stores attention states for already-processed tokens so the model only computes attention for each new token. It is the primary consumer of GPU memory during inference and a key optimization target for serving systems.

What is KV Cache?

Related Terms

Further Reading

Ready to assemble your AI squad?