What is Semantic Caching?
AI InfrastructureLast updated:
Caching AI responses by meaning similarity so semantically equivalent queries return cached results without re-invoking the model.
Semantic caching uses embedding similarity to match incoming queries against previously answered ones. If a new query is sufficiently similar to a cached query, the cached response is returned, reducing latency and API costs for repetitive question patterns.