What is Latency Optimization?
AI Engineering
Techniques for reducing the time between an AI request and response delivery.
Common techniques include model quantization, response caching, edge deployment, prompt optimization, and response streaming. Target latencies vary by use case, from milliseconds to seconds.
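Of the techniques listed, response caching is the simplest to illustrate in a few lines. The following is a minimal sketch, not a production implementation: it assumes exact-match caching keyed on the prompt text with a fixed time-to-live, and the names `ResponseCache` and `fake_model` are hypothetical, with `fake_model` standing in for a slow model call.

```python
import time
from typing import Callable, Dict, Tuple


class ResponseCache:
    """Cache responses by exact prompt text; entries expire after ttl_seconds."""

    def __init__(
        self,
        backend: Callable[[str], str],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._backend = backend  # the slow call being wrapped
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, prompt: str) -> str:
        now = self._clock()
        entry = self._entries.get(prompt)
        if entry is not None and now - entry[0] < self._ttl:
            # Fresh entry: skip the backend entirely.
            self.hits += 1
            return entry[1]
        # Missing or expired: call the backend and store the result.
        self.misses += 1
        response = self._backend(prompt)
        self._entries[prompt] = (now, response)
        return response


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call; sleeps to simulate latency."""
    time.sleep(0.05)
    return f"echo: {prompt}"


if __name__ == "__main__":
    cache = ResponseCache(fake_model, ttl_seconds=60.0)
    for attempt in ("cold", "warm"):
        start = time.perf_counter()
        cache.get("What is latency?")
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{attempt}: {elapsed_ms:.1f} ms")
```

In this sketch the second ("warm") request is served from memory, so its latency no longer depends on the model. Exact-match caching only helps when identical prompts recur, and the time-to-live trades freshness for hit rate, so both choices would need tuning for a real workload.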