What is Latency Optimization?
AI Engineering
Techniques for reducing the time between an AI request and the delivery of its response.
Common latency optimization techniques include model quantization, response caching, edge deployment, prompt optimization, and streaming. Target latencies vary by use case, ranging from milliseconds to seconds.
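As a rough illustration of one of these techniques, response caching, the sketch below wraps a model call in an in-memory cache and times a cold request against a repeated one. Everything named here is an assumption made for the example: `slow_model` is a hypothetical stand-in that sleeps to simulate inference, and `CachedModel`, `timed`, and the 60-second `ttl_seconds` default are illustrative choices, not part of any particular library.

```python
import time
from typing import Callable, Dict, Tuple


def slow_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call; sleeps to simulate inference time."""
    time.sleep(0.05)
    return prompt.upper()


class CachedModel:
    """Wraps a model function with an in-memory, exact-match response cache."""

    def __init__(self, model: Callable[[str], str], ttl_seconds: float = 60.0) -> None:
        self.model = model
        self.ttl_seconds = ttl_seconds  # assumed expiry window for cached responses
        self._cache: Dict[str, Tuple[float, str]] = {}

    def __call__(self, prompt: str) -> str:
        now = time.monotonic()
        entry = self._cache.get(prompt)
        # Serve from the cache only if the entry exists and has not expired.
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]
        # Cache miss: pay the full model latency once, then store the result.
        response = self.model(prompt)
        self._cache[prompt] = (time.monotonic(), response)
        return response


def timed(fn: Callable[[str], str], prompt: str) -> Tuple[str, float]:
    """Returns the response and the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    result = fn(prompt)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    cached = CachedModel(slow_model)
    _, cold = timed(cached, "hello")
    _, warm = timed(cached, "hello")
    print(f"cold request: {cold * 1000:.1f} ms, cached request: {warm * 1000:.3f} ms")
```

An exact-match cache like this only helps when identical requests recur, and the expiry window trades freshness for speed, so whether it fits depends on the use case.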