What is Continuous Batching?
AI Infrastructure
A serving optimization that admits incoming requests into the running batch as they arrive, rather than waiting for a fixed-size batch to fill or for the current batch to finish.
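Continuous batching keeps GPU utilization high by inserting new requests into the running batch as soon as existing requests complete. With fixed batches, a slot whose request finishes early sits idle until the longest request in that batch is done. Refilling the slot immediately removes this idle time, which raises throughput and shortens the time new requests spend waiting in the queue when many users are served concurrently.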
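The difference can be illustrated with a toy simulation that counts decode steps under both policies. This is a minimal sketch, not how any particular serving framework implements scheduling: the function names are hypothetical, every request is assumed to be queued at the start, each running request produces one token per step, and prefill cost and memory limits are ignored.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Decode steps when each fixed batch runs until its longest request finishes."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        # Slots of shorter requests sit idle until the longest one is done.
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    """Decode steps when a freed slot is refilled before the next step."""
    waiting = deque(lengths)   # tokens still to generate, per queued request
    running = []               # tokens still to generate, per in-flight request
    steps = 0
    while waiting or running:
        # Admit queued requests into any free slots.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        # One decode step: every running request emits one token,
        # and requests with nothing left to generate leave the batch.
        running = [r - 1 for r in running if r - 1 > 0]
        steps += 1
    return steps


# Hypothetical workload: output lengths in tokens, two slots per batch.
lengths = [2, 8, 3, 1, 5, 4]
print(static_batching_steps(lengths, batch_size=2))      # 16
print(continuous_batching_steps(lengths, batch_size=2))  # 12
```

In this example the 23 tokens need 16 steps with fixed batches but only 12 with slot refilling, which is the minimum possible with two slots. The gap grows as output lengths within a batch become more uneven.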