What is Inference?
AI Engineering
The process of running a trained AI model to generate predictions or outputs from new inputs.
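Inference is the stage at which a trained model processes new inputs, such as user queries, and generates responses. Speed (latency and throughput), cost, and output quality are the key considerations for production AI systems. Batching, which groups several requests into a single model call, and caching, which reuses results for repeated inputs, are common ways to improve inference efficiency.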
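As a rough illustration of batching and caching, here is a minimal Python sketch. Everything in it is hypothetical: `toy_model` stands in for a real trained model, and `run_batch`, `batched_inference`, `cached_inference`, and the `calls` counter are names invented for this example, not part of any library. Only `functools.lru_cache` is a real standard-library call.

```python
from functools import lru_cache

# Counts how many times the (hypothetical) model is invoked.
calls = {"count": 0}


def toy_model(batch):
    """Hypothetical stand-in for a trained model: maps each text to its length."""
    return [len(text) for text in batch]


def run_batch(batch):
    """Run one model call over a whole batch and record that a call happened."""
    calls["count"] += 1
    return toy_model(batch)


def batched_inference(inputs, batch_size=4):
    """Batching: group inputs so that each model call handles several at once."""
    outputs = []
    for start in range(0, len(inputs), batch_size):
        outputs.extend(run_batch(inputs[start:start + batch_size]))
    return outputs


@lru_cache(maxsize=1024)
def cached_inference(text):
    """Caching: a repeated input is answered from memory, with no model call."""
    return run_batch([text])[0]
```

With five inputs and `batch_size=2`, `batched_inference` makes three model calls instead of five. Calling `cached_inference` twice with the same string makes only one. Real systems would typically also bound how long a request waits for a batch to fill and decide when cached results become stale, which this sketch leaves out.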