What is Quantization?
AI Engineering
Reducing a model's numerical precision from 32-bit floats to 8-bit or 4-bit values to shrink its size and speed up inference.
Quantization lets large models run on consumer hardware by reducing memory requirements. Techniques such as GPTQ and formats such as GGUF preserve most model quality while dramatically cutting resource needs.
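The core idea can be sketched in a few lines. The snippet below is a minimal illustration, not any production method: it performs symmetric per-tensor int8 quantization of a float32 weight array with NumPy, showing the 4x memory reduction and the small reconstruction error that real methods like GPTQ work to minimize further. The function names are hypothetical.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    # Symmetric per-tensor quantization: map the float32 range
    # [-max|w|, +max|w|] onto int8 values in [-127, 127].
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate float32 weights from int8 codes.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(w.nbytes // q.nbytes)  # int8 storage is 4x smaller than float32
```

Real quantization schemes refine this idea with per-channel or per-group scales and calibration data, but the storage saving comes from exactly this precision reduction.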