What is GGUF?
A file format for storing quantized LLM weights, designed for efficient CPU and GPU inference on consumer hardware.
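GGUF (commonly expanded as GPT-Generated Unified Format) is the model file format used by llama.cpp and related tools. It supports multiple quantization levels, such as 4-bit, 5-bit, and 8-bit variants (Q4, Q5, Q8). Storing each weight in fewer bits shrinks the model enough for large models to run on laptops and desktops without enterprise GPU hardware.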
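To make the effect of quantization concrete, here is a rough back-of-the-envelope sketch of how weight storage scales with bits per weight. The function name `estimate_weight_size_gib` and the 7-billion-parameter model are hypothetical, chosen only for illustration. The estimate uses the nominal bit width of each level; real GGUF files are somewhat larger because quantized blocks also store scale factors and the file carries metadata.

```python
def estimate_weight_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Rough size of the weights alone, in GiB (ignores scales and metadata)."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


# Hypothetical 7B-parameter model, used only as an example.
N_PARAMS = 7e9

# Nominal bits per weight for each level (an approximation, see the note above).
LEVELS = [("F16", 16), ("Q8", 8), ("Q5", 5), ("Q4", 4)]

for label, bits in LEVELS:
    size = estimate_weight_size_gib(N_PARAMS, bits)
    print(f"{label}: ~{size:.1f} GiB")
```

Under these assumptions, the Q4 estimate is a quarter of the 16-bit one, which is what brings a model of this size within reach of typical laptop memory.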