What is Speculative Decoding?
AI InfrastructureLast updated:
An inference optimization where a small draft model proposes tokens that a larger model verifies in parallel.
Speculative decoding uses a fast, lightweight model to generate candidate token sequences, then the full model verifies them in a single forward pass. Accepted tokens skip individual generation steps, yielding 2-3x speedups with identical output quality.