What is Speculative Decoding?

AI Infrastructure

Last updated: August 1, 2026

An inference optimization where a small draft model proposes tokens that a larger model verifies in parallel.

Speculative decoding uses a fast, lightweight model to generate candidate token sequences, then the full model verifies them in a single forward pass. Accepted tokens skip individual generation steps, yielding 2-3x speedups with identical output quality.

What is Speculative Decoding?

Related Terms

Further Reading

Ready to assemble your AI squad?