<ul><li>Speculative decoding is an effective method for lossless acceleration of large language model inference.</li><li>Block verification is a simple draft verification algorithm that verifies the entire draft block jointly rather than token by token, providing additional speedup.</li><li>Block verification improves wall-clock speed by 5%-8% across a range of tasks and datasets.</li><li>It maintains the strong lossless guarantee and can be used as a default approach in speculative decoding implementations.</li></ul>
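For context, the token-by-token verification rule used in standard speculative decoding can be sketched as follows. This is the baseline rule, not the block verification algorithm itself. The function name `verify_tokenwise` and the list-of-probabilities input format are illustrative assumptions, not part of any particular library.

```python
import random


def verify_tokenwise(draft_tokens, q_dists, p_dists, rng):
    """Baseline token-by-token verification (hypothetical helper).

    draft_tokens: gamma token ids sampled from the draft model.
    q_dists[i]:   draft-model distribution that draft_tokens[i] was sampled from.
    p_dists[i]:   target-model distribution at the same position;
                  p_dists has gamma + 1 entries (one extra for the bonus token).
    Returns the accepted prefix followed by exactly one extra token.
    """
    out = []
    for i, tok in enumerate(draft_tokens):
        q, p = q_dists[i], p_dists[i]
        # Accept the draft token with probability min(1, p(tok) / q(tok)).
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)
            continue
        # On rejection, resample from the residual max(p - q, 0), which
        # rng.choices normalizes implicitly, then stop.
        residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
        out.append(rng.choices(range(len(p)), weights=residual)[0])
        return out
    # Every draft token was accepted: sample one bonus token from the target.
    last = p_dists[-1]
    out.append(rng.choices(range(len(last)), weights=last)[0])
    return out
```

Because each draft token is accepted or rejected independently of the tokens that follow it, this rule leaves room for a joint test over the whole block, which is where block verification gets its additional speedup.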

Block Verification Accelerates Speculative Decoding
