Autoregressive (AR) models have achieved state-of-the-art performance in text and image generation, but their token-by-token generation process makes sampling slow.
Existing methods that speed up AR generation by emitting multiple tokens at once cannot faithfully capture the output distribution, because the tokens generated in parallel are not conditionally independent.
Distilled Decoding (DD) instead uses flow matching to construct a deterministic mapping from a Gaussian distribution to the output distribution of the pre-trained AR model, then trains a network to distill that mapping, enabling few-step generation.
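To make the idea concrete, here is a minimal, hypothetical PyTorch sketch of the distillation step: Gaussian noise is pushed through a flow-matching ODE to a deterministic endpoint, and a one-step student network is regressed onto that mapping. All identifiers (`Student`, `teacher_ode_solve`, `distill_loss`, the toy `velocity` field) are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch (assumed, not the paper's code) of distilling a deterministic
# flow-matching mapping into a one-step generator. Names like `Student`,
# `teacher_ode_solve`, and `distill_loss` are illustrative assumptions.
import torch
import torch.nn as nn

class Student(nn.Module):
    """One-step generator: maps Gaussian noise directly to token embeddings."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, noise: torch.Tensor) -> torch.Tensor:
        # noise: (batch, seq_len, dim) -> predicted embeddings, same shape
        return self.net(noise)

@torch.no_grad()
def teacher_ode_solve(noise, velocity, n_steps: int = 50):
    """Deterministically integrate the flow-matching ODE dx/dt = v(x, t)
    from noise at t=0 to the AR model's output distribution at t=1."""
    x, dt = noise.clone(), 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + dt * velocity(x, t)  # Euler step along the probability-flow path
    return x

def distill_loss(student, velocity, noise):
    target = teacher_ode_solve(noise, velocity)  # endpoint of the deterministic map
    return torch.mean((student(noise) - target) ** 2)  # regress student onto it

# Toy usage: a stand-in velocity field replaces the teacher AR model's flow.
student = Student(dim=64)
velocity = lambda x, t: -x  # placeholder; the real field comes from the AR teacher
loss = distill_loss(student, velocity, torch.randn(8, 16, 64))
loss.backward()
```

Because the noise-to-output mapping is deterministic, the student can be supervised with a simple regression loss rather than an adversarial or likelihood objective, which is what makes one- or few-step sampling tractable.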
DD achieves promising results on ImageNet-256: it enables one-step generation with a speed-up of 6.3x for VAR (which otherwise requires 10 sampling steps) and 217.8x for LlamaGen (256 steps).