<ul data-eligibleForWebStory="true"><li>Researchers propose autoregressive adversarial post-training (AAPT) to enable real-time interactive video generation.</li><li>Existing large-scale video generation models are computationally intensive, which makes real-time and interactive use impractical.</li><li>AAPT transforms a pre-trained latent video diffusion model into a real-time, interactive video generator.</li><li>The model generates one latent frame at a time using a single neural function evaluation, enabling real-time streaming and interactive control.</li><li>The approach applies adversarial training to autoregressive generation, which improves efficiency and reduces error accumulation.</li><li>The study's 8B model generates video in real time at 24 fps and 736x416 resolution on a single H100 GPU.</li><li>On 8 H100 GPUs, the model can generate 1280x720 video up to a minute long (1,440 frames) in real time.</li><li>AAPT's design makes efficient use of the KV cache and employs student forcing during training to reduce error accumulation over long video sequences.</li></ul>
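
The generation loop described in the summary can be sketched roughly as follows. This is a toy illustration only, not the AAPT network or the authors' code: `ToyGenerator`, `generate`, and `frames_for` are hypothetical names, frames are reduced to single floats, and a plain list stands in for the KV cache. It shows the three ideas the bullets mention: one function evaluation per frame, cached context instead of recomputation, and feeding the model's own output back as the next input (the rollout pattern that student forcing uses during training).

```python
from dataclasses import dataclass, field
from typing import List

FPS = 24  # frame rate reported in the summary


def frames_for(seconds: float, fps: int = FPS) -> int:
    """Number of frames needed to cover `seconds` of video at `fps`."""
    return int(round(seconds * fps))


@dataclass
class ToyGenerator:
    """Hypothetical stand-in for a one-step generator (assumption, not AAPT itself)."""

    # Plays the role of the KV cache: context from earlier frames is stored
    # once and reused, so each step only processes the newest frame.
    cache: List[float] = field(default_factory=list)

    def step(self, prev_frame: float, control: float) -> float:
        # Exactly one function evaluation per generated frame.
        self.cache.append(prev_frame)
        context = sum(self.cache) / len(self.cache)
        return 0.5 * prev_frame + 0.5 * context + control


def generate(first_frame: float, controls: List[float]) -> List[float]:
    """Autoregressive rollout: one new frame per control input."""
    gen = ToyGenerator()
    frames = [first_frame]
    for control in controls:
        # The generator's own previous output is fed back as input,
        # rather than a ground-truth frame.
        frames.append(gen.step(frames[-1], control))
    return frames


if __name__ == "__main__":
    print(frames_for(60))                 # one minute of video at 24 fps
    print(generate(1.0, [0.0, 0.0, 1.0])) # last control nudges the output
```

Because each step consumes a fresh control value, user input can change the output from one frame to the next, which is what makes the streaming setting interactive.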

Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation
