<ul data-eligibleForWebStory="true">
<li>Researchers propose autoregressive adversarial post-training (AAPT) to enable real-time, interactive video generation.</li>
<li>Existing large-scale video generation models are computationally intensive, which hinders real-time and interactive use.</li>
<li>AAPT transforms a pre-trained latent video diffusion model into a real-time, interactive video generator.</li>
<li>The model generates one latent frame at a time using a single neural function evaluation, enabling real-time streaming and interactive control.</li>
<li>The approach applies adversarial training to autoregressive generation, improving efficiency and reducing errors.</li>
<li>The study's 8B-parameter model achieved real-time video generation at 24 fps and 736x416 resolution on a single H100 GPU.</li>
<li>On 8 H100 GPUs, the model generated 1280x720 videos up to one minute long (1440 frames) in real time.</li>
<li>AAPT's design makes efficient use of the KV cache and employs student-forcing during training to reduce error accumulation over long video sequences.</li>
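The streaming loop described above can be sketched roughly as follows. This is a minimal, hypothetical Python illustration, not the authors' implementation. `KVCache`, `toy_generator_step`, and `stream` are invented names, the "generator" is an arithmetic stand-in for a transformer, and adversarial training is omitted entirely. The sketch shows only the shape of inference: one generator call per latent frame, a per-frame control input, and a cache that grows by one entry per frame so earlier frames are never recomputed. Because each step conditions on the model's own earlier outputs, the loop also illustrates the setting that student-forcing prepares the model for. The two helpers check the throughput arithmetic: at 24 fps each frame must be produced in about 41.7 ms, and 1440 frames at 24 fps last 60 seconds.

```python
import random
from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Hypothetical cache holding one (key, value) entry per generated frame."""
    keys: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def append(self, key: float, value: float) -> None:
        self.keys.append(key)
        self.values.append(value)


def toy_generator_step(noise: float, cache: KVCache, control: float) -> float:
    """Stand-in for ONE neural function evaluation that yields one latent frame.

    A real model would run a transformer over the new frame's tokens and attend
    to cached keys/values from earlier frames; here we simply average the cached
    values so that the output depends on the generation history.
    """
    history = sum(cache.values) / len(cache.values) if cache.values else 0.0
    frame = 0.5 * noise + 0.4 * history + 0.1 * control
    # Only the new frame's entry is computed; earlier entries are reused as-is.
    cache.append(frame, frame)
    return frame


def stream(controls: list[float], seed: int = 0):
    """Generate one frame per control input, calling the generator once per frame."""
    rng = random.Random(seed)
    cache = KVCache()
    frames, nfe = [], 0
    for control in controls:
        noise = rng.gauss(0.0, 1.0)
        # Context comes from the model's own previous outputs (no ground truth),
        # which is the condition student-forcing exposes the model to in training.
        frames.append(toy_generator_step(noise, cache, control))
        nfe += 1
    return frames, nfe, cache


def frame_budget_ms(fps: float) -> float:
    """Maximum wall-clock time per frame for real-time playback."""
    return 1000.0 / fps


def duration_s(num_frames: int, fps: float) -> float:
    """Length of a clip in seconds."""
    return num_frames / fps


if __name__ == "__main__":
    frames, nfe, cache = stream([0.0, 1.0, 0.0, -1.0])
    # One generator call and one cache entry per frame.
    print(len(frames), nfe, len(cache.keys))
    print(round(frame_budget_ms(24), 1), duration_s(1440, 24))
```

In a real system, `control` would be the user's input for that frame (for example, a camera or pose signal), and the 41.7 ms budget would have to cover the whole single forward pass.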