Visual presentations are essential for effective communication, yet automated generation with deep learning still suffers from disorganized layouts and inaccurate text summarization.
To address these limitations, an agentic and modular framework called PreGenie has been introduced.
PreGenie leverages multimodal large language models (MLLMs) to create high-quality visual presentations in two stages: Analysis and Initial Generation, and Review and Re-generation.
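The two-stage flow described above can be sketched as a simple generate-then-review loop. This is only an illustrative sketch under assumed interfaces: the names `pregenie_pipeline`, `Slide`, and the `generate`/`review` callables are hypothetical stand-ins for the paper's MLLM components, not its actual API.

```python
# Illustrative sketch of a two-stage agentic pipeline:
# Stage 1 (Analysis and Initial Generation) drafts slides,
# Stage 2 (Review and Re-generation) critiques and redrafts them.
# All identifiers here are assumptions, not the paper's real interface.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Slide:
    content: str
    approved: bool = False

def pregenie_pipeline(outline: List[str],
                      generate: Callable[[str], str],
                      review: Callable[[str], bool],
                      max_rounds: int = 3) -> List[Slide]:
    # Stage 1: draft one slide per outline point with the generator model.
    slides = [Slide(generate(point)) for point in outline]

    # Stage 2: a reviewer model checks each draft; rejected slides are
    # regenerated, up to max_rounds attempts per slide.
    for slide in slides:
        for _ in range(max_rounds):
            if review(slide.content):
                slide.approved = True
                break
            slide.content = generate(slide.content)
    return slides
```

In practice the `generate` and `review` callables would wrap MLLM calls; here they are left abstract so the control flow of the two stages stays visible.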
Experiments show that PreGenie outperforms existing models in aesthetics, content consistency, and alignment with human design preferences.