3D molecule generation is vital for drug discovery and materials science, and it requires models that can handle complex multi-modal data.
A central challenge is integrating modalities such as atom types, chemical bonds, and 3D coordinates while preserving SE(3) equivariance for the 3D coordinates.
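To make the SE(3) requirement concrete: a map f on atom coordinates is SE(3)-equivariant if f(Rx + t) = R f(x) + t for every rotation R and translation t. The plain-Python sketch below checks this property numerically for a toy EGNN-style coordinate update whose pairwise weights depend only on interatomic distances, and contrasts it with a map that adds a fixed world-frame vector, which is not equivariant. The update rule, the weight function `scale / (1 + d^2)`, and all helper names are illustrative assumptions, not part of UAE-3D.

```python
import math
import random


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation(alpha, beta, gamma):
    """Proper 3x3 rotation R = Rz(alpha) @ Ry(beta) @ Rx(gamma)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    rz = [[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]]
    ry = [[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]]
    rx = [[1.0, 0.0, 0.0], [0.0, cg, -sg], [0.0, sg, cg]]
    return matmul3(rz, matmul3(ry, rx))


def apply_se3(coords, rot, shift):
    """Apply x -> R x + t to every atom position."""
    return [[sum(rot[i][k] * x[k] for k in range(3)) + shift[i] for i in range(3)]
            for x in coords]


def egnn_like_update(coords, scale=0.1):
    """Toy update x_i + sum_j (x_i - x_j) * w(|x_i - x_j|^2); w depends only on distance."""
    new = []
    for i, xi in enumerate(coords):
        delta = [0.0, 0.0, 0.0]
        for j, xj in enumerate(coords):
            if i == j:
                continue
            diff = [a - b for a, b in zip(xi, xj)]
            w = scale / (1.0 + sum(c * c for c in diff))
            delta = [d + w * c for d, c in zip(delta, diff)]
        new.append([a + d for a, d in zip(xi, delta)])
    return new


def naive_shift(coords, offset=(0.5, 0.0, 0.0)):
    """A non-equivariant map: adds a fixed world-frame vector to every atom."""
    return [[a + o for a, o in zip(x, offset)] for x in coords]


def max_abs_diff(p, q):
    return max(abs(a - b) for pr, qr in zip(p, q) for a, b in zip(pr, qr))


random.seed(0)
coords = [[random.uniform(-2.0, 2.0) for _ in range(3)] for _ in range(5)]
rot = rotation(0.3, -1.1, 2.0)
shift = [1.0, -2.0, 0.5]

# Equivariance test: f(R x + t) should match R f(x) + t.
err_egnn = max_abs_diff(egnn_like_update(apply_se3(coords, rot, shift)),
                        apply_se3(egnn_like_update(coords), rot, shift))
err_naive = max_abs_diff(naive_shift(apply_se3(coords, rot, shift)),
                         apply_se3(naive_shift(coords), rot, shift))
print(err_egnn < 1e-9, err_naive > 1e-3)  # True True
```

The EGNN-style update passes because it only mixes relative position vectors with distance-dependent weights, whereas the fixed offset ignores the rotation and breaks the property.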
Existing methods often maintain separate latent spaces for different modalities, which reduces both training and sampling efficiency.
To address this challenge, this work proposes UAE-3D, a Unified Variational Auto-Encoder for 3D Molecular Latent Diffusion Modeling.
UAE-3D compresses 3D molecules into a single unified latent space with near-zero reconstruction error, which simplifies the handling of multiple modalities.
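As a toy picture of what a unified latent space means at the data level, the sketch below packs all three modalities into one feature vector per atom (an atom-type one-hot, counts of each bond type to the other atoms, and raw coordinates) and projects that vector to a single d-dimensional latent per atom. The vocabularies, the bond-count summary, the made-up coordinates, and the random linear "encoder" are hypothetical stand-ins; UAE-3D's actual encoder is not reproduced here, and this linear map is not rotation-equivariant.

```python
import random

# Hypothetical vocabularies for the toy example.
ATOM_TYPES = ["H", "C", "N", "O", "F"]
BOND_TYPES = ["none", "single", "double", "triple", "aromatic"]


def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def atom_features(atom_types, bonds, coords):
    """Concatenate per-atom modalities: type one-hot, bond-type counts, xyz."""
    n = len(atom_types)
    feats = []
    for i in range(n):
        counts = [0.0] * len(BOND_TYPES)
        for j in range(n):
            if j != i:
                counts[bonds[i][j]] += 1.0  # bonds[i][j] indexes BOND_TYPES
        type_vec = one_hot(ATOM_TYPES.index(atom_types[i]), len(ATOM_TYPES))
        feats.append(type_vec + counts + list(coords[i]))
    return feats


def encode(atom_types, bonds, coords, weight):
    """Project each atom's concatenated features to one latent vector."""
    feats = atom_features(atom_types, bonds, coords)
    d = len(weight[0])
    return [[sum(f[k] * weight[k][j] for k in range(len(f))) for j in range(d)]
            for f in feats]


# Toy formaldehyde (C, O, H, H) with made-up coordinates in angstroms.
atom_types = ["C", "O", "H", "H"]
bonds = [
    [0, 2, 1, 1],
    [2, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
coords = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (-0.6, 0.9, 0.0), (-0.6, -0.9, 0.0)]

random.seed(0)
in_dim = len(ATOM_TYPES) + len(BOND_TYPES) + 3  # 13 input features per atom
latent_dim = 8
weight = [[random.gauss(0.0, 0.1) for _ in range(latent_dim)] for _ in range(in_dim)]

z = encode(atom_types, bonds, coords, weight)
print(len(z), len(z[0]))  # prints: 4 8
```

The output is one [num_atoms, latent_dim] array, so a downstream generator sees a single sequence of latents rather than one latent space per modality.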
Because all modalities share this latent space, latent diffusion can be performed efficiently without separate machinery for each modality.
For latent generation, UAE-3D uses the Diffusion Transformer (DiT), a general-purpose diffusion model with no molecule-specific inductive biases.
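Latent diffusion on such per-atom latents can be sketched with the standard DDPM forward process, z_t = sqrt(alpha_bar_t) z_0 + sqrt(1 - alpha_bar_t) eps, where a denoiser (the DiT, in UAE-3D's case) is trained to predict eps. The linear beta schedule, T = 1000, the eps-prediction objective, and every helper name below are generic assumptions for illustration; the source does not state UAE-3D's schedule or parameterization. The DiT itself is not implemented: an oracle that returns the true noise stands in for it.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (a common DDPM default)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + step * t for t in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(z0, t, alpha_bar, noise):
    """Forward process on a [num_atoms, d] latent: sqrt(ab) * z0 + sqrt(1 - ab) * noise."""
    a, s = math.sqrt(alpha_bar[t]), math.sqrt(1.0 - alpha_bar[t])
    return [[a * x + s * e for x, e in zip(zr, nr)] for zr, nr in zip(z0, noise)]


def predict_z0(zt, t, alpha_bar, eps_hat):
    """Invert the forward process given a noise estimate."""
    a, s = math.sqrt(alpha_bar[t]), math.sqrt(1.0 - alpha_bar[t])
    return [[(x - s * e) / a for x, e in zip(zr, er)] for zr, er in zip(zt, eps_hat)]


def eps_mse(eps_hat, eps):
    """Mean squared error between predicted and true noise (the training loss)."""
    diffs = [(p - q) ** 2 for pr, qr in zip(eps_hat, eps) for p, q in zip(pr, qr)]
    return sum(diffs) / len(diffs)


random.seed(0)
T = 1000
alpha_bar = cumulative_alphas(linear_beta_schedule(T))

# Toy latent: 4 atoms x 8 latent dims, standing in for an encoder output.
z0 = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
eps = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
t = random.randrange(T)
zt = q_sample(z0, t, alpha_bar, eps)

# Oracle "denoiser": returns the true noise; a trained DiT would approximate it.
eps_hat = eps
print(eps_mse(eps_hat, eps))  # 0.0
z0_rec = predict_z0(zt, t, alpha_bar, eps_hat)
```

Since the noise is added to one unified latent array, a single denoiser serves every modality at once; after sampling, the VAE decoder maps the clean latent back to atoms, bonds, and coordinates.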
Extensive experiments on the GEOM-Drugs and QM9 datasets show that UAE-3D achieves new state-of-the-art results in both de novo and conditional 3D molecule generation.
On GEOM-Drugs, UAE-3D reduces the Fréchet ChemNet Distance (FCD) by 72.6% relative to the previous best result and improves geometric fidelity by more than 70% on average in relative terms.
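Reading the 72.6% figure as a relative reduction (an interpretation; the absolute FCD values are not given here), the new FCD is a little over a quarter of the previous best:

```latex
\frac{\mathrm{FCD}_{\text{prev}} - \mathrm{FCD}_{\text{UAE-3D}}}{\mathrm{FCD}_{\text{prev}}} = 0.726
\;\Longrightarrow\;
\mathrm{FCD}_{\text{UAE-3D}} = 0.274\,\mathrm{FCD}_{\text{prev}} \approx \frac{\mathrm{FCD}_{\text{prev}}}{3.65}
```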