Researchers introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM) for dynamic 3D scene reconstruction from monocular posed videos.
DGS-LRM is a feed-forward method that predicts deformable 3D Gaussian splats for any dynamic scene, addressing a key limitation of prior feed-forward reconstruction models, which handle only static scenes.
Developing such a feed-forward model is challenging: dynamic training data is scarce, and a suitable 3D representation and training paradigm are also required.
Key technical contributions of DGS-LRM include an enhanced synthetic dataset with ground-truth multi-view videos and dense 3D scene flow supervision.
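To make the role of that supervision concrete, here is a minimal sketch, assuming the model predicts a dense per-pixel 3D offset for each timestep and the dataset supplies matching ground-truth scene flow; the tensor shapes and the function name `scene_flow_loss` are illustrative, not taken from the paper.

```python
import torch

def scene_flow_loss(pred_flow: torch.Tensor, gt_flow: torch.Tensor) -> torch.Tensor:
    """L1 penalty between predicted and ground-truth dense 3D scene flow.

    Both tensors are assumed to hold one 3D offset per pixel per timestep,
    shaped (H*W, T, 3). The exact loss used in the paper may differ.
    """
    return (pred_flow - gt_flow).abs().mean()
```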
Additionally, it utilizes a per-pixel deformable 3D Gaussian representation that supports dynamic view synthesis and long-range 3D tracking.
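A minimal sketch of what such a representation might look like, assuming each input pixel is lifted to one 3D Gaussian whose center is displaced by a predicted per-timestep offset; all field names and shapes here are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass
import torch

@dataclass
class DeformableGaussians:
    """One deformable 3D Gaussian per input pixel (shapes are illustrative)."""
    xyz: torch.Tensor       # (H*W, 3)    canonical Gaussian centers
    rotation: torch.Tensor  # (H*W, 4)    unit quaternions
    scale: torch.Tensor     # (H*W, 3)    per-axis scales
    opacity: torch.Tensor   # (H*W, 1)
    color: torch.Tensor     # (H*W, 3)    RGB
    deform: torch.Tensor    # (H*W, T, 3) per-timestep 3D offsets (scene flow)

    def centers_at(self, t: int) -> torch.Tensor:
        """Gaussian centers deformed to timestep t, ready for splatting."""
        return self.xyz + self.deform[:, t]
```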
DGS-LRM incorporates a large transformer network for real-time, generalizable dynamic scene reconstruction.
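As a rough illustration of the feed-forward pipeline, the toy model below patchifies video frames into tokens, runs a plain transformer encoder over all frames jointly, and decodes per-pixel Gaussian parameters. The layer sizes, the 17-channel output layout, and the use of `nn.TransformerEncoder` are assumptions for this sketch, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class ToyGaussianLRM(nn.Module):
    """Toy feed-forward predictor: posed video in, per-pixel Gaussian params out."""

    def __init__(self, patch=8, dim=512, depth=6, out_per_pixel=17):
        super().__init__()
        self.patch, self.out_per_pixel = patch, out_per_pixel
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=depth)
        # Each token decodes to patch*patch pixels x out_per_pixel raw values
        # (e.g. 3 xyz + 4 rot + 3 scale + 1 opacity + 3 color + 3 flow = 17);
        # per-parameter activations (e.g. sigmoid on opacity) are omitted.
        self.head = nn.Linear(dim, patch * patch * out_per_pixel)

    def forward(self, video):                            # video: (B, T, 3, H, W)
        B, T, C, H, W = video.shape                      # H, W divisible by patch
        tokens = self.embed(video.flatten(0, 1))         # (B*T, dim, H/p, W/p)
        tokens = tokens.flatten(2).transpose(1, 2)       # (B*T, N, dim)
        tokens = tokens.reshape(B, -1, tokens.shape[-1]) # attend across all frames
        feats = self.backbone(tokens)                    # (B, T*N, dim)
        params = self.head(feats)                        # (B, T*N, p*p*out)
        return params.reshape(B, T, H * W, self.out_per_pixel)
```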
Extensive qualitative and quantitative experiments show that DGS-LRM achieves dynamic scene reconstruction quality comparable to optimization-based methods and surpasses state-of-the-art predictive dynamic reconstruction methods on real-world examples.
The model's predicted 3D deformation is accurate, enabling efficient long-range 3D tracking that is on par with leading monocular video 3D tracking methods.
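Given a per-pixel deformation field like the `DeformableGaussians` sketch above, long-range 3D tracks fall out almost for free: each pixel's trajectory is simply its canonical center plus its per-timestep offset. A hedged one-liner, again with illustrative names:

```python
def tracks_3d(g: DeformableGaussians) -> torch.Tensor:
    """Per-pixel 3D trajectories, shape (H*W, T, 3).

    Broadcasts each canonical center over all T timesteps; assumes the
    offsets in `g.deform` are expressed relative to the canonical centers.
    """
    return g.xyz[:, None, :] + g.deform
```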