PDE-Transformer is an improved transformer-based architecture for surrogate modeling of physics simulations on regular grids.
It combines recent architectural improvements of diffusion transformers with adjustments specific to large-scale simulations, yielding a more scalable and versatile general-purpose transformer architecture.
The proposed architecture outperforms state-of-the-art transformer architectures for computer vision on a large dataset of 16 different types of PDEs.
Each physical channel is embedded individually as spatio-temporal tokens, which interact via channel-wise self-attention; pre-trained models built on this design achieve improved performance on several challenging downstream tasks compared to training from scratch.
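To make the channel-wise tokenization concrete, the following is a minimal PyTorch sketch of the idea: each physical channel is patchified into its own token sequence, and tokens at the same spatio-temporal position attend to each other across channels. All names (`PerChannelPatchEmbed`, `ChannelWiseSelfAttention`) and the patch/embedding sizes are illustrative assumptions, not the paper's actual implementation; the time dimension is omitted for brevity, so tokens here are purely spatial.

```python
# Hypothetical sketch of per-channel tokenization with channel-wise
# self-attention; not the reference implementation.
import torch
import torch.nn as nn


class PerChannelPatchEmbed(nn.Module):
    """Embeds each physical channel individually with shared weights."""
    def __init__(self, patch: int = 8, dim: int = 64):
        super().__init__()
        self.proj = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)

    def forward(self, fields: torch.Tensor) -> torch.Tensor:
        # fields: (batch, channels, H, W) -> (batch, channels, tokens, dim)
        b, c, h, w = fields.shape
        x = self.proj(fields.reshape(b * c, 1, h, w))  # (b*c, dim, H/p, W/p)
        x = x.flatten(2).transpose(1, 2)               # (b*c, tokens, dim)
        return x.reshape(b, c, x.shape[1], x.shape[2])


class ChannelWiseSelfAttention(nn.Module):
    """Attention over the channel axis at each token position."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, channels, tokens, dim)
        b, c, n, d = tokens.shape
        # Fold the token axis into the batch so attention runs over the
        # channel axis only: sequences of length `channels`.
        x = tokens.permute(0, 2, 1, 3).reshape(b * n, c, d)
        out, _ = self.attn(x, x, x)
        return out.reshape(b, n, c, d).permute(0, 2, 1, 3)


# Example: a field with 3 physical channels (e.g. velocity + pressure)
# on a 32x32 grid.
fields = torch.randn(2, 3, 32, 32)
tokens = PerChannelPatchEmbed()(fields)       # (2, 3, 16, 64)
mixed = ChannelWiseSelfAttention(64)(tokens)  # same shape
print(mixed.shape)  # torch.Size([2, 3, 16, 64])
```

One consequence of this design is that the token count per channel stays fixed regardless of how many physical channels a given PDE has, which keeps the information density of tokens consistent when learning multiple types of PDEs simultaneously.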