Transformers are widely used for tasks like image classification and physics simulations, but the quadratic complexity of self-attention in the number of tokens makes them impractical for high-resolution inputs.
A new approach called Multipole Attention Neural Operator (MANO) is introduced to address this issue by computing attention in a distance-based multiscale fashion.
MANO maintains a global receptive field in each attention head while achieving linear time and memory complexity with respect to the number of grid points.
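To make the idea concrete, the sketch below is a simplified two-scale illustration of distance-based multiscale attention, not the authors' implementation: each query attends at full resolution to keys in a small local window, while distant points are covered only through a fixed number of pooled coarse summaries, so every head keeps a global receptive field at a cost that grows linearly with the number of grid points. The function name `multiscale_attention`, the `window` and `num_coarse` parameters, and the two-level simplification (MANO uses a multiscale hierarchy) are all assumptions for illustration.

```python
# A minimal two-scale sketch of distance-based multiscale attention on a 1D grid.
# Illustrative only: near field = full-resolution local attention,
# far field = attention to a fixed number of pooled coarse summaries,
# giving per-query cost O(window + num_coarse) and total cost linear in n.

import torch
import torch.nn.functional as F


def multiscale_attention(q, k, v, window=8, num_coarse=16):
    """q, k, v: (batch, n, dim) features on a 1D grid of n points."""
    b, n, d = q.shape
    scale = d ** -0.5

    # --- near field: local window of keys/values centered on each query ---
    # (zero padding at the boundaries is accepted for simplicity)
    pad = window // 2
    k_pad = F.pad(k.transpose(1, 2), (pad, pad)).transpose(1, 2)   # (b, n + 2*pad, d)
    v_pad = F.pad(v.transpose(1, 2), (pad, pad)).transpose(1, 2)
    idx = torch.arange(n).unsqueeze(1) + torch.arange(window + 1).unsqueeze(0)  # (n, w+1)
    k_loc = k_pad[:, idx]                                          # (b, n, w+1, d)
    v_loc = v_pad[:, idx]
    near = torch.einsum('bnd,bnwd->bnw', q, k_loc) * scale         # (b, n, w+1)

    # --- far field: pooled coarse summaries of the whole grid (global receptive field) ---
    k_coarse = F.adaptive_avg_pool1d(k.transpose(1, 2), num_coarse).transpose(1, 2)  # (b, m, d)
    v_coarse = F.adaptive_avg_pool1d(v.transpose(1, 2), num_coarse).transpose(1, 2)
    far = torch.einsum('bnd,bmd->bnm', q, k_coarse) * scale        # (b, n, m)

    # --- combine: one softmax over near-field and far-field logits jointly ---
    attn = torch.softmax(torch.cat([near, far], dim=-1), dim=-1)
    a_near, a_far = attn[..., :near.shape[-1]], attn[..., near.shape[-1]:]
    out = torch.einsum('bnw,bnwd->bnd', a_near, v_loc) \
        + torch.einsum('bnm,bmd->bnd', a_far, v_coarse)
    return out


# usage: 1024 grid points, 64-dim features; cost grows linearly with the grid size
q = k = v = torch.randn(2, 1024, 64)
print(multiscale_attention(q, k, v).shape)  # torch.Size([2, 1024, 64])
```

Because each query touches only `window + num_coarse` keys, doubling the grid resolution roughly doubles the compute and memory, instead of quadrupling it as in dense attention.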
Empirical results show that MANO performs competitively with state-of-the-art models such as ViT and Swin Transformer while significantly reducing runtime and peak memory usage.