This paper introduces a knowledge distillation framework to integrate structural knowledge from Graph Neural Networks (GNNs) into Transformer models.
GNNs excel at capturing localized topological patterns, while Transformers are better suited to modeling long-range dependencies and global contextual information.
The proposed framework transfers multiscale structural knowledge, bridging the gap between the local view of GNNs and the global view of Transformers.
The approach establishes a new way for Transformer architectures to inherit graph structural biases, with wide-ranging applications.
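As a rough illustration of the general idea (not the paper's actual method), the Python sketch below pairs a GNN teacher with a Transformer student and transfers structural knowledge through a feature-matching term added to the task loss; the module names, the MSE matching objective, and the weighting `alpha` are all illustrative assumptions.

```python
# Minimal, hypothetical sketch of GNN-to-Transformer knowledge distillation.
# The modules and loss below are illustrative stand-ins, not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGCNLayer(nn.Module):
    """One graph-convolution step: mix node features through a normalized adjacency."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # x: (N, dim) node features; adj: (N, N) normalized adjacency
        return F.relu(self.lin(adj @ x))

class GNNTeacher(nn.Module):
    """Teacher that encodes localized graph structure; kept fixed during distillation."""
    def __init__(self, dim, num_layers=2):
        super().__init__()
        self.layers = nn.ModuleList([SimpleGCNLayer(dim) for _ in range(num_layers)])

    def forward(self, x, adj):
        for layer in self.layers:
            x = layer(x, adj)
        return x  # (N, dim) structural node embeddings

class TransformerStudent(nn.Module):
    """Student that treats nodes as tokens and models global, long-range interactions."""
    def __init__(self, dim, heads=4, num_layers=2):
        super().__init__()
        block = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=num_layers)

    def forward(self, x):
        # Add a batch dimension, run full self-attention over all nodes, remove it again.
        return self.encoder(x.unsqueeze(0)).squeeze(0)  # (N, dim)

def distillation_loss(student_h, teacher_h, logits, labels, alpha=0.5):
    """Task loss plus a feature-matching term that distills the teacher's structure."""
    task = F.cross_entropy(logits, labels)
    distill = F.mse_loss(student_h, teacher_h.detach())  # detach: teacher is not updated
    return (1.0 - alpha) * task + alpha * distill

# Hypothetical usage on a toy graph with N nodes, dim-dimensional features, 3 classes.
N, dim, num_classes = 8, 32, 3
x = torch.randn(N, dim)
adj = torch.eye(N)                      # stand-in for a normalized adjacency matrix
labels = torch.randint(0, num_classes, (N,))

teacher, student = GNNTeacher(dim), TransformerStudent(dim)
classifier = nn.Linear(dim, num_classes)

teacher_h = teacher(x, adj)             # structural targets from the GNN
student_h = student(x)                  # global representations from the Transformer
loss = distillation_loss(student_h, teacher_h, classifier(student_h), labels)
loss.backward()
```

In a setup of this kind the teacher's output is detached so that only the student inherits the structural signal, and the balance between the task and distillation terms is a tunable design choice.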