This paper proposes a unified theoretical framework grounded in the Kolmogorov-Arnold representation theorem and kernel methods.
The framework establishes a kernel-based feature-fitting perspective that unifies Kolmogorov-Arnold Networks (KANs) and self-attention mechanisms.
Building on this framework, a low-rank Pseudo-Multi-Head Self-Attention (Pseudo-MHSA) module is introduced, which reduces the parameter count by nearly 50% compared to standard MHSA.
Experiments on the CIFAR-10 dataset under the MAE framework demonstrate that the proposed model achieves performance comparable to the ViT model and exhibits similar behavior.
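
A minimal sketch of how a low-rank attention module can roughly halve the projection parameters of standard MHSA is given below. The class name, the factorization of the Q/K/V and output projections through a shared rank, and the choice of rank are illustrative assumptions; the paper's actual Pseudo-MHSA construction may differ in detail.

```python
# Hypothetical sketch: low-rank attention whose Q/K/V and output projections
# are factored through a bottleneck of rank r, compared against the 4*d^2
# projection parameters of a standard MHSA block. Names and the exact
# factorization are assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class LowRankAttention(nn.Module):
    """Self-attention with rank-r factored projection matrices."""

    def __init__(self, dim: int, num_heads: int, rank: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        # Each d x d projection is replaced by a (d -> r) and (r -> d) pair.
        self.qkv_down = nn.Linear(dim, rank, bias=False)
        self.qkv_up = nn.Linear(rank, 3 * dim, bias=False)
        self.out_down = nn.Linear(dim, rank, bias=False)
        self.out_up = nn.Linear(rank, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        qkv = self.qkv_up(self.qkv_down(x))            # (b, n, 3d)
        q, k, v = qkv.chunk(3, dim=-1)
        q = q.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        attn = (q @ k.transpose(-2, -1)) * self.head_dim ** -0.5
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(b, n, d)
        return self.out_up(self.out_down(out))


if __name__ == "__main__":
    dim, heads = 384, 6
    standard = 4 * dim * dim  # Wq, Wk, Wv, Wo of a plain MHSA block
    low_rank = sum(p.numel()
                   for p in LowRankAttention(dim, heads, rank=dim // 3).parameters())
    print(f"standard MHSA: {standard}, low-rank sketch: {low_rank}")
    # With rank = dim / 3, the factored projections use about half the parameters.
```

Under this parameterization the projection cost drops from 4d^2 to 6rd, so choosing r around d/3 yields the roughly 50% reduction referenced above; the paper's reported savings should be taken from its own configuration rather than this sketch.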