Transformer models face the challenge of integrating positional information effectively while preserving the flexibility of multi-head attention.
ComplexFormer introduces Complex Multi-Head Attention (CMHA), which models semantic and positional differences in the complex plane to enhance representational capacity.
Key improvements in ComplexFormer include a per-head Euler transformation and an adaptive differential rotation mechanism, allowing each head to operate in its own complex subspace.
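As a rough illustration of these two ideas (not the authors' exact formulation), the sketch below applies a per-head Euler rotation e^{i*theta} to query/key pairs, where theta blends a content-derived phase with a relative-position phase through a learned per-head gate. The class name ComplexMultiHeadAttention, the freq and gate parameters, and the way the semantic phase is derived are all hypothetical choices made for this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ComplexMultiHeadAttention(nn.Module):
    """Hypothetical sketch of complex-plane attention: per-head Euler
    rotation whose angle adaptively mixes a semantic (content) phase
    with a positional phase. Not the paper's exact mechanism."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        assert self.d_head % 2 == 0  # consecutive dim pairs form complex numbers
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Hypothetical per-head parameters: rotation frequencies and a gate
        # that blends positional vs. semantic phase for each head.
        self.freq = nn.Parameter(torch.randn(n_heads, self.d_head // 2) * 0.02)
        self.gate = nn.Parameter(torch.zeros(n_heads))

    def rotate(self, x: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
        # x: (B, H, T, D); view consecutive dim pairs as complex numbers.
        B, H, T, D = x.shape
        xc = torch.view_as_complex(x.float().reshape(B, H, T, D // 2, 2))
        # Semantic phase: derived from the head's own content (illustrative choice).
        sem_phase = xc.real.mean(dim=-1, keepdim=True)          # (B, H, T, 1)
        # Positional phase: position index times per-head frequencies.
        pos_phase = pos[None, None, :, None] * self.freq[None, :, None, :]
        g = torch.sigmoid(self.gate)[None, :, None, None]        # per-head blend
        theta = g * pos_phase + (1 - g) * sem_phase
        rotated = xc * torch.exp(1j * theta)                     # Euler rotation
        return torch.view_as_real(rotated).reshape(B, H, T, D).type_as(x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (B, T, self.n_heads, self.d_head)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))
        pos = torch.arange(T, device=x.device, dtype=torch.float32)
        q, k = self.rotate(q, pos), self.rotate(k, pos)           # head-specific rotation
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(attn.transpose(1, 2).reshape(B, T, -1))

# Usage: y = ComplexMultiHeadAttention(d_model=256, n_heads=8)(torch.randn(2, 16, 256))
```

Because each head learns its own frequencies and gate, heads can specialize: some rotate mostly by position (RoPE-like behavior), others mostly by content, which is the intuition behind head-specific complex subspace operation.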
Extensive experiments show that ComplexFormer outperforms strong baselines such as RoPE-Transformers across a range of tasks, achieving lower generation perplexity and improved long-context coherence.