The paper titled “RoFormer: Enhanced Transformer with Rotary Position Embedding” introduces a novel approach to positional encoding in transformer architectures through a method called Rotary Position Embedding (RoPE).
The authors propose that the inner product of the query qₘ and key kₙ be expressible as a function g that takes only the word embeddings xₘ and xₙ and their relative position m − n as input variables.
They express this goal as: ⟨f_q(xₘ, m), f_k(xₙ, n)⟩ = g(xₘ, xₙ, m − n), where qₘ = f_q(xₘ, m) and kₙ = f_k(xₙ, n).
The complete mathematical derivation of this result deserves an article of its own.
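To make the goal concrete, here is a minimal NumPy sketch that jumps ahead to the rotary construction described below, assuming 2-dimensional embeddings: each vector is rotated by an angle proportional to its position, and the resulting dot product stays the same when both positions are shifted by the same offset, i.e., it depends only on m − n. The helper names (`rotation`, `apply_rope_2d`) and the base angle are illustrative choices, not the paper's full multi-frequency implementation.

```python
import numpy as np

def rotation(theta: float) -> np.ndarray:
    """2x2 rotation matrix R(theta)."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def apply_rope_2d(x: np.ndarray, pos: int, base_angle: float = 0.1) -> np.ndarray:
    """Rotate a 2-dimensional embedding by an angle proportional to its position."""
    return rotation(pos * base_angle) @ x

# Position-independent query/key projections of the word embeddings x_m, x_n.
q = np.array([0.3, -0.7])
k = np.array([1.2, 0.5])

# Attention score at positions (m, n) = (2, 7) and at the shifted positions (7, 12).
score_a = apply_rope_2d(q, 2) @ apply_rope_2d(k, 7)
score_b = apply_rope_2d(q, 7) @ apply_rope_2d(k, 12)

# The scores match: the inner product depends only on the relative position m - n.
print(np.isclose(score_a, score_b))  # True
```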
Rotations can be combined by adding their angles, following this rule: R(θᵢ)R(θⱼ) = R(θᵢ + θⱼ).
This is where the relative position emerges! In the attention score, the query's rotation appears transposed, so the two matrices collapse into R(θᵢ)ᵀR(θⱼ) = R(θⱼ − θᵢ): a single rotation by the difference in positions θⱼ − θᵢ, which directly encodes the relative position between the two tokens.
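A quick numerical check of both identities, as a sketch using plain 2×2 rotation matrices rather than the paper's block-diagonal d-dimensional rotation: composing two rotations adds their angles, and because the query's rotation enters the inner product transposed, the combined matrix rotates by the position difference only.

```python
import numpy as np

def rotation(theta: float) -> np.ndarray:
    """2x2 rotation matrix R(theta)."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

theta_i, theta_j = 0.4, 1.1

# Composition rule: R(theta_i) R(theta_j) == R(theta_i + theta_j).
print(np.allclose(rotation(theta_i) @ rotation(theta_j),
                  rotation(theta_i + theta_j)))            # True

# In the attention score the query's rotation is transposed, so only the
# angle difference survives: R(theta_i)^T R(theta_j) == R(theta_j - theta_i).
print(np.allclose(rotation(theta_i).T @ rotation(theta_j),
                  rotation(theta_j - theta_i)))            # True
```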
The RoFormer paper and the RoPE method it proposes represent a meaningful advance in transformer architecture, leveraging positional information effectively through rotary embeddings. This not only improves model performance but also addresses key limitations of traditional positional encodings, particularly in capturing relative positions, handling long sequences, and maintaining computational efficiency.