JoFormer is a journey-based Transformer architecture that incorporates positional information through learnable directional transforms.
It represents the relative position between tokens as the sequential composition of these transforms along the sequence. On the Tiny Shakespeare character-level language modeling task, JoFormer outperforms a RoFormer baseline, achieving lower perplexity and faster convergence and demonstrating the benefits of its more expressive treatment of positional relationships.
Even the per-token JoFormer, introduced as a conceptual variant, performs strongly, hinting at the potential of the journey-based approach in more complex architectures.
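To make the journey-based idea concrete, below is a minimal sketch of how per-position directional transforms can be composed sequentially and applied to queries and keys. The class name, the orthogonal (matrix-exponential) parameterization, and all tensor shapes are illustrative assumptions for this sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn


class JourneyTransform(nn.Module):
    """Illustrative sketch of journey-based relative positions.

    Each position t owns a learnable orthogonal transform T_t. A query or
    key at position i is multiplied by the prefix product
    P_i = T_1 @ ... @ T_i, so attention scores depend on the composed
    "journey" of transforms between two positions rather than on a fixed
    rotation as in RoPE. (Hypothetical parameterization, not the paper's.)
    """

    def __init__(self, max_len: int, dim: int):
        super().__init__()
        # Skew-symmetric generators: matrix_exp of a skew-symmetric matrix
        # is orthogonal, so each per-position T_t stays orthogonal.
        self.gen = nn.Parameter(0.01 * torch.randn(max_len, dim, dim))

    def forward(self, q: torch.Tensor, k: torch.Tensor):
        """q, k: (batch, seq_len, dim) -> journey-transformed q, k."""
        seq_len, dim = q.shape[1], q.shape[2]
        skew = self.gen[:seq_len] - self.gen[:seq_len].transpose(-1, -2)
        T = torch.matrix_exp(skew)                # (seq_len, dim, dim)

        # Prefix products P_t = T_1 @ ... @ T_t, built sequentially.
        prods, P = [], torch.eye(dim, device=q.device)
        for t in range(seq_len):
            P = P @ T[t]
            prods.append(P)
        P = torch.stack(prods)                    # (seq_len, dim, dim)

        # Apply the per-position composed transform to queries and keys.
        q_t = torch.einsum("lde,ble->bld", P, q)  # q'_l = P_l @ q_l
        k_t = torch.einsum("lde,ble->bld", P, k)
        return q_t, k_t
```

Under this sketch's orthogonality assumption, the score between a transformed query at position i and key at position j reduces to q_i^T (T_{i+1} ... T_j) k_j for j >= i, since the shared prefix of transforms cancels; relative position thus emerges from sequential composition, generalizing the fixed rotations of RoPE.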