JoFormer is a journey-based Transformer architecture that incorporates positional information through learnable directional transforms.
It represents the relative position between tokens as the sequential composition of these transforms along the sequence. On the Tiny Shakespeare character-level language modeling task, JoFormer outperforms a RoFormer baseline, achieving lower perplexity and faster convergence and demonstrating the benefits of its more expressive treatment of positional relationships.
Even the per-token JoFormer, introduced as a conceptual variant, performs strongly, hinting at the potential of the journey-based approach in more complex architectures.
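To make the journey-based idea concrete, below is a minimal sketch of how per-position directional transforms can be composed sequentially and applied to queries and keys. The class name, the orthogonal (matrix-exponential) parameterization, and all tensor shapes are illustrative assumptions for this sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn


class JourneyTransform(nn.Module):
    """Illustrative sketch of journey-based relative positions.

    Each position t owns a learnable orthogonal transform T_t. A query or
    key at position i is multiplied by the prefix product
    P_i = T_1 @ ... @ T_i, so attention scores depend on the composed
    "journey" of transforms between two positions rather than on a fixed
    rotation as in RoPE. (Hypothetical parameterization, not the paper's.)
    """

    def __init__(self, max_len: int, dim: int):
        super().__init__()
        # Skew-symmetric generators: matrix_exp of a skew-symmetric matrix
        # is orthogonal, so each per-position T_t stays orthogonal.
        self.gen = nn.Parameter(0.01 * torch.randn(max_len, dim, dim))

    def forward(self, q: torch.Tensor, k: torch.Tensor):
        """q, k: (batch, seq_len, dim) -> journey-transformed q, k."""
        seq_len, dim = q.shape[1], q.shape[2]
        skew = self.gen[:seq_len] - self.gen[:seq_len].transpose(-1, -2)
        T = torch.matrix_exp(skew)                # (seq_len, dim, dim)

        # Prefix products P_t = T_1 @ ... @ T_t, built sequentially.
        prods, P = [], torch.eye(dim, device=q.device)
        for t in range(seq_len):
            P = P @ T[t]
            prods.append(P)
        P = torch.stack(prods)                    # (seq_len, dim, dim)

        # Apply the per-position composed transform to queries and keys.
        q_t = torch.einsum("lde,ble->bld", P, q)  # q'_l = P_l @ q_l
        k_t = torch.einsum("lde,ble->bld", P, k)
        return q_t, k_t
```

Under this sketch's orthogonality assumption, the score between a transformed query at position i and key at position j reduces to q_i^T (T_{i+1} ... T_j) k_j for j >= i, since the shared prefix of transforms cancels; relative position thus emerges from sequential composition, generalizing the fixed rotations of RoPE.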