A new study explores 2-simplicial attention, implemented in Triton, as a means to improve token efficiency in large language models.
The research examines how the 2-simplicial Transformer architecture, which generalizes dot-product attention to a trilinear function computed by an efficient Triton kernel, can outperform standard Transformers on mathematics, coding, reasoning, and logic tasks within a fixed token budget.
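To make the trilinear operation concrete, the sketch below is a minimal dense reference of 2-simplicial attention in PyTorch, assuming a single head, two key/value streams, and an elementwise (Hadamard) combination of the value streams; the function name and the scaling factor are illustrative choices, not taken from the paper, and the paper's Triton kernel computes this far more efficiently than this dense version.

```python
import torch

def two_simplicial_attention(q, k1, k2, v1, v2):
    """Dense reference sketch (illustrative, not the paper's kernel).

    q, k1, k2, v1, v2: tensors of shape (seq_len, dim).
    """
    # Trilinear logits: <q_i, k1_j, k2_l> = sum_d q[i,d] * k1[j,d] * k2[l,d]
    logits = torch.einsum("id,jd,ld->ijl", q, k1, k2) / q.shape[-1] ** 0.5
    # Softmax over all (j, l) key pairs for each query position i
    seq_len = q.shape[0]
    attn = torch.softmax(logits.reshape(seq_len, -1), dim=-1).reshape_as(logits)
    # Weighted sum of the elementwise product of the two value streams
    return torch.einsum("ijl,jd,ld->id", attn, v1, v2)

# Shape check only: 8 positions, 16-dimensional head
q = k1 = k2 = v1 = v2 = torch.randn(8, 16)
out = two_simplicial_attention(q, k1, k2, v1, v2)
print(out.shape)  # torch.Size([8, 16])
```

The key difference from standard attention is that each query scores pairs of positions rather than single positions, which is what makes a naive implementation cubic in sequence length and motivates the specialized Triton kernel.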
The study suggests that, compared to dot-product attention, the 2-simplicial Transformer changes the scaling laws for knowledge and reasoning tasks, demonstrating better token efficiency.
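For context, claims of this kind are usually stated against a Chinchilla-style parameterization of loss as a function of model size and training data; the form below is standard background rather than the paper's fitted result, and the point is that changing an exponent, rather than only the constants, compounds with scale and therefore reaches a given loss with fewer tokens.

```latex
% Standard Chinchilla-style scaling law (background, not the paper's fit):
% N = parameter count, D = training tokens.
L(N, D) \;\approx\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```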
The findings emphasize the importance of designing architectures that prioritize token efficiency, especially as large language models rely on massive internet-scale datasets.