Source: Arxiv

TRA: Better Length Generalisation with Threshold Relative Attention

  • Transformers struggle with length generalisation, performing poorly even on basic tasks when test sequences are longer than those seen during training.
  • Two key failures of the self-attention mechanism in Transformers are identified: an inability to fully remove irrelevant information, and the unintentional up-weighting of irrelevant information caused by learned positional biases.
  • Two mitigations are proposed to improve the generalisation capabilities of decoder-only transformers: selective sparsity and a contextualised relative distance.
  • Refactoring the attention mechanism with these two mitigations in place can substantially improve how transformers handle length generalisation (see the sketch after this list).
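To make the two mitigations concrete, the sketch below shows one plausible way to combine a relative-position bias with post-softmax thresholding in a single causal attention head. The function name, the threshold value, and the bias parameterisation are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of a threshold-gated relative attention head
# (illustrative only; not the paper's exact method).
import torch
import torch.nn.functional as F

def threshold_relative_attention(q, k, v, rel_bias, threshold=0.01):
    """Single-head causal attention with a relative-position bias and
    post-softmax thresholding that fully removes weakly attended positions.

    q, k, v:   (seq_len, d) query / key / value matrices
    rel_bias:  (seq_len, seq_len) stand-in for a learned relative-position bias
    threshold: attention weights below this value are zeroed out (selective sparsity)
    """
    d = q.size(-1)
    seq_len = q.size(0)

    # Scaled dot-product scores plus a relative-position bias.
    scores = q @ k.transpose(-2, -1) / d**0.5 + rel_bias

    # Causal mask, as in a decoder-only transformer.
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))

    weights = F.softmax(scores, dim=-1)

    # Selective sparsity: drop (rather than merely down-weight) positions
    # whose attention weight falls below the threshold, then renormalise.
    weights = torch.where(weights < threshold, torch.zeros_like(weights), weights)
    weights = weights / weights.sum(dim=-1, keepdim=True).clamp_min(1e-9)

    return weights @ v

# Usage: a toy sequence of 8 tokens with 16-dimensional heads.
q, k, v = (torch.randn(8, 16) for _ in range(3))
rel_bias = torch.zeros(8, 8)  # placeholder for a learned relative bias
out = threshold_relative_attention(q, k, v, rel_bias)
print(out.shape)  # torch.Size([8, 16])
```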

