Source: Arxiv

Adaptive Computation Pruning for the Forgetting Transformer

  • The recently proposed Forgetting Transformer (FoX) incorporates a forget gate into softmax attention and has shown consistently better or on-par performance compared to the standard RoPE-based Transformer.
  • The paper introduces Adaptive Computation Pruning (ACP) for FoX, a method that dynamically prunes computations whose input-output dependencies have been strongly decayed by the forget gate.
  • ACP reduces the number of FLOPs in softmax attention by around 70% across different model sizes and context lengths, resulting in a 10-35% improvement in training throughput.
  • The computational savings are greater with longer context lengths, and the performance of FoX is not affected by ACP.
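The core idea in the bullets above can be sketched in a few lines: the forget gate contributes a cumulative log-decay term to each attention score, and entries whose decay falls below a threshold are skipped entirely. This is a simplified, entrywise-masking illustration (the paper prunes whole blocks inside the attention kernel); the function name, threshold value, and gate distribution here are illustrative assumptions.

```python
import numpy as np

def forgetting_attention_pruned(q, k, v, log_f, threshold=-10.0):
    """Causal softmax attention with a forget gate, skipping entries whose
    cumulative decay makes their contribution negligible. Illustrative
    sketch only; not the paper's block-level kernel implementation."""
    T, d = q.shape
    # Cumulative log forget-gate values; D[i, j] = sum of log f over (j, i],
    # i.e. the total decay applied to position j's contribution at position i.
    cum = np.cumsum(log_f)
    D = cum[:, None] - cum[None, :]
    causal = np.tril(np.ones((T, T), dtype=bool))
    # Prune dependencies the gate has decayed below the threshold.
    keep = causal & (D >= threshold)
    scores = np.full((T, T), -np.inf)
    scores[keep] = (q @ k.T)[keep] / np.sqrt(d) + D[keep]
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ v, keep

rng = np.random.default_rng(0)
T, d = 64, 8
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
log_f = np.log(rng.uniform(0.6, 0.99, size=T))  # assumed gate values in (0, 1)
out, keep = forgetting_attention_pruned(q, k, v, log_f)
pruned_frac = 1 - keep.sum() / np.tril(np.ones((T, T))).sum()
print(f"fraction of attention entries pruned: {pruned_frac:.2f}")
```

Because the decay term grows with distance, longer contexts prune a larger fraction of entries, which matches the bullet above: the savings increase with context length while the retained entries (including every diagonal self-dependency) leave the attention output effectively unchanged.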
