Source: arXiv
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training

  • Sequence modeling in neural networks is dominated by transformers with softmax self-attention, but their memory and compute requirements grow with sequence length during inference.
  • Recent work has introduced models such as DeltaNet, Mamba, and xLSTM, which keep memory and per-token compute constant by replacing softmax attention with linear recurrences.
  • The paper introduces the Mesa layer for language modeling at the billion-parameter scale; its recurrent dynamics are derived from an online learning rule that solves a layer-local least-squares problem optimally at every step (see the sketch below).
  • This locally optimal test-time training lets the Mesa layer reach lower language-modeling perplexity and higher downstream benchmark scores than the baselines above, at the cost of extra FLOPs spent during inference.
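The idea behind these bullets can be sketched compactly. Below is a minimal NumPy illustration (all names are hypothetical; single head, no batching, and a plain matrix inverse rather than the paper's iterative solver): like a linear RNN, the layer carries only constant-size running statistics, but at each step it solves a regularized least-squares fit of past values to past keys and applies the resulting optimal map to the current query. That per-step solve is where the extra inference FLOPs go.

```python
import numpy as np

def mesa_layer(queries, keys, values, lam=1.0):
    """Sketch of a Mesa-style recurrence. Inputs: (seq_len, d) arrays."""
    d = keys.shape[1]
    KK = lam * np.eye(d)   # running sum of k k^T, ridge-regularized
    VK = np.zeros((d, d))  # running sum of v k^T
    outputs = []
    for q, k, v in zip(queries, keys, values):
        KK += np.outer(k, k)
        VK += np.outer(v, k)
        # Locally optimal test-time training:
        # W* = argmin_W sum_i ||v_i - W k_i||^2 + lam * ||W||^2 = VK @ inv(KK)
        W_star = VK @ np.linalg.inv(KK)
        outputs.append(W_star @ q)
    return np.stack(outputs)

# Toy usage: state is two d x d matrices regardless of sequence length.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
print(mesa_layer(q, k, v).shape)  # (8, 4)
```

For contrast, a linearized-attention layer of the DeltaNet/Mamba family would apply the raw statistic `VK` (or a gated variant of it) to the query directly; the extra matrix solve is what makes the Mesa update locally optimal at each step.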
