Implicit Language Models are RNNs: Balancing Parallelization and Expressivity

  • State-space models (SSMs) and transformers dominate language modeling but are constrained to a lower computational complexity than classical recurrent neural networks (RNNs), limiting their expressivity.
  • RNNs lack parallelization during training, leading to a trade-off between parallelization and expressivity.
  • A new approach proposes implicit SSMs that iterate a transformation until convergence to a fixed point, implementing the non-linear state transitions of RNNs (see the sketch after this list).
  • Approximate fixed-point convergence is found to be sufficient, allowing a scalable training curriculum with partial parallelization.
  • The implicit SSMs exhibit superior state-tracking capabilities on regular languages compared to transformers and SSMs.
  • Implicit SSMs are scaled to natural-language reasoning tasks and to pretraining large-scale language models with up to 1.3B parameters on 207B tokens, the largest implicit model trained to date.
  • The implicit models outperform explicit counterparts on standard benchmarks.
  • Code for the implicit language models is available on GitHub.
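
The core mechanism described above is a fixed-point (implicit) state update with early stopping. Below is a minimal, illustrative Python sketch of that idea only, not the paper's architecture or released code: the tanh transition map, weight shapes, tolerance, and iteration cap are assumptions chosen for clarity, and the early-exit loop loosely mirrors the observation that approximate convergence can suffice.

```python
# Illustrative sketch of an implicit state update: iterate a transition map
# until (approximate) convergence to a fixed point. All specifics here are
# assumptions for demonstration, not the paper's actual model.
import numpy as np

def fixed_point_state(x_t, h_prev, W_h, W_x, tol=1e-4, max_iters=50):
    """Approximately solve h = tanh(W_h @ h + W_x @ x_t + h_prev) by iteration.

    Returns the (approximate) fixed point and the number of iterations used.
    Stopping early (loose tol / small max_iters) mirrors the idea that
    approximate fixed-point convergence can be enough in practice.
    """
    h = np.zeros_like(h_prev)
    for k in range(max_iters):
        h_next = np.tanh(W_h @ h + W_x @ x_t + h_prev)
        if np.linalg.norm(h_next - h) < tol:
            return h_next, k + 1
        h = h_next
    return h, max_iters

# Toy usage: process a short sequence one state at a time.
rng = np.random.default_rng(0)
d = 8
W_h = 0.25 * rng.standard_normal((d, d)) / np.sqrt(d)  # scaled down so the iteration contracts
W_x = rng.standard_normal((d, d)) / np.sqrt(d)
h = np.zeros(d)
for x_t in rng.standard_normal((5, d)):
    h, iters = fixed_point_state(x_t, h, W_h, W_x)
    print(f"state update converged in {iters} iterations")
```

Because each token's fixed-point solve is independent given the previous state, truncating the inner iteration is what lets training remain partially parallelized while still recovering RNN-like non-linear state transitions.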
