State-space models (SSMs) and transformers are widely used in language modeling, but they are restricted to a lower computational complexity class than recurrent neural networks (RNNs), which limits their expressivity.
RNNs, in turn, cannot be parallelized during training, resulting in a trade-off between parallelization and expressivity.
A new approach proposes implicit SSMs that iterate a transformation until convergence to a fixed point, thereby implementing the non-linear state transitions of RNNs.
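As a rough, hypothetical sketch (not the paper's actual architecture), the idea can be pictured as solving for a token's state h that satisfies a non-linear update h = tanh(A h + B x) by plain iteration; the matrices, the tanh non-linearity, and the function name below are illustrative placeholders.

```python
import numpy as np

def implicit_state(x, h_prev, A, B, tol=1e-4, max_iters=100):
    """Find the fixed point h = tanh(A @ h + B @ x) by naive iteration.

    x, h_prev : current input and previous token's state (used as a warm start)
    A, B      : illustrative state-transition and input matrices
    """
    h = h_prev
    for _ in range(max_iters):
        h_new = np.tanh(A @ h + B @ x)       # non-linear state transition
        if np.linalg.norm(h_new - h) < tol:  # approximate fixed point reached
            return h_new
        h = h_new
    return h  # fall back to the last iterate if the tolerance was not met
```

In this picture, stopping after a single iteration resembles an ordinary explicit update, while iterating to convergence realizes the non-linear recurrence of an RNN.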
Approximate fixed-point convergence is found to be sufficient, enabling a scalable training curriculum that retains partial parallelization.
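One plausible form of such a curriculum, assumed here purely for illustration rather than taken from the paper, is to cap the number of fixed-point iterations early in training and raise the cap over time, so that most of training stays close to a cheap, parallelizable pass.

```python
def iteration_budget(step, total_steps, min_iters=1, max_iters=32):
    """Hypothetical schedule: fixed-point iterations allowed at a training step."""
    frac = min(step / max(total_steps, 1), 1.0)
    return round(min_iters + frac * (max_iters - min_iters))

# Early steps permit a single iteration (an explicit-style, parallel update);
# later steps allow deeper iteration toward the fixed point.
assert iteration_budget(0, 10_000) == 1
assert iteration_budget(10_000, 10_000) == 32
```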
The implicit SSMs exhibit superior state-tracking capabilities on regular languages compared to transformers and standard (explicit) SSMs.
Implicit SSMs are scaled to natural language reasoning tasks and to the pretraining of large-scale language models with up to 1.3B parameters on 207B tokens, the largest implicit model trained to date.
The implicit models outperform their explicit counterparts on standard benchmarks.
Code for the implicit language models is available on GitHub.