Recurrent neural networks (RNNs) have shown performance competitive with Transformers while offering faster inference.
A novel method is proposed to replace the computationally expensive backpropagation through time (BPTT) algorithm with a fixed gradient feedback mechanism.
The method leverages state-space model (SSM) principles to directly propagate gradients from future time steps, reducing training overhead.
Experiments on language modeling benchmarks demonstrate competitive perplexity scores while significantly reducing training costs.
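To make the mechanism concrete, the following is a minimal sketch of what replacing BPTT's recurrent Jacobian with a fixed feedback operator could look like for a vanilla tanh RNN. The abstract does not specify the actual formulation, so every detail here is an assumption: the fixed diagonal decay matrix `A_fb` (loosely mimicking a diagonal SSM state matrix), the squared-error objective, and all variable names are illustrative, and the construction resembles feedback-alignment-style credit assignment applied through time rather than the paper's exact method.

```python
# Hypothetical sketch: a training step where the exact BPTT recurrent
# Jacobian (diag(1 - h^2) @ W_hh.T) is replaced by a fixed, SSM-style
# diagonal decay matrix A_fb. Names and details are assumptions, not
# the paper's actual formulation.
import numpy as np

rng = np.random.default_rng(0)
T, d_in, d_h = 8, 4, 16          # sequence length, input dim, hidden dim

# Model parameters (tanh RNN with a linear readout).
W_xh = rng.normal(scale=0.1, size=(d_h, d_in))
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))
W_hy = rng.normal(scale=0.1, size=(1, d_h))

# Fixed feedback operator: a diagonal decay, loosely mimicking the
# diagonal state matrix of an SSM. It is never updated during training.
A_fb = np.diag(np.exp(-rng.uniform(0.1, 1.0, size=d_h)))

def forward(x):
    """Run the RNN forward, caching hidden states for the backward pass."""
    h = np.zeros(d_h)
    hs, ys = [h], []
    for t in range(T):
        h = np.tanh(W_xh @ x[t] + W_hh @ h)
        hs.append(h)
        ys.append(W_hy @ h)
    return np.array(hs), np.array(ys)

def fixed_feedback_grads(x, targets):
    """Approximate gradients: error from future steps is fed back through
    the fixed matrix A_fb instead of the true recurrent Jacobian of BPTT."""
    hs, ys = forward(x)
    dW_xh, dW_hh, dW_hy = np.zeros_like(W_xh), np.zeros_like(W_hh), np.zeros_like(W_hy)
    delta = np.zeros(d_h)                      # credit arriving from the future
    for t in reversed(range(T)):
        err = ys[t] - targets[t]               # squared-error gradient at step t
        dW_hy += np.outer(err, hs[t + 1])
        # Local error plus fixed feedback of future error (this line replaces
        # the BPTT term diag(1 - h^2) @ W_hh.T @ delta).
        dh = W_hy.T @ err + A_fb @ delta
        dpre = dh * (1.0 - hs[t + 1] ** 2)     # backprop through tanh locally
        dW_xh += np.outer(dpre, x[t])
        dW_hh += np.outer(dpre, hs[t])
        delta = dpre
    return dW_xh, dW_hh, dW_hy

# Usage: one gradient estimate on random data; apply with any optimizer step.
x = rng.normal(size=(T, d_in))
targets = rng.normal(size=(T, 1))
grads = fixed_feedback_grads(x, targets)
```

Under these assumptions, the potential saving comes from never differentiating through the recurrence itself: the backward sweep applies a fixed diagonal operator per step instead of the state-dependent Jacobian that BPTT must materialize.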