Researchers propose a method to enhance state-space models (SSMs) by sparsifying them within a given computational budget.
The method is a hierarchical sparsification technique called Simba, which prunes tokens more aggressively in upper layers than in lower layers so that the network mimics highway behavior, as sketched below.
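The summary does not specify Simba's pruning criterion or layer-wise schedule, so the following is a minimal PyTorch sketch of the general idea only. The linear keep-ratio schedule, the function names (`layerwise_keep_ratios`, `prune_tokens`), and the activation-norm importance score are all illustrative assumptions, not details from the paper.

```python
import torch

def layerwise_keep_ratios(num_layers, base_keep=1.0, min_keep=0.25):
    # Hypothetical schedule: keep every token in the lowest layer and
    # decay linearly toward min_keep at the top (upper layers pruned more).
    step = (base_keep - min_keep) / max(num_layers - 1, 1)
    return [base_keep - step * layer for layer in range(num_layers)]

def prune_tokens(hidden, scores, keep_ratio):
    # hidden: (batch, seq_len, dim) layer activations
    # scores: (batch, seq_len) per-token importance (criterion assumed here)
    batch, seq_len, dim = hidden.shape
    k = max(1, int(seq_len * keep_ratio))
    keep_idx = scores.topk(k, dim=1).indices.sort(dim=1).values  # restore order
    keep_idx = keep_idx.unsqueeze(-1).expand(-1, -1, dim)
    return hidden.gather(1, keep_idx)

# Usage sketch: prune progressively harder as depth increases.
hidden = torch.randn(2, 128, 64)
for layer, ratio in enumerate(layerwise_keep_ratios(num_layers=4)):
    scores = hidden.norm(dim=-1)  # stand-in importance: activation magnitude
    hidden = prune_tokens(hidden, scores, ratio)
    # ... a Mamba block would process the shortened sequence here ...
```

Under such a scheme, deeper layers see progressively shorter sequences, which is what frees up compute to spend elsewhere at a fixed FLOPs budget.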
By implementing Simba, the researchers show improved performance on natural language tasks compared to the baseline Mamba model at the same FLOPs.
The study demonstrates that Simba not only improves efficiency but also enhances information flow across long sequences.