Scaling Mamba beyond the small model sizes evaluated so far may raise new challenges.
The selection mechanism in state space models (SSMs) overcomes the weaknesses of earlier SSMs on discrete modalities such as text and DNA, but it can conversely impede performance on data where linear time-invariant (LTI) SSMs excel.
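To make the trade-off concrete, here is a minimal sketch contrasting an LTI recurrence with a selective one. It is an illustration under simplifying assumptions, not Mamba's implementation: the state is a single scalar, only the step size `delta` is made input-dependent (Mamba also makes other SSM parameters input-dependent), and the names `lti_scan`, `selective_scan`, `w`, and `bias` are hypothetical.

```python
import math


def softplus(z):
    # Smooth positive function, used here to keep the step size > 0.
    return math.log1p(math.exp(z))


def lti_scan(xs, a=-1.0, b=1.0, c=1.0, delta=0.5):
    """LTI recurrence: the same discretized parameters at every time step."""
    a_bar = math.exp(delta * a)  # zero-order-hold discretization of a
    b_bar = delta * b            # simplified (Euler-style) discretization of b
    h, ys = 0.0, []
    for x in xs:
        h = a_bar * h + b_bar * x
        ys.append(c * h)
    return ys


def selective_scan(xs, a=-1.0, b=1.0, c=1.0, w=4.0, bias=-2.0):
    """Selective recurrence: the step size is recomputed from each input.

    Assumption for illustration: delta_t = softplus(w * x_t + bias).
    A large delta_t mostly forgets the old state and focuses on x_t;
    a small delta_t mostly keeps the old state and ignores x_t.
    """
    h, ys = 0.0, []
    for x in xs:
        delta = softplus(w * x + bias)
        a_bar = math.exp(delta * a)
        b_bar = delta * b
        h = a_bar * h + b_bar * x
        ys.append(c * h)
    return ys
```

In this sketch, doubling the input exactly doubles the output of `lti_scan`, because it is a fixed linear filter. The same does not hold for `selective_scan`, whose dynamics change with the input. That input dependence is what lets a selective model filter discrete tokens, and it is also why the inductive bias of a fixed linear filter is lost.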
The empirical evaluation of Mamba is limited to small model sizes, and it remains to be seen how well it compares at larger sizes.
Scaling SSMs may involve further engineering challenges and adjustments to the model.