State Space Models (SSMs) are gaining popularity as an alternative to Transformers because their fixed-size recurrent state keeps memory usage constant during inference while delivering competitive performance.
Quamba2 is a post-training quantization framework for selective SSMs that enables scalable deployment across diverse hardware platforms.
Quamba2 supports W8A8, W4A8, and W4A16 bit-width configurations, where WxAy denotes x-bit weights and y-bit activations, catering to different deployment scenarios that trade accuracy against memory and throughput.
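To make the WxAy notation concrete, here is a minimal sketch of symmetric fake quantization applied to a linear projection under the three configurations. The helpers `quantize_symmetric` and `fake_quant_linear` are hypothetical illustrations of the general idea, not Quamba2's actual API or algorithm:

```python
import torch

def quantize_symmetric(x: torch.Tensor, n_bits: int) -> tuple[torch.Tensor, float]:
    """Symmetric uniform quantization to n_bits (hypothetical helper)."""
    qmax = 2 ** (n_bits - 1) - 1          # e.g. 127 for 8-bit, 7 for 4-bit
    scale = x.abs().max().item() / qmax   # per-tensor scale from the absolute max
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q, scale

def fake_quant_linear(w: torch.Tensor, x: torch.Tensor,
                      w_bits: int, a_bits: int) -> torch.Tensor:
    """Simulate a WxAy linear layer: quantize weights to w_bits and
    activations to a_bits, then dequantize and multiply (fake quantization)."""
    qw, sw = quantize_symmetric(w, w_bits)
    if a_bits >= 16:                      # e.g. W4A16: keep activations in high precision
        return x @ (qw * sw).T
    qx, sx = quantize_symmetric(x, a_bits)
    return (qx * sx) @ (qw * sw).T

# The three configurations above, as (weight bits, activation bits):
configs = {"W8A8": (8, 8), "W4A8": (4, 8), "W4A16": (4, 16)}

w = torch.randn(64, 32)                   # toy weight matrix
x = torch.randn(4, 32)                    # toy activation batch
for name, (wb, ab) in configs.items():
    err = (fake_quant_linear(w, x, wb, ab) - x @ w.T).abs().mean()
    print(f"{name}: mean abs error {err:.4f}")
```

Running the loop shows the expected trend: lower weight bit-widths introduce more quantization error, while keeping activations at 16 bits (W4A16) recovers some of it, which is why the choice of configuration depends on the deployment scenario.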
Experimental results show that Quamba2-8B outperforms existing SSM quantization methods, delivering significant speed-ups and memory reductions with only a minimal accuracy drop.