TransMamba: Flexibly Switching between Transformer and Mamba

Source: Arxiv

  • TransMamba is a framework that combines Transformer and Mamba models for efficient long-sequence processing.
  • TransMamba uses shared parameter matrices to switch between attention and state space model (SSM) mechanisms.
  • The framework includes a Memory converter that bridges the Transformer and Mamba components for seamless information flow (a toy sketch of the overall setup follows this list).
  • Experimental results demonstrate that TransMamba achieves superior training efficiency and performance compared to baselines.

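The second and third bullets describe the core mechanism: a single set of projection matrices that can drive either a softmax-attention path or an SSM-style recurrent path, plus a converter that hands the attention state over to the recurrence at the switch. Below is a minimal, self-contained sketch of that idea in PyTorch. Everything in it is an assumption for illustration: the names (SharedDualLayer, switch_point), the toy decay recurrence, and the mean-pooled initial state standing in for the paper's Memory converter. It is not the TransMamba implementation.

# Minimal conceptual sketch (not the authors' code): one layer owns a single set
# of projection matrices and applies them either as causal softmax attention or
# as a toy SSM-style recurrence, switching modes at a chosen token index.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedDualLayer(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        # Shared projections: used as Q/K/V in attention mode and reused as the
        # input/readout projections of the recurrent (SSM-like) mode.
        self.proj_q = nn.Linear(d_model, d_model, bias=False)
        self.proj_k = nn.Linear(d_model, d_model, bias=False)
        self.proj_v = nn.Linear(d_model, d_model, bias=False)
        self.out = nn.Linear(d_model, d_model, bias=False)
        # Per-channel decay parameter for the toy recurrence.
        self.decay_logit = nn.Parameter(torch.zeros(d_model))

    def attention_branch(self, x: torch.Tensor) -> torch.Tensor:
        # Standard causal softmax attention over the prefix tokens.
        q, k, v = self.proj_q(x), self.proj_k(x), self.proj_v(x)
        return F.scaled_dot_product_attention(q, k, v, is_causal=True)

    def ssm_branch(self, x: torch.Tensor, state: torch.Tensor):
        # Toy linear recurrence: the hidden state decays and accumulates the
        # projected input; the "value" projection acts as the readout gate.
        decay = torch.sigmoid(self.decay_logit)          # (d_model,)
        k, v = self.proj_k(x), self.proj_v(x)
        outputs = []
        for t in range(x.shape[1]):
            state = decay * state + k[:, t]              # constant-size state update
            outputs.append(v[:, t] * state)              # per-token readout
        return torch.stack(outputs, dim=1), state

    def forward(self, x: torch.Tensor, switch_point: int) -> torch.Tensor:
        # Prefix tokens run as attention; suffix tokens run as the recurrence.
        prefix, suffix = x[:, :switch_point], x[:, switch_point:]
        attn_out = self.attention_branch(prefix)
        # Stand-in "memory converter": pool the prefix values into an initial state.
        init_state = self.proj_v(prefix).mean(dim=1)
        ssm_out, _ = self.ssm_branch(suffix, init_state)
        return self.out(torch.cat([attn_out, ssm_out], dim=1))


# Usage: two sequences of 16 tokens, switching modes after the 8th token.
layer = SharedDualLayer(d_model=32)
y = layer(torch.randn(2, 16, 32), switch_point=8)
print(y.shape)  # torch.Size([2, 16, 32])

The trade-off the sketch makes visible is that tokens before the switch pay the quadratic attention cost, while tokens after it only update a fixed-size state per step, which is where the long-sequence efficiency claim in the summary comes from.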