Mixture of Group Experts (MoGE) has been introduced as a new perspective on Mixture-of-Experts (MoE) models with top-k routing, addressing limitations of vanilla MoE models.
MoGE applies group sparse regularization to routing inputs, organizing them into a 2D topographic map that enhances expert diversity and specialization and improves performance on tasks such as image classification and language modeling.
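The source code is referenced but not shown here; the following is a minimal sketch, assuming a PyTorch-style top-k router, of how a group sparse (group-lasso) penalty might be applied to routing scores laid out on a 2D expert grid. The class `GroupSparseRouter`, the parameters `grid_size` and `top_k`, and the choice of rows as groups are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupSparseRouter(nn.Module):
    """Illustrative top-k router whose experts are arranged on a 2D grid.

    A group-lasso (sum-of-L2-norms) penalty over groups of the grid encourages
    each token to concentrate its routing mass in a few groups, which is one
    plausible reading of "group sparse regularization" for routing inputs.
    """

    def __init__(self, d_model: int, grid_size: int = 4, top_k: int = 2):
        super().__init__()
        self.grid_size = grid_size                 # experts form a grid_size x grid_size map
        self.num_experts = grid_size * grid_size
        self.top_k = top_k
        self.gate = nn.Linear(d_model, self.num_experts, bias=False)

    def forward(self, x: torch.Tensor):
        # x: (batch, d_model) -> routing probabilities over all experts
        logits = self.gate(x)                      # (B, E)
        probs = F.softmax(logits, dim=-1)

        # Standard top-k routing: keep the k largest gate values per token.
        topk_vals, topk_idx = probs.topk(self.top_k, dim=-1)

        # Group-lasso penalty: view the gates as a 2D map and sum the L2 norms
        # of each row group; sum-of-norms regularization pushes whole groups
        # toward zero, yielding group-sparse routing.
        grid = probs.view(-1, self.grid_size, self.grid_size)   # (B, G, G)
        group_penalty = grid.norm(dim=-1).sum(dim=-1).mean()

        return topk_vals, topk_idx, group_penalty


# Usage sketch: add the penalty to the task objective with a small coefficient.
router = GroupSparseRouter(d_model=64, grid_size=4, top_k=2)
tokens = torch.randn(8, 64)
gates, expert_ids, penalty = router(tokens)
task_loss = torch.zeros(())                        # placeholder for the real objective
total_loss = task_loss + 1e-2 * penalty
```

The grouping here (grid rows) and the penalty coefficient are placeholders; the actual choice of groups on the topographic map and how the regularizer enters training follow the paper's released code.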
Comprehensive evaluations show that MoGE outperforms traditional MoE models with negligible additional memory and computation, offering an efficient way to scale the number of experts while avoiding redundancy.
The source code for MoGE is included in the supplementary material and will be made publicly available for further exploration and implementation.