- Mixture-of-Experts (MoE) models scale large language models efficiently by activating only the experts relevant to each input.
- The large number of expert networks in an MoE model poses storage challenges for edge devices.
- The study addresses expert caching on edge servers under storage constraints, enabling efficient distributed inference with a Top-K expert selection strategy (see the routing sketch below).
- The proposed algorithms minimize latency by accounting for expert co-activation within MoE layers; simulations show improved inference speed.
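For context, Top-K selection in an MoE layer means the router scores every expert and each token is dispatched only to the k highest-scoring ones, whose gate weights are renormalized. A minimal NumPy sketch of this standard routing step; the names (`top_k_route`, `gate_logits`) are illustrative, not from the paper:

```python
import numpy as np

def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts per token and softmax-renormalize
    their gate weights, as in standard Top-K MoE routing."""
    # gate_logits: (num_tokens, num_experts) router scores.
    top_idx = np.argsort(gate_logits, axis=-1)[:, -k:]            # k best experts
    top_logits = np.take_along_axis(gate_logits, top_idx, axis=-1)
    weights = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)                # softmax over the k
    return top_idx, weights

logits = np.array([[0.1, 2.0, -0.5, 1.2]])
idx, w = top_k_route(logits, k=2)   # experts {1, 3} carry this token
```

Because only k experts fire per token, an edge server that caches the right subset of experts can serve most tokens without fetching weights remotely.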
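The co-activation idea can be illustrated with a simple greedy heuristic: experts that frequently fire together under Top-K routing should be cached together so requests hit the edge cache jointly. This is a hedged sketch under assumed inputs (pairwise `coactivation` counts and per-expert `activation_freq`), not the paper's actual algorithm:

```python
def greedy_expert_cache(coactivation, activation_freq, capacity):
    """Greedily pick up to `capacity` experts to keep in the edge cache.

    coactivation[i][j]: how often experts i and j are selected together (Top-K).
    activation_freq[i]: how often expert i is selected at all.
    Both statistics are assumed to be profiled offline; the scoring rule
    below is an illustrative heuristic.
    """
    num_experts = len(activation_freq)
    cached = set()
    while len(cached) < capacity:
        best, best_gain = None, float("-inf")
        for e in range(num_experts):
            if e in cached:
                continue
            # Gain = the expert's own hit rate plus a bonus for joining
            # already-cached experts it frequently co-activates with.
            gain = activation_freq[e] + sum(coactivation[e][c] for c in cached)
            if gain > best_gain:
                best, best_gain = e, gain
        cached.add(best)
    return cached

freq = [0.40, 0.35, 0.15, 0.10]
coact = [[0, 30, 2, 1],
         [30, 0, 3, 2],
         [2, 3, 0, 8],
         [1, 2, 8, 0]]
print(greedy_expert_cache(coact, freq, capacity=2))  # {0, 1}: a co-activated pair
```

The design intuition is that caching experts independently by popularity can split co-activated pairs across cache and remote storage, forcing a fetch on nearly every Top-K selection; rewarding co-activation keeps such pairs together and reduces expected latency.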