Mixture-of-Experts (MoE) Large Language Models (LLMs) suffer from sub-optimal expert pathways, which result in lower accuracy.
A novel class of test-time optimization methods, called C3PO, is developed to re-weight or 're-mix' the experts in different layers for each test sample.
C3PO applies the optimization only to the core experts' mixing weights in critical layers, which improves accuracy while saving computation.
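To make the idea concrete, below is a minimal sketch of what per-sample, test-time re-mixing of expert weights could look like, built on a toy PyTorch MoE layer. The `ToyMoELayer`, the `critical_layers` and `core_k` arguments, and the surrogate objective are illustrative assumptions for this sketch, not the paper's implementation.

```python
# Minimal sketch (not the authors' code) of per-sample, test-time re-mixing of
# expert mixing weights in selected ("critical") layers of a toy MoE model.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyMoELayer(nn.Module):
    """Tiny MoE layer: a linear router over `n_experts` linear experts."""

    def __init__(self, d_model, n_experts):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])
        self.remix_logits = None  # per-sample additive re-mix logits (None = vanilla routing)
        self.core_mask = None     # 1.0 for "core" experts, 0.0 elsewhere

    def forward(self, x):
        logits = self.router(x)
        if self.remix_logits is not None:
            logits = logits + self.remix_logits          # re-weight ("re-mix") the experts
        weights = F.softmax(logits, dim=-1)              # (tokens, n_experts)
        expert_outs = torch.stack([e(x) for e in self.experts], dim=-1)
        return (expert_outs * weights.unsqueeze(-2)).sum(-1)


def surrogate_loss(h):
    """Placeholder test-time objective; a stand-in for whatever signal the method uses."""
    return h.pow(2).mean()


def remix_for_sample(layers, x, critical_layers, core_k=2, steps=8, lr=0.1):
    """Optimize re-mix logits for the core (top-k routed) experts in the critical
    layers of one test sample. Only the per-sample deltas are passed to the
    optimizer, so the model's own weights stay fixed."""
    deltas = []
    for idx in critical_layers:
        layer = layers[idx]
        with torch.no_grad():
            base = layer.router(x).mean(dim=0)           # average routing logits over tokens
            core = base.topk(core_k).indices             # this sample's core experts
        delta = torch.zeros_like(base, requires_grad=True)
        layer.remix_logits = delta
        layer.core_mask = torch.zeros_like(base).scatter_(0, core, 1.0)
        deltas.append(delta)

    opt = torch.optim.SGD(deltas, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        h = x
        for layer in layers:                             # forward through all layers
            h = layer(h)
        loss = surrogate_loss(h)
        loss.backward()
        for idx, delta in zip(critical_layers, deltas):  # restrict updates to core experts
            delta.grad *= layers[idx].core_mask
        opt.step()
    return deltas


# Usage: tune the re-mix logits of layers 2 and 3 for a single 5-token test sample.
layers = nn.ModuleList([ToyMoELayer(16, 8) for _ in range(4)])
x = torch.randn(5, 16)
remix_for_sample(layers, x, critical_layers=[2, 3])
```

Because only a small per-layer vector of mixing logits is optimized for each sample, the extra test-time cost is a handful of forward/backward passes rather than any update to the model's parameters.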
C3PO consistently improves the accuracy of MoE LLMs by 7-15% and outperforms other test-time learning methods.