Scaling Low-Rank Adaptation (LoRA)-based Mixture-of-Experts (MoE) architectures for large language models (LLMs) is hindered by the limited expressiveness of traditional gating mechanisms.
RadarGate, a new gating method, applies rotational operations to LoRA representations to enhance expressiveness and enable richer feature interactions among multiple LoRAs as the number of experts grows.
RadarGate first fuses the LoRA representations and then applies a rotation matrix whose learnable parameters define the relative angular relationships between representations, adding an extra degree of freedom for learning cross-LoRA synergies.
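To make the mechanism concrete, below is a minimal PyTorch sketch of a rotation-based gate, assuming mean-pooled fusion, per-plane Givens rotations with learnable angles, and cosine-similarity gating; the class name, parameterization, and fusion choice are illustrative assumptions, not RadarGate's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RotationalGate(nn.Module):
    """Illustrative rotation-based gate over multiple LoRA representations."""

    def __init__(self, num_experts: int, hidden_dim: int):
        super().__init__()
        assert hidden_dim % 2 == 0, "pairwise Givens rotations need an even hidden size"
        # One learnable angle per 2-D plane of the hidden space (assumed parameterization).
        self.angles = nn.Parameter(torch.zeros(hidden_dim // 2))

    def rotate(self, x: torch.Tensor) -> torch.Tensor:
        # Apply an independent Givens rotation to each consecutive coordinate pair.
        x1, x2 = x[..., 0::2], x[..., 1::2]
        cos, sin = torch.cos(self.angles), torch.sin(self.angles)
        rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
        return rotated.flatten(-2)

    def forward(self, lora_outputs: torch.Tensor) -> torch.Tensor:
        # lora_outputs: (batch, num_experts, hidden_dim), one representation per LoRA expert.
        fused = lora_outputs.mean(dim=1)          # fuse the LoRA representations (mean pooling)
        query = self.rotate(fused)                # learnable angular adjustment of the fused vector
        # Gate each LoRA by its angular alignment with the rotated fused representation.
        scores = F.cosine_similarity(query.unsqueeze(1), lora_outputs, dim=-1)
        weights = torch.softmax(scores, dim=-1)   # (batch, num_experts)
        # Return the gated combination of LoRA outputs.
        return torch.einsum("be,beh->bh", weights, lora_outputs)
```

Under these assumptions, `RotationalGate(num_experts=8, hidden_dim=64)(torch.randn(4, 8, 64))` yields a `(4, 64)` gated output; a richer parameterization could make the angles per-expert or input-conditioned.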
Experiments on multiple benchmarks demonstrate RadarGate's effectiveness in scaling LoRAs, and analysis suggests that contrastive rotations align semantically similar representations while separating dissimilar ones.
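If one wanted to encourage this contrastive behavior explicitly, a supervised-contrastive-style objective over the rotated representations might look like the sketch below; this is an assumed illustration, not RadarGate's reported training loss, and the `labels` tensor marking semantically similar samples is a hypothetical input.

```python
import torch
import torch.nn.functional as F

def contrastive_rotation_loss(rotated: torch.Tensor, labels: torch.Tensor,
                              temperature: float = 0.1) -> torch.Tensor:
    """Pull rotated representations with shared labels together, push the rest apart.

    rotated: (N, hidden_dim) rotated LoRA representations; labels: (N,) semantic group ids.
    """
    z = F.normalize(rotated, dim=-1)
    sim = z @ z.t() / temperature                                # pairwise cosine similarities
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    # Log-probability of each pair under a softmax over non-self pairs.
    log_prob = sim - torch.logsumexp(sim.masked_fill(self_mask, float("-inf")),
                                     dim=1, keepdim=True)
    # Average over positive pairs per anchor (anchors without positives contribute zero).
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    return -(log_prob * pos_mask).sum(dim=1).div(pos_counts).mean()
```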