In-context reinforcement learning (ICRL) has emerged as a promising paradigm for adapting RL agents to downstream tasks through prompt conditioning.
This work proposes T2MIR (Token- and Task-wise MoE for In-context RL), a framework that incorporates mixture-of-experts (MoE) layers into transformer-based decision models.
T2MIR addresses two challenges, the multi-modality of state-action-reward token sequences and the diversity of decision tasks, by employing a token-wise MoE layer to handle heterogeneous input modalities and a task-wise MoE layer to capture task diversity, improving learning capacity.
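To make the token-wise routing idea concrete, below is a minimal sketch of an MoE feed-forward layer with top-1 gating, where each token (e.g. a state, action, or reward embedding) is dispatched to one expert. All names, dimensions, and the gating scheme here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class MoELayer:
    """Illustrative mixture-of-experts feed-forward layer with top-1 gating.
    Not the T2MIR implementation -- a generic sketch of token-wise routing."""
    def __init__(self, d_model, d_hidden, n_experts):
        self.w_gate = rng.normal(0, 0.02, (d_model, n_experts))
        self.w1 = rng.normal(0, 0.02, (n_experts, d_model, d_hidden))
        self.w2 = rng.normal(0, 0.02, (n_experts, d_hidden, d_model))

    def __call__(self, tokens):
        # tokens: (seq_len, d_model); each token is routed to a single expert
        gate = softmax(tokens @ self.w_gate)      # (seq_len, n_experts)
        choice = gate.argmax(axis=-1)             # top-1 expert index per token
        out = np.zeros_like(tokens)
        for e in range(self.w1.shape[0]):
            idx = np.where(choice == e)[0]        # tokens assigned to expert e
            if idx.size == 0:
                continue
            h = np.maximum(tokens[idx] @ self.w1[e], 0.0)   # expert FFN (ReLU)
            out[idx] = (h @ self.w2[e]) * gate[idx, e:e + 1]  # weight by gate prob
        return out

layer = MoELayer(d_model=16, d_hidden=32, n_experts=4)
x = rng.normal(size=(6, 16))  # e.g. interleaved (s, a, r) token embeddings
y = layer(x)
print(y.shape)  # -> (6, 16)
```

The token-wise layer lets different expert FFNs specialize in the distinct statistics of state, action, and reward tokens; a task-wise variant would instead route whole-context representations so that experts specialize across tasks.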
Experiments demonstrate that T2MIR enhances in-context learning and outperforms various baselines, showing its potential to bring ICRL closer to the successes achieved in the language and vision domains.