<ul><li>Recent advances in language modeling and vision have come from training large models on diverse, multi-task data.</li><li>Value-based reinforcement learning, by contrast, has typically relied on small models trained on a single task, because of challenges such as sparse rewards and gradient conflicts.</li><li>This work introduces high-capacity value models that are trained with a cross-entropy loss and conditioned on learnable task embeddings, which improves multi-task training in online RL settings.</li><li>The approach achieves state-of-the-art single-task and multi-task performance across various benchmarks and enables sample-efficient transfer to new tasks.</li></ul>
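
To make the third point concrete, here is a minimal sketch of a categorical value head trained with cross-entropy and conditioned on a task embedding. It is an illustration under stated assumptions, not the authors' implementation: the summary does not say how scalar targets are turned into distributions, so the two-hot encoding over a fixed support is an assumption, and the single linear layer, the class name `TaskConditionedCritic`, and all sizes are hypothetical stand-ins for a high-capacity network.

```python
import numpy as np


def two_hot(target, support):
    """Spread a scalar target over its two nearest bins.

    Assumption: two-hot encoding on a fixed, sorted support. The expected
    value of the resulting distribution equals the clipped target.
    """
    target = float(np.clip(target, support[0], support[-1]))
    upper = int(np.searchsorted(support, target, side="left"))
    upper = min(max(upper, 1), len(support) - 1)
    lower = upper - 1
    w_upper = (target - support[lower]) / (support[upper] - support[lower])
    probs = np.zeros(len(support))
    probs[lower] = 1.0 - w_upper
    probs[upper] = w_upper
    return probs


def cross_entropy(logits, target_probs):
    """Cross-entropy between a target distribution and softmax(logits)."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-(target_probs * log_probs).sum())


class TaskConditionedCritic:
    """Hypothetical linear value head; a real model would be a deep network."""

    def __init__(self, num_tasks, obs_dim, embed_dim, num_bins, seed=0):
        rng = np.random.default_rng(seed)
        # One embedding row per task; these would be learned with the critic.
        self.task_embeddings = rng.normal(scale=0.1, size=(num_tasks, embed_dim))
        self.weights = rng.normal(scale=0.1, size=(obs_dim + embed_dim, num_bins))

    def logits(self, obs, task_id):
        # Condition on the task by concatenating its embedding to the observation.
        features = np.concatenate([obs, self.task_embeddings[task_id]])
        return features @ self.weights


support = np.linspace(-10.0, 10.0, 21)  # assumed range of value targets
critic = TaskConditionedCritic(num_tasks=3, obs_dim=4, embed_dim=2,
                               num_bins=len(support))
obs = np.ones(4)
loss = cross_entropy(critic.logits(obs, task_id=1), two_hot(2.3, support))
```

The sketch only computes the loss for one observation; a training loop would backpropagate it into both the network weights and the task embeddings. Framing value learning as classification keeps the loss on a comparable scale across tasks whose rewards differ in magnitude, which is one plausible reason a cross-entropy objective suits the multi-task setting.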

Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners
