Reinforcement learning from human feedback (RLHF) is crucial for aligning model behavior with user goals.
Current RLHF methods oversimplify human decision-making by reducing reward learning to an isolated task, such as pure classification or pure regression.
A new reinforcement learning method, presented in a recent arXiv paper, jointly considers both kinds of task to more closely mimic how humans actually make decisions.
The proposed method learns a reward function from human ratings in reward-free settings, striking a balance between classification and regression models.
This approach accounts for the uncertainty inherent in human decision-making and allows the method to adaptively shift emphasis between the two modeling strategies.
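To make the classification-regression balance concrete, here is a minimal PyTorch sketch of one way such a rating-based reward model could be built: a scalar reward head (the regression side) whose output is mapped to discrete rating classes through ordered cutpoints (the classification side), yielding a probability distribution over ratings that can express rater uncertainty. The name `RatingRewardModel`, the cutpoint parameterization, and all hyperparameters are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RatingRewardModel(nn.Module):
    """Hypothetical sketch: a scalar reward head mapped to discrete rating
    classes via ordered cutpoints (a cumulative-link, ordinal-style model)."""

    def __init__(self, obs_dim: int, n_ratings: int = 5, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar reward: the regression side
        )
        # n_ratings - 1 cutpoints partition the reward line into classes;
        # a robust implementation would constrain them to stay ordered.
        self.cutpoints = nn.Parameter(torch.linspace(-1.0, 1.0, n_ratings - 1))

    def reward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)

    def rating_log_probs(self, obs: torch.Tensor) -> torch.Tensor:
        """log P(rating = k | obs) as differences of sigmoid CDFs."""
        r = self.reward(obs).unsqueeze(-1)           # (batch, 1)
        cdf = torch.sigmoid(self.cutpoints - r)      # P(rating <= k), (batch, K-1)
        one = torch.ones_like(cdf[..., :1])
        zero = torch.zeros_like(one)
        probs = torch.cat([cdf, one], -1) - torch.cat([zero, cdf], -1)
        return probs.clamp_min(1e-8).log()


def rating_loss(model: RatingRewardModel,
                obs: torch.Tensor,
                ratings: torch.Tensor) -> torch.Tensor:
    """Cross-entropy on discrete ratings: the classification side of the
    loss, with gradients flowing through the scalar regression head."""
    return F.nll_loss(model.rating_log_probs(obs), ratings)


# Example: one gradient step on a batch of rated observations (toy shapes).
model = RatingRewardModel(obs_dim=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
obs = torch.randn(32, 8)
ratings = torch.randint(0, 5, (32,))
opt.zero_grad()
rating_loss(model, obs, ratings).backward()
opt.step()
```

Because the loss is cross-entropy over rating classes while the gradient flows through a single scalar reward, a model of this shape sits between a pure classifier and a pure regressor, and the predicted class distribution gives a direct handle on rating uncertainty.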
Experiments with synthetic human ratings demonstrate the superior performance of the new method over existing rating-based RL techniques.
The novel method even outperforms traditional RL approaches in certain scenarios.
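The summary does not specify how the synthetic human ratings were produced. One common and plausible protocol is to bin ground-truth segment returns into ordered rating classes and inject label noise to mimic rater inconsistency; the function below, including the name `synthetic_ratings` and the noise model, is a hypothetical sketch under that assumption.

```python
import numpy as np


def synthetic_ratings(returns: np.ndarray, n_ratings: int = 5,
                      noise: float = 0.1, rng=None) -> np.ndarray:
    """Simulate noisy human ratings (assumed protocol): bin ground-truth
    segment returns into n_ratings ordered classes at quantile boundaries,
    then occasionally shift a label by one class to mimic rater noise."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Quantile cutpoints so each rating class is roughly equally populated.
    edges = np.quantile(returns, np.linspace(0, 1, n_ratings + 1)[1:-1])
    ratings = np.digitize(returns, edges)            # classes 0 .. n_ratings-1
    # With probability `noise`, move a rating up or down by one class.
    flip = rng.random(len(returns)) < noise
    ratings[flip] += rng.choice([-1, 1], size=flip.sum())
    return np.clip(ratings, 0, n_ratings - 1)
```

Training the reward model on ratings generated this way, then running a standard RL algorithm on the learned reward, would reproduce the reward-free evaluation setup described above.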