Source: Arxiv
Zero-Shot LLMs in Human-in-the-Loop RL: Replacing Human Feedback for Reward Shaping

  • Reinforcement learning often suffers from reward misalignment: agents optimize the specified reward yet fail to exhibit the behavior the designer intended.
  • To address this, the proposed approach uses zero-shot, off-the-shelf large language models (LLMs) for reward shaping in continuous control tasks, standing in for human feedback.
  • The LLM-HFBF framework is introduced to incorporate human feedback into reward shaping while identifying and correcting biases in that feedback.
  • Empirical experiments show that the approach reduces reliance on potentially biased human guidance while maintaining high reinforcement learning performance.
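To make the idea concrete, here is a minimal sketch of how an LLM signal might replace human feedback in reward shaping, and how disagreement between the two signals could flag biased feedback. This is an illustrative reading of the summary, not the paper's actual method: `llm_feedback` is a hypothetical stand-in (a heuristic in place of a real zero-shot LLM query), and the names, weights, and tolerance are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    state: float       # e.g., pole angle in a balancing task
    action: int
    env_reward: float  # reward returned by the environment


def llm_feedback(state: float) -> float:
    """Hypothetical stand-in for a zero-shot LLM query.

    In practice this would prompt an off-the-shelf LLM with a text
    description of the state and parse a scalar preference; here a
    simple heuristic plays that role for illustration.
    """
    return 1.0 if abs(state) < 0.1 else -1.0


def shaped_reward(t: Transition, weight: float = 0.5) -> float:
    # Combine the environment reward with the feedback signal,
    # reducing reliance on a human rater for shaping.
    return t.env_reward + weight * llm_feedback(t.state)


def flag_biased_feedback(human_score: float, state: float,
                         tol: float = 1.0) -> bool:
    # One possible reading of the LLM-HFBF idea: flag human feedback
    # that diverges strongly from the LLM's zero-shot assessment.
    return abs(human_score - llm_feedback(state)) > tol
```

For example, a near-upright state (`state=0.05`) gets a positive shaping bonus, and a human score of -1.0 on that same state would be flagged as potentially biased because it contradicts the LLM signal.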
