Reinforcement learning often suffers from reward misalignment, where agents optimize the reward they are given yet fail to exhibit the behavior the designer intended.
To address these issues, the proposed approach utilizes zero-shot, off-the-shelf large language models (LLMs) for reward shaping in continuous control tasks.
The LLM-HFBF framework is introduced to identify and correct biases in human feedback while incorporating that feedback into the reward shaping process.
Empirical experiments demonstrate that the proposed approach reduces reliance on potentially biased human guidance and maintains high reinforcement learning performance.
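The core mechanism can be illustrated with a minimal sketch. Here the function `llm_score` is a hypothetical stand-in for a zero-shot LLM judgment (a real system would prompt an off-the-shelf model); `shaped_reward`, its `weight` parameter, and the toy heuristic inside `llm_score` are illustrative assumptions, not the paper's actual implementation.

```python
def llm_score(state, action):
    """Hypothetical stand-in for a zero-shot LLM judgment: returns a
    score in [0, 1] rating how well (state, action) matches the task
    description. A real system would query an off-the-shelf LLM here."""
    # Toy heuristic for a 1-D control task: prefer actions that move
    # the state toward the goal at zero.
    next_state = state + action
    return 1.0 / (1.0 + abs(next_state))

def shaped_reward(env_reward, state, action, weight=0.5):
    """Combine the environment reward with the LLM-derived shaping term."""
    return env_reward + weight * llm_score(state, action)

# An action steering the state toward the goal earns a larger bonus
# than one steering it away, even when the environment reward is zero.
r_good = shaped_reward(0.0, state=2.0, action=-2.0)
r_bad = shaped_reward(0.0, state=2.0, action=+2.0)
assert r_good > r_bad
```

In this sketch the environment reward is left untouched and the LLM score only adds a bonus, so the shaping term guides exploration without replacing the task's own objective.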