FedRLHF is a decentralized framework for Reinforcement Learning with Human Feedback (RLHF).
It addresses privacy concerns by enabling collaborative policy learning without sharing raw data or human feedback.
The framework uses federated reinforcement learning, in which each client integrates human feedback locally into its own reward function.
Empirical evaluations demonstrate that FedRLHF preserves user privacy, achieves performance similar to centralized RLHF, and enhances personalization across different client environments.
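
The local-feedback and aggregation loop described above can be illustrated with a small sketch. This is not the FedRLHF algorithm itself. It is a toy multi-armed bandit written under stated assumptions: the additive reward shaping `base + lam * feedback`, the softmax policy, the exact policy-gradient update, and the plain parameter averaging on the server (in the style of FedAvg) are all choices made for illustration, and every function name is hypothetical. The point it shows is that each client's feedback vector is used only inside that client's update, and only policy parameters reach the server.

```python
import math


def softmax(logits):
    """Turn policy logits into action probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shaped_reward(base_reward, feedback, lam):
    """Client-side reward shaping (assumed additive form, not from the source).

    `feedback` is the client's private human feedback; it is never returned
    to or read by the server.
    """
    return [r + lam * h for r, h in zip(base_reward, feedback)]


def local_update(theta, reward, lr, steps):
    """Run a few exact policy-gradient ascent steps on the client's own reward."""
    theta = list(theta)
    for _ in range(steps):
        pi = softmax(theta)
        expected = sum(p * r for p, r in zip(pi, reward))
        # Gradient of expected reward w.r.t. softmax logits: pi_i * (r_i - E[r]).
        theta = [t + lr * p * (r - expected) for t, p, r in zip(theta, pi, reward)]
    return theta


def federated_average(client_thetas):
    """Server-side aggregation: unweighted mean of client parameters (assumed)."""
    n = len(client_thetas)
    return [sum(vals) / n for vals in zip(*client_thetas)]


def run_rounds(base_reward, client_feedback, lam=0.5, lr=0.5, local_steps=5, rounds=20):
    """Alternate local training and aggregation; only parameters are exchanged."""
    theta = [0.0] * len(base_reward)
    for _ in range(rounds):
        updates = [
            local_update(theta, shaped_reward(base_reward, fb, lam), lr, local_steps)
            for fb in client_feedback
        ]
        theta = federated_average(updates)
    return theta


# Hypothetical setup: three actions, two clients with different private feedback.
base = [1.0, 0.0, 0.0]
feedback_per_client = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
global_theta = run_rounds(base, feedback_per_client)
global_policy = softmax(global_theta)
```

In this sketch a client could continue from `global_theta` with extra `local_update` steps on its own shaped reward to obtain a personalized policy; how FedRLHF itself balances the shared and personalized objectives is not specified in the text above.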