Reinforcement Learning from Human Feedback (RLHF) has achieved notable success in fine-tuning large language models. However, existing RLHF frameworks assume homogeneous human preferences, which limits their adaptability in personalized scenarios. Low-Rank Adaptation (LoRA) is introduced to enable efficient learning of personalized reward models. By capturing both structure shared across users and individual-specific variation, LoRA addresses personalization requirements under limited per-user data.
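The shared-plus-individual decomposition described above can be sketched with a minimal LoRA-style forward pass: a base weight matrix shared by all users, plus a per-user low-rank update. This is an illustrative sketch, not the paper's implementation; all names (`lora_forward`, `A_user`, `B_user`) and the zero-initialization convention are assumptions.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Forward pass with a LoRA-adapted weight: y = x @ (W + (alpha/r) * A @ B).

    W (d_in x d_out) is the base weight shared across all users;
    A (d_in x r) and B (r x d_out) form a user-specific low-rank update.
    """
    r = A.shape[1]
    scale = alpha / r
    return x @ W + scale * (x @ A) @ B

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 2
W = rng.normal(size=(d_in, d_out))   # shared reward-model weight
x = rng.normal(size=(1, d_in))       # one input representation

# Hypothetical per-user adapter; B starts at zero, so each user
# begins from the shared model and only the low-rank part is learned.
A_user = rng.normal(size=(d_in, r))
B_user = np.zeros((r, d_out))

y = lora_forward(x, W, A_user, B_user)
assert np.allclose(y, x @ W)  # zero-initialized adapter reproduces the shared model
```

Only the `r * (d_in + d_out)` adapter parameters are user-specific, which is why the low-rank structure suits settings with few preference labels per individual.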