- Whole Page Optimization (WPO) is crucial for improving user experience by jointly optimizing the presentation of search and recommendation results.
- Pre-trained Large Language Models (LLMs) are effective at generating relevant content, but fine-tuning them for complex tasks like WPO is challenging.
- This study introduces PageLLM, a reward-based fine-tuning approach for LLMs that uses user feedback as the supervision signal.
- PageLLM employs a mixed-grained reward mechanism that combines page-level and item-level rewards to optimize presentation (a minimal sketch of such a combined reward follows this list).
- The page-level reward assesses overall quality and coherence, while the item-level reward focuses on the accuracy and relevance of individual recommendations.
- User feedback is noisier and less precise than manually labeled datasets, a challenge PageLLM is designed to address.
- PageLLM was evaluated on public and industrial datasets, outperforming baselines and achieving a 0.44% GMV (gross merchandise value) increase in an online A/B test.
- The dual-reward structure of PageLLM improves both the overall page quality and the individual components of WPO.
- Fine-tuning LLMs for WPO with user feedback reduces reliance on costly human-annotated data.
- PageLLM's success in real-world deployment highlights its effectiveness in improving user engagement and system performance.
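The sketch below is not the authors' implementation; it only illustrates the idea of a mixed-grained reward, combining a page-level score (overall quality/coherence) with item-level scores (per-item relevance) into a single scalar. The weighting `alpha` and the choice of scoring signals are illustrative assumptions.

```python
# Minimal sketch (assumed, not from the paper) of a mixed-grained reward:
# a page-level term plus an averaged item-level term, traded off by `alpha`.
from typing import List


def mixed_grained_reward(
    page_score: float,          # page-level reward, e.g. coherence of the full layout
    item_scores: List[float],   # item-level rewards, e.g. per-item click/relevance feedback
    alpha: float = 0.5,         # hypothetical trade-off between the two granularities
) -> float:
    """Combine page- and item-level feedback into one scalar reward."""
    item_term = sum(item_scores) / len(item_scores) if item_scores else 0.0
    return alpha * page_score + (1.0 - alpha) * item_term


# Toy usage: noisy click feedback as item-level rewards, a dwell-time-derived
# score as the page-level reward.
reward = mixed_grained_reward(page_score=0.8, item_scores=[1.0, 0.0, 1.0, 0.0])
print(reward)  # 0.65 with alpha = 0.5
```

In a reward-based fine-tuning loop, a scalar of this form would typically weight the policy-gradient update for the generated page, so the LLM is pushed toward layouts that score well at both granularities.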