Source: Arxiv
Enhanced Whole Page Optimization via Mixed-Grained Reward Mechanism-Adapted Language Models

  • Whole Page Optimization (WPO) is crucial for improving user experience by jointly optimizing how search and recommendation results are presented.
  • Pre-trained Large Language Models (LLMs) are effective at generating relevant content, but fine-tuning them for complex tasks like WPO is challenging.
  • This study introduces PageLLM, a reward-based fine-tuning approach for LLMs using user feedback as supervision.
  • PageLLM uses a mixed-grained reward mechanism that integrates page-level and item-level rewards to optimize presentation (a rough sketch of this combination follows the list).
  • The page-level reward assesses the overall quality and coherence of the page, while the item-level reward focuses on the accuracy and relevance of individual recommendations.
  • User feedback is noisier and less precise than manually labeled datasets, a challenge that PageLLM is designed to address.
  • PageLLM was tested on public and industrial datasets, surpassing baselines and showing a 0.44% gross merchandise value (GMV) increase in an online A/B test.
  • The dual-reward structure of PageLLM enhances both the overall quality and the individual components of WPO.
  • Fine-tuning LLMs for WPO using user feedback reduces reliance on costly human-annotated data.
  • PageLLM's success in real-world applications highlights its effectiveness in improving user engagement and system performance.
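
The summary does not include implementation details, so the sketch below is only a hypothetical Python/PyTorch illustration of the general idea, not PageLLM's actual code: the function names `mixed_grained_reward` and `reinforce_loss`, the blending weight `alpha`, and the toy feedback values are all assumptions. It shows how a page-level score and per-item scores could be blended into one scalar that scales the log-likelihood of a generated page during reward-based fine-tuning.

```python
import torch

def mixed_grained_reward(page_reward, item_rewards, alpha=0.6):
    """Blend a page-level score with per-item scores into one scalar.

    page_reward  -- float, e.g. a whole-page engagement proxy (assumed in [0, 1])
    item_rewards -- list of floats, e.g. per-item click/skip signals
    alpha        -- hypothetical weight balancing the two granularities
    """
    item_term = sum(item_rewards) / len(item_rewards) if item_rewards else 0.0
    return alpha * page_reward + (1.0 - alpha) * item_term


def reinforce_loss(token_logprobs, reward, baseline=0.0):
    """REINFORCE-style loss: scale the page's log-likelihood by (reward - baseline)."""
    advantage = reward - baseline
    return -advantage * token_logprobs.sum()


# Toy usage with made-up feedback signals (not data from the paper).
page_r = 0.8                  # whole-page feedback, e.g. a dwell-time proxy
item_r = [1.0, 0.0, 0.5]      # per-item feedback, e.g. clicks vs. skips
reward = mixed_grained_reward(page_r, item_r)

# Stand-in for the log-probs the LLM assigned to the tokens of the page it generated.
logits = torch.randn(5, 100, requires_grad=True)
tokens = torch.tensor([3, 7, 1, 42, 9])
token_logprobs = torch.log_softmax(logits, dim=-1)[torch.arange(5), tokens]

loss = reinforce_loss(token_logprobs, reward, baseline=0.5)
loss.backward()  # in a real setup this gradient would update the LLM's parameters
```

In practice, how user feedback maps onto the two reward granularities, the choice of baseline, and the exact fine-tuning objective would follow the paper's own recipe rather than this simplified REINFORCE-style stand-in.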
