Source: Arxiv
Enhanced Whole Page Optimization via Mixed-Grained Reward Mechanism-Adapted Language Models

  • Whole Page Optimization (WPO) is crucial for improving user experience by jointly optimizing how search and recommendation results are presented.
  • Pre-trained Large Language Models (LLMs) are effective at generating relevant content, but fine-tuning them for complex tasks like WPO is challenging.
  • This study introduces PageLLM, a reward-based fine-tuning approach for LLMs using user feedback as supervision.
  • PageLLM uses a mixed-grained reward mechanism that integrates page-level and item-level rewards to optimize presentation (a rough sketch of this combination follows the list).
  • The page-level reward assesses the overall quality and coherence of the page, while the item-level reward focuses on the accuracy and relevance of individual recommendations.
  • User feedback is noisier and less precise than manually labeled datasets, a challenge that PageLLM is designed to address.
  • PageLLM was tested on public and industrial datasets, surpassing baselines and showing a 0.44% gross merchandise value (GMV) increase in an online A/B test.
  • The dual-reward structure of PageLLM enhances both the overall quality and the individual components of WPO.
  • Fine-tuning LLMs for WPO using user feedback reduces reliance on costly human-annotated data.
  • PageLLM's success in real-world applications highlights its effectiveness in improving user engagement and system performance.
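
The summary does not include implementation details, so the sketch below is only a hypothetical Python/PyTorch illustration of the general idea, not PageLLM's actual code: the function names `mixed_grained_reward` and `reinforce_loss`, the blending weight `alpha`, and the toy feedback values are all assumptions. It shows how a page-level score and per-item scores could be blended into one scalar that scales the log-likelihood of a generated page during reward-based fine-tuning.

```python
import torch

def mixed_grained_reward(page_reward, item_rewards, alpha=0.6):
    """Blend a page-level score with per-item scores into one scalar.

    page_reward  -- float, e.g. a whole-page engagement proxy (assumed in [0, 1])
    item_rewards -- list of floats, e.g. per-item click/skip signals
    alpha        -- hypothetical weight balancing the two granularities
    """
    item_term = sum(item_rewards) / len(item_rewards) if item_rewards else 0.0
    return alpha * page_reward + (1.0 - alpha) * item_term


def reinforce_loss(token_logprobs, reward, baseline=0.0):
    """REINFORCE-style loss: scale the page's log-likelihood by (reward - baseline)."""
    advantage = reward - baseline
    return -advantage * token_logprobs.sum()


# Toy usage with made-up feedback signals (not data from the paper).
page_r = 0.8                  # whole-page feedback, e.g. a dwell-time proxy
item_r = [1.0, 0.0, 0.5]      # per-item feedback, e.g. clicks vs. skips
reward = mixed_grained_reward(page_r, item_r)

# Stand-in for the log-probs the LLM assigned to the tokens of the page it generated.
logits = torch.randn(5, 100, requires_grad=True)
tokens = torch.tensor([3, 7, 1, 42, 9])
token_logprobs = torch.log_softmax(logits, dim=-1)[torch.arange(5), tokens]

loss = reinforce_loss(token_logprobs, reward, baseline=0.5)
loss.backward()  # in a real setup this gradient would update the LLM's parameters
```

In practice, how user feedback maps onto the two reward granularities, the choice of baseline, and the exact fine-tuning objective would follow the paper's own recipe rather than this simplified REINFORCE-style stand-in.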
