The paper introduces REGEN, a novel dataset designed to benchmark the conversational capabilities of recommender large language models (LLMs).
REGEN extends the Amazon Product Reviews dataset by including user critiques and narratives associated with recommended items.
An end-to-end benchmark for conversational recommendation is established using the LUMEN framework, which incorporates LLMs for critiquing, retrieval, and generation.
Results show that incorporating critiques enhances recommendation quality, and that LLMs trained on REGEN generate recommendations and contextual narratives comparable to those of state-of-the-art models.