Large language models (LLMs) have substantially improved conversational AI assistants, but evaluating personalization in these assistants remains challenging.
Existing personalization benchmarks do not capture the complexities of personalized task-oriented assistance.
To address this gap, we introduce PersonaLens, a benchmark for evaluating personalization in task-oriented AI assistants.
PersonaLens includes diverse user profiles with rich preferences and interaction histories, along with specialized LLM-based user and judge agents.
The user agent engages in realistic task-oriented dialogues with AI assistants, while the judge agent assesses personalization, response quality, and task success.
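To make this evaluation flow concrete, the sketch below outlines one plausible reading of the setup in Python: a user agent conditioned on a profile converses with the assistant under test, and a judge agent scores the resulting dialogue along the three dimensions above. The `call_llm` helper, the data classes, and the function names are illustrative assumptions for this sketch, not the benchmark's actual interface.

```python
# Minimal sketch of a PersonaLens-style evaluation loop, assuming a generic
# `call_llm(prompt) -> str` completion function. All names here (UserProfile,
# simulate_dialogue, judge_dialogue) are hypothetical, not the benchmark's API.
from dataclasses import dataclass


def call_llm(prompt: str) -> str:
    """Placeholder for an LLM completion call (hypothetical)."""
    return "..."


@dataclass
class UserProfile:
    preferences: dict          # e.g. {"cuisine": "vegan", "budget": "low"}
    interaction_history: list  # summaries of past task-oriented sessions


@dataclass
class DialogueTurn:
    speaker: str  # "user" or "assistant"
    text: str


def simulate_dialogue(profile: UserProfile, task: str, assistant, max_turns: int = 6):
    """LLM-based user agent converses with the assistant under test."""
    turns: list[DialogueTurn] = []
    for _ in range(max_turns):
        # The user agent speaks in character, conditioned on profile, task, and context.
        user_msg = call_llm(
            f"You are a user with preferences {profile.preferences} and past "
            f"interactions {profile.interaction_history}. Pursue the task: {task}. "
            f"Dialogue so far: {[t.text for t in turns]}"
        )
        turns.append(DialogueTurn("user", user_msg))
        turns.append(DialogueTurn("assistant", assistant(user_msg)))
    return turns


def judge_dialogue(profile: UserProfile, task: str, turns) -> dict:
    """LLM-based judge scores personalization, response quality, and task success."""
    transcript = "\n".join(f"{t.speaker}: {t.text}" for t in turns)
    scores = {}
    for dimension in ("personalization", "response_quality", "task_success"):
        scores[dimension] = call_llm(
            f"Given the user profile {profile.preferences}, the task '{task}', and the "
            f"dialogue:\n{transcript}\nRate {dimension} from 1 to 5. Reply with a number."
        )
    return scores
```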
Extensive experiments with current LLM-based assistants across diverse tasks reveal significant variability in their personalization capabilities.
PersonaLens thus provides crucial insights for advancing personalized conversational AI systems.