Source: arXiv

DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models

  • DiaTool-DPO is a novel method that improves the dialogue capabilities of Tool-Augmented Large Language Models (TA-LLMs) through Direct Preference Optimization (DPO).
  • It models TA-LLM interactions as a Markov Decision Process and categorizes user queries into three types based on their state transition trajectories.
  • With a specialized objective loss for dialogue control, DiaTool-DPO substantially improves over the baseline in information gathering (94.8% vs. 44%) and tool call rejection (91% vs. 9.6%) while maintaining core functionality.
  • DiaTool-DPO enables the development of TA-LLMs that can handle diverse real-world scenarios without requiring additional expert demonstrations or human labeling.
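
For readers unfamiliar with Direct Preference Optimization, the sketch below shows the standard DPO loss applied to one pair of multi-turn trajectories. It is not the specialized dialogue-control objective from the paper, which this summary does not spell out. The function names, the per-turn log-probability inputs, and the default `beta` are all illustrative assumptions.

```python
import math


def trajectory_logprob(turn_logprobs):
    """Sum per-turn log-probabilities into one trajectory log-probability.

    Assumption: each entry is the log-probability a model assigns to its
    own response at one dialogue turn.
    """
    return sum(turn_logprobs)


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) trajectory pair.

    This is the generic DPO objective, not the paper's specialized loss.
    """
    # How much more the policy favors each trajectory than the reference does.
    chosen_margin = trajectory_logprob(policy_chosen) - trajectory_logprob(ref_chosen)
    rejected_margin = trajectory_logprob(policy_rejected) - trajectory_logprob(ref_rejected)
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) equals log(1 + exp(-x)); branch for numerical stability.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy and the reference model assign identical log-probabilities, the loss equals log 2. It falls below that value as the policy shifts probability toward the chosen trajectory relative to the reference.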
