Image Credit: Unite

Why Language Models Get ‘Lost’ in Conversation

  • A new paper from Microsoft Research and Salesforce reveals that Large Language Models (LLMs) struggle significantly when instructions are given in stages rather than all at once, with performance dropping by an average of 39 percent across six tasks.
  • Responses from leading models such as GPT-4.1 and Gemini 2.5 Pro swing between near-perfect answers and outright failures depending on how a task is delivered across turns, with output consistency dropping by more than half.
  • The paper tests this with a method called sharding, which breaks a fully-specified prompt into smaller fragments and releases them one at a time over the course of a conversation, much like placing a restaurant order piece by piece or working through a problem collaboratively (the first sketch after this list simulates the setup).
  • LLMs tend to answer at length even while the information supplied so far is incomplete, then keep building on those premature, sometimes incorrect, attempts, so the conversation gradually loses coherence.
  • Starting a new conversation, rather than persisting with an unproductive one, often yields better results, because once a model gets 'lost' in an ongoing exchange it rarely recovers (the second sketch below shows one way to consolidate a drifting conversation into a fresh, fully-specified prompt).
  • The study questions whether adding separate interpretative layers between users and LLMs is the right fix, arguing instead that models should natively support multi-turn interaction.
  • Across the experiments, every model tested degraded in multi-turn interactions, showing both lower performance and sharply increased unreliability once instructions arrived in fragments.
  • Models of widely varying size and single-turn strength degraded to similar degrees in multi-turn settings, indicating that strong single-turn performance does not guarantee reliability across multiple turns.
  • The study suggests that this unreliability is a fundamental issue in how current models process evolving input, and that multi-turn ability should be treated as a core competency of LLMs (the third sketch below illustrates measuring the gap between a model's best and worst runs).
  • Real-world readiness evaluations should extend beyond fully-specified, single-shot benchmarks to test how models handle fragmented input, a particular concern for agent frameworks that rely on sustained reasoning across turns.

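The sharded setup is easy to simulate yourself. Below is a minimal sketch, assuming an OpenAI-style chat client; the model name, the example request, and the way it is split into shards are illustrative stand-ins, not the paper's actual harness.

```python
# Minimal simulation of a "sharded" conversation: the same request is
# revealed one fragment at a time instead of fully specified up front.
# Assumes an OpenAI-style client; shards and model name are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One fully-specified request, pre-split into conversational fragments.
shards = [
    "I need a Python function that deduplicates a list.",
    "It should preserve the original order of the items.",
    "It also has to handle unhashable items like dicts.",
    "Return only the function, with no explanation.",
]

messages = []
for shard in shards:
    messages.append({"role": "user", "content": shard})
    response = client.chat.completions.create(model="gpt-4.1", messages=messages)
    # The model answers after every shard, often committing early to
    # assumptions that later fragments contradict.
    messages.append({"role": "assistant", "content": response.choices[0].message.content})

print(messages[-1]["content"])  # final answer once all shards are in
```

Scoring the final answer against the same request delivered in a single turn is, in essence, the comparison the paper runs at scale.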
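The restart advice can also be made concrete with a small helper: collect everything the user has said in the lost conversation and open a new one with a single consolidated prompt. This is a sketch of the idea only; the paper does not prescribe a specific consolidation routine.

```python
# Sketch of the restart tactic: rather than adding yet another turn to a
# conversation the model is "lost" in, consolidate all user turns so far
# into one fully-specified prompt and start a fresh conversation with it.

def restart_conversation(messages: list[dict]) -> list[dict]:
    """Collapse every user turn into a single opening prompt."""
    user_turns = [m["content"] for m in messages if m["role"] == "user"]
    return [{"role": "user", "content": "\n".join(user_turns)}]

# Example: a drifting exchange becomes one complete request.
lost = [
    {"role": "user", "content": "Write a dedup function."},
    {"role": "assistant", "content": "def dedup(xs): return list(set(xs))"},
    {"role": "user", "content": "No, it must preserve order."},
    {"role": "assistant", "content": "def dedup(xs): ..."},
    {"role": "user", "content": "It also has to handle unhashable items."},
]
print(restart_conversation(lost)[0]["content"])
```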
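Finally, the consistency finding suggests a simple way to probe any model: score repeated runs of the same sharded task and measure the spread between best and worst outcomes. The decile-based split below is an assumption made for this sketch, not necessarily the paper's exact definition of unreliability.

```python
# Toy measurement of multi-turn unreliability: score many repeated runs of
# the same task, then compare best-case and worst-case outcomes. The use of
# ~90th/~10th percentiles here is an assumption for illustration.
import statistics

def aptitude_and_gap(scores: list[float]) -> tuple[float, float]:
    deciles = statistics.quantiles(scores, n=10)  # nine cut points
    best, worst = deciles[8], deciles[0]          # ~90th and ~10th percentile
    return best, best - worst                     # best case, reliability gap

# Ten simulated scores for one task: near-perfect answers mixed with
# failures, the fluctuation pattern the study reports for sharded settings.
runs = [0.95, 0.20, 0.90, 0.15, 1.00, 0.30, 0.85, 0.10, 0.92, 0.25]
best, gap = aptitude_and_gap(runs)
print(f"best-case score ~{best:.2f}, best-to-worst gap ~{gap:.2f}")
```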