Source: arXiv

SortBench: Benchmarking LLMs based on their ability to sort lists

  • Sorting is a challenging task for Large Language Models (LLMs) because it exposes weaknesses in faithfully reproducing the input data, performing logical comparisons, and distinguishing syntax from semantics.
  • SortBench, a new benchmark for LLMs, offers tasks at several difficulty levels and can be scaled easily.
  • Tests on seven state-of-the-art LLMs, including test-time reasoning models, show that even highly capable models such as o3-mini can struggle with sorting tasks that mix syntax and semantics.
  • Models also struggle to stay faithful to the input on long lists, often dropping or adding items. Test-time reasoning tends to overthink the problem, which degrades performance.
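The failure modes summarized above can be made concrete with a small checker. The sketch below is a hypothetical scorer, not SortBench's actual evaluation code: the function names and the scoring rule (an answer counts only if it is a sorted permutation of the input) are assumptions for illustration. It compares input and output as multisets to count dropped and added items, and uses numbers written as strings to show how a syntactic (lexicographic) sort differs from a semantic (numeric) one.

```python
from collections import Counter

# Hypothetical scorer for illustration only; not SortBench's evaluation code.


def faithfulness_errors(input_items, output_items):
    """Return (dropped, added) by comparing input and output as multisets."""
    inp = Counter(input_items)
    out = Counter(output_items)
    dropped = inp - out  # items present in the input but missing from the output
    added = out - inp    # items invented or duplicated by the model
    return sorted(dropped.elements()), sorted(added.elements())


def is_sorted(items, key=None):
    """True if items are in non-decreasing order under the optional key."""
    keyed = [key(x) if key else x for x in items]
    return all(a <= b for a, b in zip(keyed, keyed[1:]))


def score_sort(input_items, output_items, key=None):
    """Assumed rule: correct only if faithful to the input AND sorted."""
    dropped, added = faithfulness_errors(input_items, output_items)
    ordered = is_sorted(output_items, key)
    return {
        "dropped": dropped,
        "added": added,
        "sorted": ordered,
        "correct": not dropped and not added and ordered,
    }


# Numbers written as strings: the syntax (characters) and the semantics
# (numeric values) imply different orders.
items = ["10", "9", "100", "9"]
lexical = sorted(items)           # string order: ['10', '100', '9', '9']
numeric = sorted(items, key=int)  # value order:  ['9', '9', '10', '100']

print(score_sort(items, lexical, key=int))             # faithful but not sorted by value
print(score_sort(items, numeric, key=int))             # faithful and sorted
print(score_sort(items, ["9", "10", "100"], key=int))  # sorted, but one "9" was dropped
```

Comparing multisets with `Counter` rather than sets matters here: a model that outputs only one `"9"` would pass a set-based check even though it silently dropped a duplicate, which is exactly the kind of faithfulness error that grows with list length.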