Source: arXiv

T-SHIRT: Token-Selective Hierarchical Data Selection for Instruction Tuning

  • Instruction tuning is crucial for Large Language Models (LLMs) to follow user instructions effectively.
  • Existing data selection methods for instruction tuning have two main limitations: they evaluate quality only at the sample level, overlooking token-level informativeness, and they do not account for the robustness of the scoring method.
  • T-SHIRT is a new data selection framework that addresses these limitations by focusing on token-level informativeness and by selecting robust, reliable samples for instruction tuning.
  • Models instruction-tuned with T-SHIRT on a curated dataset can outperform those trained on the full dataset by up to 5.48 points on average across eight benchmarks. The method also remains cost-effective and highly efficient.
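
The summary above does not give T-SHIRT's actual scoring formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each sample already comes with per-token informativeness scores from several scoring runs (hypothetical inputs), averages only the tokens above a hypothetical `threshold`, and uses the worst score across runs as a simple stand-in for robustness. All function and parameter names are illustrative.

```python
from typing import Dict, List


def token_selective_score(token_scores: List[float], threshold: float) -> float:
    """Average only the token scores above `threshold` (a hypothetical cut-off)."""
    informative = [s for s in token_scores if s > threshold]
    # A sample with no informative tokens gets the lowest score, 0.0.
    return sum(informative) / len(informative) if informative else 0.0


def select_samples(
    scored_variants: Dict[str, List[List[float]]],
    threshold: float,
    keep_fraction: float,
) -> List[str]:
    """Keep the top `keep_fraction` of samples by worst-case token-selective score.

    `scored_variants` maps a sample id to one list of token scores per scoring
    run. Taking the minimum across runs is an assumed robustness proxy: a
    sample is kept only if it scores well under every run.
    """
    robust_scores = {
        sample_id: min(token_selective_score(v, threshold) for v in variants)
        for sample_id, variants in scored_variants.items()
    }
    n_keep = max(1, int(len(robust_scores) * keep_fraction))
    ranked = sorted(robust_scores, key=robust_scores.get, reverse=True)
    return ranked[:n_keep]


# Toy data: four samples, two scoring runs each (made-up numbers).
toy = {
    "a": [[0.9, 0.1, 0.8], [0.85, 0.2, 0.7]],
    "b": [[0.95, 0.9, 0.05], [0.1, 0.2, 0.05]],  # high in one run only
    "c": [[0.3, 0.2, 0.1], [0.3, 0.25, 0.1]],
    "d": [[0.6, 0.7, 0.1], [0.65, 0.6, 0.2]],
}
selected = select_samples(toy, threshold=0.5, keep_fraction=0.5)
print(selected)  # ['a', 'd']
```

In this toy run, sample "b" is dropped even though one run rates it highest, because its score collapses in the other run. That is the kind of unreliable sample a robustness-aware selector is meant to filter out.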
