Source: Arxiv

Farseer: A Refined Scaling Law in Large Language Models

  • Farseer introduces a refined scaling law for Large Language Models (LLMs) to improve predictive accuracy across scales.
  • Farseer addresses the persistent scaling gap between small-scale experiments and resource-intensive production systems.
  • Farseer's modeled loss surface, $L(N,D)$, fits empirical data more closely than prior laws such as Chinchilla's law (whose form is sketched after this list).
  • The refined law reduces extrapolation error by 433% relative to Chinchilla's law.
  • Farseer enables reliable evaluation of competing training strategies across scales, making large-scale performance predictable in advance.
  • Its methodology supports confident extrapolation from small-scale ablation studies to performance at much larger scales, as illustrated in the fitting sketch after this list.
  • Farseer also provides insights into optimal compute allocation for LLM training that reflect modern training demands.
  • Around 1,000 LLMs were trained across different scales and configurations to validate Farseer, utilizing approximately 3 million NVIDIA H100 GPU hours.
  • All models, data, results, and logs are open-sourced on GitHub to encourage further research and collaboration.
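
The bullets reference the loss surface $L(N,D)$ and Chinchilla's law without stating either form. For reference, the Chinchilla parametric form that Farseer refines is

$$L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},$$

where $N$ is the model's parameter count, $D$ is the number of training tokens, and $E$, $A$, $B$, $\alpha$, $\beta$ are constants fitted to observed losses. Farseer's own refined expression is specified in the paper and is not reproduced in this summary.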
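
To make the fit-then-extrapolate workflow concrete, here is a minimal sketch assuming the Chinchilla-style form above and fully synthetic data. It is not the authors' code; `scipy.optimize.curve_fit` stands in for whatever fitting procedure the paper actually uses.

```python
import numpy as np
from scipy.optimize import curve_fit

def loss_surface(ND, E, A, alpha, B, beta):
    """Chinchilla-style loss surface: L(N, D) = E + A/N^alpha + B/D^beta."""
    N, D = ND
    return E + A / N**alpha + B / D**beta

# Hypothetical small-scale runs: parameter counts N, token counts D.
N = np.array([1e7, 1e7, 5e7, 5e7, 1e8, 5e8, 5e8, 1e9])
D = np.array([2e8, 2e9, 1e9, 1e10, 2e10, 1e10, 1e11, 2e11])

# Synthetic "observed" losses generated from made-up ground-truth constants.
rng = np.random.default_rng(0)
L = loss_surface((N, D), 1.7, 400.0, 0.34, 410.0, 0.28) + rng.normal(0, 0.01, N.size)

# Fit the five constants on the cheap small-scale runs; bounds keep them positive.
popt, _ = curve_fit(loss_surface, (N, D), L,
                    p0=(2.0, 100.0, 0.3, 100.0, 0.3),
                    bounds=(0, np.inf))

# Extrapolate to a production-scale configuration far outside the fitted range.
print("fitted (E, A, alpha, B, beta):", popt)
print("predicted loss at N=7e10, D=2e12:", loss_surface((7e10, 2e12), *popt))
```

The pattern of fitting constants on cheap small runs and then evaluating the surface at a production-scale $(N, D)$ is the extrapolation workflow the bullets describe.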
