Farseer introduces a refined scaling law for Large Language Models (LLMs) to improve predictive accuracy across scales.
Farseer addresses the persistent scaling gap between small-scale experiments and resource-intensive production systems.
By constructing a model loss surface, $L(N,D)$, Farseer fits empirical data significantly better than prior laws such as Chinchilla's law.
Relative to Chinchilla's law, the new scaling law reduces extrapolation error by 433%.
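To make the fit-and-extrapolate workflow concrete, the sketch below fits a parametric loss surface to small-scale runs and measures extrapolation error on held-out larger runs. It uses Chinchilla's functional form $L(N,D) = E + A N^{-\alpha} + B D^{-\beta}$ purely as a stand-in (Farseer's own parameterization is defined in the paper), and all run data here are synthetic for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

# Chinchilla-style parametric loss surface L(N, D) = E + A*N^-alpha + B*D^-beta,
# used as a stand-in here; Farseer fits its own surface, but the workflow is the same.
def loss_surface(ND, E, A, alpha, B, beta):
    N, D = ND
    return E + A * N**(-alpha) + B * D**(-beta)

TRUE = (1.8, 380.0, 0.33, 400.0, 0.29)  # hypothetical "ground truth" parameters

# Synthetic small-scale runs used for fitting (parameter counts N, token counts D).
N_small = np.array([1e8, 2e8, 4e8, 8e8, 1.6e9, 3.2e9])
D_small = np.array([2e9, 4e9, 8e9, 1.6e10, 3.2e10, 6.4e10])
loss_small = loss_surface((N_small, D_small), *TRUE) + rng.normal(0, 0.01, N_small.size)

# Fit the surface on small-scale points only.
params, _ = curve_fit(loss_surface, (N_small, D_small), loss_small,
                      p0=[2.0, 300.0, 0.3, 300.0, 0.3], maxfev=20000)

# Held-out larger-scale runs to measure extrapolation error.
N_large = np.array([7e9, 1.3e10])
D_large = np.array([1.4e11, 2.6e11])
loss_large = loss_surface((N_large, D_large), *TRUE)

pred = loss_surface((N_large, D_large), *params)
print("fitted parameters:", np.round(params, 3))
print("relative extrapolation error:", np.abs(pred - loss_large) / loss_large)
```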
Farseer enables reliable comparison of competing training strategies across scales, so that large-scale performance can be predicted from smaller runs.
Its methodology supports confident extrapolation from small-scale ablation studies to performance predictions at much larger scales.
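The same fit-then-extrapolate loop underlies strategy comparison: fit one surface per training recipe on its own small-scale ablations, then compare the surfaces' predictions at the target scale. A minimal sketch, assuming the Chinchilla-style stand-in surface above and purely hypothetical fitted parameters for two recipes:

```python
# Hypothetical fitted surface parameters (E, A, alpha, B, beta) for two recipes,
# each obtained from its own set of small-scale ablation runs as shown above.
strategy_params = {
    "baseline recipe": (1.80, 380.0, 0.330, 400.0, 0.290),
    "modified recipe": (1.78, 360.0, 0.335, 395.0, 0.292),
}

N_target, D_target = 3e10, 6e11  # planned production-scale run (illustrative)

def predicted_loss(E, A, alpha, B, beta, N, D):
    return E + A * N**(-alpha) + B * D**(-beta)

for name, p in strategy_params.items():
    print(f"{name}: predicted loss at target scale = {predicted_loss(*p, N_target, D_target):.4f}")
```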
Farseer also provides new insights into optimal compute allocation for LLM training that better reflect the demands of modern training.
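One way such allocation insights are read off a fitted surface: fix a compute budget $C$, apply the common approximation $C \approx 6ND$, and search for the $(N, D)$ split that minimizes predicted loss. The numerical sketch below uses the hypothetical stand-in surface from above; applying the same procedure to Farseer's own fitted surface is what yields the paper's allocation conclusions.

```python
import numpy as np

# Hypothetical fitted surface parameters from the sketch above.
E, A, alpha, B, beta = 1.80, 380.0, 0.33, 400.0, 0.29

def loss(N, D):
    return E + A * N**(-alpha) + B * D**(-beta)

def compute_optimal_split(C, num=2000):
    """Sweep model sizes N on a log grid, set D = C / (6N), and pick the minimum loss."""
    N = np.logspace(8, 12, num)   # candidate parameter counts
    D = C / (6.0 * N)             # tokens implied by the C ~= 6*N*D approximation
    L = loss(N, D)
    i = int(np.argmin(L))
    return N[i], D[i], L[i]

for C in (1e21, 1e22, 1e23):      # FLOP budgets
    N_opt, D_opt, L_opt = compute_optimal_split(C)
    print(f"C={C:.0e} FLOPs -> N~{N_opt:.2e} params, D~{D_opt:.2e} tokens, "
          f"predicted loss {L_opt:.3f}")
```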
To validate Farseer, around 1,000 LLMs were trained across different scales and configurations, consuming approximately 3 million NVIDIA H100 GPU hours.
All models, data, results, and logs are open-sourced on GitHub to encourage further research and collaboration.