techminis

A naukri.com initiative

Image Credit: Arxiv

Test-Time Training Done Right

  • Test-Time Training (TTT) models context dependencies by adapting part of the model's weights (the "fast weights") during inference, storing temporary memories of earlier tokens in the current sequence.
  • Existing TTT methods handle long-context data inefficiently on modern GPUs: low FLOPs utilization and small online minibatch sizes have largely restricted them to 1D ordered sequences.
  • A new approach, Large Chunk Test-Time Training (LaCT), uses extremely large chunk updates (2K to 1M tokens), which significantly improves hardware utilization and state capacity and makes it easy to integrate sophisticated optimizers.
  • LaCT has been validated across multiple modalities and tasks, scaling up to a 14B-parameter autoregressive video diffusion model and supporting novel view synthesis with a context length of 1 million tokens, with the aim of advancing long-context modeling and test-time training research.
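To make the fast-weight idea concrete, here is a minimal illustrative sketch (not the paper's actual architecture) of a chunk-wise test-time-training update: a fast-weight matrix is nudged, one gradient step per chunk, to associate that chunk's keys with its values, and is then read out with queries. All names (`lact_chunk_update`, `apply_fast_weights`) and the squared-error objective are assumptions chosen for clarity; LaCT's point is that using one very large chunk per update keeps GPU utilization high.

```python
import numpy as np

def lact_chunk_update(W, keys, values, lr=0.5):
    """One chunk-level test-time-training step (illustrative sketch).

    W:            fast-weight matrix (d, d) acting as temporary memory.
    keys, values: (chunk_len, d) token features for the current chunk.

    The fast weights take one gradient step on the mean squared error
    0.5 * ||W k - v||^2, computed over the entire chunk at once --
    the "large chunk" idea: bigger chunks mean fewer, denser updates.
    """
    pred = keys @ W.T                              # (chunk_len, d)
    grad = (pred - values).T @ keys / len(keys)    # dL/dW over the chunk
    return W - lr * grad

def apply_fast_weights(W, queries):
    """Read from the updated fast-weight memory."""
    return queries @ W.T

rng = np.random.default_rng(0)
d, chunk = 8, 2048                 # chunk size in LaCT's 2K-1M range
W = np.zeros((d, d))
k = rng.standard_normal((chunk, d))
v = k @ rng.standard_normal((d, d)).T   # values linearly related to keys
loss_before = np.mean((apply_fast_weights(W, k) - v) ** 2)
for _ in range(50):                # several inner steps on one chunk
    W = lact_chunk_update(W, k, v)
loss_after = np.mean((apply_fast_weights(W, k) - v) ** 2)
print(loss_after < loss_before)    # memory of the chunk has improved
```

Because the whole chunk enters each matrix multiply, the update is one large GEMM rather than many tiny ones, which is the hardware-utilization argument the summary describes.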

