Source: arXiv

Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

  • Researchers introduce Multiverse, a generative model that enables natively parallel generation by internalizing a MapReduce paradigm.
  • Multiverse operates in three stages: adaptive task decomposition, parallel subtask execution, and lossless result synthesis (sketched in code after this list).
  • A real-world Multiverse reasoning model is built through the co-design of data, algorithms, and systems, enabling rapid transfer from AR-LLMs.
  • Multiverse 1K is developed by converting sequential reasoning chains into structured training data using an automated pipeline.
  • Multiverse Attention separates parallel reasoning steps while remaining compatible with causal attention during training (see the mask sketch below).
  • Multiverse Engine enables parallel inference via a dedicated scheduler that dynamically switches between sequential and parallel generation (see the scheduler sketch below).
  • After fine-tuning with 1K examples, Multiverse-32B, an open-source non-AR model, achieves performance on par with leading AR-LLMs of the same scale.
  • Budget control experiments demonstrate Multiverse-32B's superior scaling, outperforming AR-LLMs by 1.87% on average using the same context length.
  • Multiverse-32B also achieves up to 2x speedup across varying batch sizes, leading to practical efficiency gains.
  • The entire Multiverse ecosystem, including data, model weights, engine, and tools, has been open-sourced for accessibility.
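
To make the MapReduce framing concrete, here is a minimal sketch of the three-stage flow in Python. It is an illustration rather than the paper's implementation: `generate` is a hypothetical stand-in for a model call, and the prompt strings are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def generate(prompt: str) -> str:
    """Placeholder for a call into the language model."""
    raise NotImplementedError


def multiverse_generate(problem: str) -> str:
    # Map: the model adaptively decomposes the task into subtasks.
    plan = generate(f"Decompose into independent subtasks:\n{problem}")
    subtasks = [line for line in plan.splitlines() if line.strip()]

    # Process: independent subtasks are executed in parallel.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda s: generate(f"Solve: {s}"), subtasks))

    # Reduce: partial results are synthesized losslessly into one answer.
    return generate("Combine these results into a final answer:\n" + "\n".join(results))
```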
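
The compatibility claim about Multiverse Attention can also be pictured as a mask rule. The sketch below is a simplified reading, assuming single-level forks and taking branch membership as given per token (the actual design also handles position IDs, omitted here): tokens attend causally within their own branch and to shared tokens, but never across sibling branches.

```python
import numpy as np


def multiverse_mask(branch_ids: list[int]) -> np.ndarray:
    """True at (i, j) means position i may attend to position j.

    branch_ids[t] == 0 marks shared sequential tokens (before a fork
    or after a join); a positive id marks a token inside a branch.
    """
    n = len(branch_ids)
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1):  # causal: current and earlier positions only
            mask[i, j] = (
                branch_ids[i] == 0                  # shared tokens see all prior tokens
                or branch_ids[j] == 0               # branch tokens see the shared prefix
                or branch_ids[i] == branch_ids[j]   # and their own branch, never siblings
            )
    return mask


# Two sibling branches (ids 1 and 2) between shared segments.
print(multiverse_mask([0, 0, 1, 1, 2, 2, 0]).astype(int))
```

When every token carries id 0, the rule collapses to an ordinary causal mask, which is the property that keeps training aligned with standard AR attention.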

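The dynamic switching in Multiverse Engine can likewise be sketched as a decode loop. Everything below is an assumption for illustration: the tag names, `decode_until`, and `plan_branches` are hypothetical placeholders, and the real engine reuses the shared prefix's KV cache across branches instead of re-passing text.

```python
from concurrent.futures import ThreadPoolExecutor

FORK, JOIN, EOS = "<Parallel>", "</Parallel>", "<eos>"  # assumed tag names


def decode_until(context: str, stop: tuple[str, ...]) -> str:
    """Placeholder: decode from `context` until a stop tag is emitted."""
    raise NotImplementedError


def plan_branches(context: str) -> list[str]:
    """Placeholder: extract the subtask prompts the model just planned."""
    raise NotImplementedError


def run(prompt: str) -> str:
    # Sequential phase: decode until the model itself emits a fork tag.
    context = prompt + decode_until(prompt, stop=(FORK, EOS))
    while context.endswith(FORK):
        # Parallel phase: each planned branch decodes independently.
        with ThreadPoolExecutor() as pool:
            outs = pool.map(
                lambda b: decode_until(context + b, stop=(JOIN,)),
                plan_branches(context),
            )
        context += "".join(outs) + JOIN
        # Resume sequential decoding until the next fork or end of text.
        context += decode_until(context, stop=(FORK, EOS))
    return context
```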