Source: arXiv

Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

  • Researchers introduce Multiverse, a generative model that enables natively parallel generation by internalizing a MapReduce paradigm.
  • Multiverse operates in three stages: adaptive task decomposition, parallel subtask execution, and lossless result synthesis (sketched in code after this list).
  • A real-world Multiverse reasoning model is built through the co-design of data, algorithms, and systems, enabling rapid transfer from AR-LLMs.
  • Multiverse 1K is developed by converting sequential reasoning chains into structured training data using an automated pipeline.
  • Multiverse Attention separates parallel reasoning steps while remaining compatible with causal attention during training (see the mask sketch below).
  • Multiverse Engine enables parallel inference via a dedicated scheduler that dynamically switches between sequential and parallel generation (see the scheduler sketch below).
  • After fine-tuning with 1K examples, Multiverse-32B, an open-source non-AR model, achieves performance on par with leading AR-LLMs of the same scale.
  • Budget control experiments demonstrate Multiverse-32B's superior scaling, outperforming AR-LLMs by 1.87% on average using the same context length.
  • Multiverse-32B also achieves up to 2x speedup across varying batch sizes, leading to practical efficiency gains.
  • The entire Multiverse ecosystem, including data, model weights, engine, and tools, has been open-sourced for accessibility.
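
To make the MapReduce framing concrete, here is a minimal sketch of the three-stage flow in Python. It is an illustration rather than the paper's implementation: `generate` is a hypothetical stand-in for a model call, and the prompt strings are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def generate(prompt: str) -> str:
    """Placeholder for a call into the language model."""
    raise NotImplementedError


def multiverse_generate(problem: str) -> str:
    # Map: the model adaptively decomposes the task into subtasks.
    plan = generate(f"Decompose into independent subtasks:\n{problem}")
    subtasks = [line for line in plan.splitlines() if line.strip()]

    # Process: independent subtasks are executed in parallel.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda s: generate(f"Solve: {s}"), subtasks))

    # Reduce: partial results are synthesized losslessly into one answer.
    return generate("Combine these results into a final answer:\n" + "\n".join(results))
```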
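
The compatibility claim about Multiverse Attention can also be pictured as a mask rule. The sketch below is a simplified reading, assuming single-level forks and taking branch membership as given per token (the actual design also handles position IDs, omitted here): tokens attend causally within their own branch and to shared tokens, but never across sibling branches.

```python
import numpy as np


def multiverse_mask(branch_ids: list[int]) -> np.ndarray:
    """True at (i, j) means position i may attend to position j.

    branch_ids[t] == 0 marks shared sequential tokens (before a fork
    or after a join); a positive id marks a token inside a branch.
    """
    n = len(branch_ids)
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1):  # causal: current and earlier positions only
            mask[i, j] = (
                branch_ids[i] == 0                  # shared tokens see all prior tokens
                or branch_ids[j] == 0               # branch tokens see the shared prefix
                or branch_ids[i] == branch_ids[j]   # and their own branch, never siblings
            )
    return mask


# Two sibling branches (ids 1 and 2) between shared segments.
print(multiverse_mask([0, 0, 1, 1, 2, 2, 0]).astype(int))
```

When every token carries id 0, the rule collapses to an ordinary causal mask, which is the property that keeps training aligned with standard AR attention.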

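The dynamic switching in Multiverse Engine can likewise be sketched as a decode loop. Everything below is an assumption for illustration: the tag names, `decode_until`, and `plan_branches` are hypothetical placeholders, and the real engine reuses the shared prefix's KV cache across branches instead of re-passing text.

```python
from concurrent.futures import ThreadPoolExecutor

FORK, JOIN, EOS = "<Parallel>", "</Parallel>", "<eos>"  # assumed tag names


def decode_until(context: str, stop: tuple[str, ...]) -> str:
    """Placeholder: decode from `context` until a stop tag is emitted."""
    raise NotImplementedError


def plan_branches(context: str) -> list[str]:
    """Placeholder: extract the subtask prompts the model just planned."""
    raise NotImplementedError


def run(prompt: str) -> str:
    # Sequential phase: decode until the model itself emits a fork tag.
    context = prompt + decode_until(prompt, stop=(FORK, EOS))
    while context.endswith(FORK):
        # Parallel phase: each planned branch decodes independently.
        with ThreadPoolExecutor() as pool:
            outs = pool.map(
                lambda b: decode_until(context + b, stop=(JOIN,)),
                plan_branches(context),
            )
        context += "".join(outs) + JOIN
        # Resume sequential decoding until the next fork or end of text.
        context += decode_until(context, stop=(FORK, EOS))
    return context
```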