<ul data-eligibleForWebStory="false"><li>OmniDraft is a unified framework for speculative decoding that addresses two challenges in online deployment settings: vocabulary mismatch between the draft and target models, and improving latency.</li><li>OmniDraft lets a single draft model work with any target model and adapt dynamically to user data, using an online n-gram cache and hybrid distillation fine-tuning.</li><li>The framework is suited to on-device large language model (LLM) applications where model cost, efficiency, and user customization are the main concerns.</li><li>Online learning experiments on math reasoning, coding, and text generation tasks show that OmniDraft is compatible with various target models and speeds up decoding.</li></ul>
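The summary does not say what the online n-gram cache stores. One plausible reading, assumed here, is that it records which runs of draft-vocabulary tokens correspond to a single target-vocabulary token, so that later drafts can be re-expressed in the target's vocabulary before verification. The Python sketch below illustrates that idea only; the class `NGramCache`, its methods, and the string-based token representation are hypothetical and are not taken from OmniDraft.

```python
from typing import Dict, List, Tuple


class NGramCache:
    """Hypothetical cache: a run of draft-token strings -> one target-token string."""

    def __init__(self, max_n: int = 4) -> None:
        self.max_n = max_n  # longest run of draft tokens we are willing to store
        self.table: Dict[Tuple[str, ...], str] = {}

    def add(self, draft_pieces: List[str], target_token: str) -> None:
        # Assumption: a mapping is valid only if the draft pieces concatenate
        # exactly to the target token's text.
        if 1 < len(draft_pieces) <= self.max_n and "".join(draft_pieces) == target_token:
            self.table[tuple(draft_pieces)] = target_token

    def merge(self, draft_pieces: List[str]) -> List[str]:
        """Greedily replace the longest cached run starting at each position."""
        out: List[str] = []
        i = 0
        while i < len(draft_pieces):
            # Try the longest candidate run first, down to runs of length 2.
            for n in range(min(self.max_n, len(draft_pieces) - i), 1, -1):
                key = tuple(draft_pieces[i:i + n])
                if key in self.table:
                    out.append(self.table[key])
                    i += n
                    break
            else:
                # No cached run starts here: keep the draft token unchanged.
                out.append(draft_pieces[i])
                i += 1
        return out


cache = NGramCache()
cache.add(["spec", "ulative"], "speculative")  # learned online from observed text
merged = cache.merge(["a", "spec", "ulative", "dec"])
print(merged)
```

In this sketch the cache grows as new mappings are observed, which is one way a drafter could adapt to a user's data without retraining. How OmniDraft populates and queries its cache is not described in the summary.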

OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding
