Image Credit: arXiv

Marconi: Prefix Caching for the Era of Hybrid LLMs

  • Hybrid models, which combine the language-modeling strength of attention layers with the efficiency of recurrent layers, have gained traction for supporting long contexts in LLM serving.
  • Marconi is a system that supports efficient prefix caching for hybrid LLMs.
  • Marconi uses novel admission and eviction policies that assess potential cache entries based on recency, reuse likelihood, and compute savings; a toy illustration of such scoring follows this list.
  • Marconi achieves significantly higher token hit rates than state-of-the-art prefix caching systems.
