Image Credit: Semiengineering

Inference Framework Addressing Deployment Challenges of Large Generative Models on GPUs (Google)

  • A new technical paper titled “Scaling On-Device GPU Inference for Large Generative Models” was published by researchers at Google and Meta Platforms.
  • The paper introduces ML Drift, an optimized framework that enables on-device execution of generative AI workloads with significantly more parameters than existing models.
  • ML Drift addresses the engineering challenges of developing against multiple GPU APIs and ensures compatibility across mobile and desktop/laptop platforms, enabling the deployment of complex models on resource-constrained devices.
  • The GPU-accelerated ML/AI inference engine developed by the researchers achieves a performance improvement of an order of magnitude compared to existing open-source GPU inference engines.
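The cross-platform point above hinges on abstracting over multiple GPU APIs (e.g. OpenCL, Vulkan, Metal) behind one engine interface. The paper's actual API is not reproduced here; the sketch below is a hypothetical illustration of the kind of backend-selection logic such a portable engine needs, with all names being assumptions:

```python
# Hypothetical sketch: choosing a GPU API per platform, as a portable
# inference engine might. Backend names and preference order are
# illustrative assumptions, not ML Drift's actual implementation.

# Preferred GPU APIs per platform, best first.
SUPPORTED_BACKENDS = {
    "android": ["opencl", "vulkan"],
    "ios": ["metal"],
    "macos": ["metal"],
    "windows": ["d3d12", "vulkan", "opencl"],
    "linux": ["vulkan", "opencl"],
}

def pick_backend(platform: str, available: list[str]) -> str:
    """Return the first preferred GPU API the device actually exposes."""
    for api in SUPPORTED_BACKENDS.get(platform, []):
        if api in available:
            return api
    raise RuntimeError(f"no supported GPU API on {platform}: {available}")
```

A single dispatch point like this keeps kernel code and model loading independent of which graphics/compute API is underneath, which is one way the mobile-to-desktop compatibility described above can be achieved.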

