The significant computational demands of pretrained language models (PLMs) make efficient inference challenging, especially in multi-tenant environments where many models must be served concurrently.
HMI (Hierarchical knowledge management-based Multi-tenant Inference) is introduced as a system that serves tenants, each with a distinct PLM, in a resource-efficient manner.
HMI constructs hierarchical PLMs (hPLMs) by categorizing PLM knowledge into general, domain-specific, and task-specific levels, reducing GPU memory usage per tenant.
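The memory benefit of this hierarchy can be illustrated with a back-of-the-envelope sketch: tenants share one copy of the general backbone, and only small domain- and task-specific parameter sets are stored per tenant. The parameter counts below are hypothetical placeholders, not figures from HMI.

```python
# Illustrative sketch of hierarchical parameter sharing for multi-tenant
# PLM serving. All sizes are made-up examples, not HMI's actual numbers.

GENERAL_PARAMS = 100_000_000  # shared backbone, stored once on GPU
DOMAIN_PARAMS = 5_000_000     # per-domain knowledge delta (e.g., finance)
TASK_PARAMS = 500_000         # per-task knowledge delta (e.g., sentiment)

def dedicated_memory(num_tenants: int) -> int:
    """Baseline: each tenant keeps a full, independent PLM copy."""
    return num_tenants * GENERAL_PARAMS

def hierarchical_memory(num_domains: int, num_tasks: int) -> int:
    """hPLM-style: one shared backbone plus small per-level deltas."""
    return (GENERAL_PARAMS
            + num_domains * DOMAIN_PARAMS
            + num_tasks * TASK_PARAMS)

# 100 tenants spread over 10 domains, one task each.
baseline = dedicated_memory(100)                    # 10_000_000_000 params
shared = hierarchical_memory(num_domains=10,
                             num_tasks=100)         # 200_000_000 params
print(baseline // shared)  # → 50x rough reduction under these assumptions
```

The key design point is that the dominant cost, the general backbone, is amortized across all tenants, so per-tenant memory grows only with the small domain and task deltas.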
System optimizations such as hierarchical knowledge prefetching and parallel implementations further improve resource utilization and inference throughput in HMI.