<ul><li>Large language models (LLMs) such as GPT-4 run predominantly in the cloud, which incurs high operational costs.</li><li>As locally run small language models (SLMs) become more accurate, the need for cloud-only processing in AI agents is being reconsidered.</li><li>The Adaptive Iteration-level Model Selector (AIMS), a lightweight scheduler, is proposed to partition an AI agent's subtasks between an SLM and an LLM based on subtask features, maximizing SLM usage while maintaining accuracy.</li><li>Experimental results show that AIMS improves accuracy by up to 9.1% and increases SLM usage by up to 10.8% compared to existing approaches.</li></ul>
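<p>The idea of routing each subtask by its features can be illustrated with a minimal sketch. This is not the AIMS algorithm itself, which the summary does not specify. The features (<code>difficulty</code>, <code>prior_slm_failures</code>), the thresholds, and the function names are all hypothetical, chosen only to show what a per-subtask SLM/LLM decision might look like.</p>

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    name: str
    difficulty: float            # hypothetical feature in [0, 1]
    prior_slm_failures: int = 0  # hypothetical feature: earlier SLM attempts that failed


def select_model(task: Subtask,
                 difficulty_threshold: float = 0.6,
                 max_slm_failures: int = 1) -> str:
    """Return "slm" or "llm" for one subtask (illustrative rule, not AIMS)."""
    # Escalate to the cloud LLM once the local model has failed too often.
    if task.prior_slm_failures > max_slm_failures:
        return "llm"
    # Escalate subtasks judged too hard for the local model.
    if task.difficulty > difficulty_threshold:
        return "llm"
    # Default to the cheaper local SLM.
    return "slm"


def slm_usage(tasks: list[Subtask], **kwargs) -> float:
    """Fraction of subtasks routed to the SLM under the rule above."""
    if not tasks:
        return 0.0
    choices = [select_model(t, **kwargs) for t in tasks]
    return choices.count("slm") / len(choices)
```

<p>Raising <code>difficulty_threshold</code> in this sketch sends more subtasks to the SLM at the risk of lower accuracy, which is the trade-off the scheduler described above is meant to manage.</p>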

HERA: Hybrid Edge-cloud Resource Allocation for Cost-Efficient AI Agents
