Large language models (LLMs) like GPT-4 predominantly operate in the cloud, incurring high operational costs.
As locally deployed small language models (SLMs) become more accurate, the need for cloud-exclusive processing in AI agents is being reconsidered.
A lightweight scheduler, the Adaptive Iteration-level Model Selector (AIMS), is proposed to partition an AI agent's subtasks between the SLM and the LLM based on subtask features, maximizing SLM usage while maintaining accuracy.
Experimental results show that AIMS improves accuracy by up to 9.1% and increases SLM usage by up to 10.8% compared to existing approaches.
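The abstract does not say which subtask features AIMS uses or how it weighs them. Purely as an illustration of iteration-level model selection, the Python sketch below assumes three hypothetical features (`prompt_tokens`, `needs_planning`, `slm_confidence`) and a simple threshold rule that defaults to the SLM and escalates to the LLM only when a feature suggests the SLM is likely to fail. The feature names, thresholds, and rule are assumptions for this sketch, not the AIMS policy.

```python
from dataclasses import dataclass
from typing import Literal

Model = Literal["SLM", "LLM"]


@dataclass(frozen=True)
class SubtaskFeatures:
    """Hypothetical per-iteration features of an agent subtask (not from AIMS)."""
    prompt_tokens: int     # size of the context the model must read
    needs_planning: bool   # e.g. decomposing the goal vs. a routine tool call
    slm_confidence: float  # assumed confidence in [0, 1] from an SLM estimate


def select_model(f: SubtaskFeatures,
                 max_slm_tokens: int = 2048,
                 min_confidence: float = 0.7) -> Model:
    """Route one subtask: prefer the SLM unless a feature suggests escalation."""
    if f.needs_planning:
        return "LLM"  # assumed: planning steps are harder for a small model
    if f.prompt_tokens > max_slm_tokens:
        return "LLM"  # assumed: long contexts exceed what the SLM handles well
    if f.slm_confidence < min_confidence:
        return "LLM"  # low SLM confidence, so protect accuracy
    return "SLM"


def schedule(subtasks: list[SubtaskFeatures]) -> tuple[list[Model], float]:
    """Assign a model to each iteration and report the share handled by the SLM."""
    choices = [select_model(f) for f in subtasks]
    slm_share = choices.count("SLM") / len(choices) if choices else 0.0
    return choices, slm_share


if __name__ == "__main__":
    trace = [
        SubtaskFeatures(prompt_tokens=900, needs_planning=True, slm_confidence=0.9),
        SubtaskFeatures(prompt_tokens=400, needs_planning=False, slm_confidence=0.85),
        SubtaskFeatures(prompt_tokens=3000, needs_planning=False, slm_confidence=0.95),
        SubtaskFeatures(prompt_tokens=600, needs_planning=False, slm_confidence=0.4),
    ]
    choices, share = schedule(trace)
    print(choices, share)  # → ['LLM', 'SLM', 'LLM', 'LLM'] 0.25
```

Defaulting to the SLM and escalating on specific signals mirrors the stated goal of maximizing SLM usage while maintaining accuracy; in a real system the thresholds would need to be tuned against measured accuracy.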