A new framework, SynapseRoute, has been proposed to optimize model selection in large language models for cost and performance balancing.
Approximately 58% of medical questions can be accurately answered using a low-cost, non-thinking mode without employing high-cost reasoning processes.
SynapseRoute dynamically routes queries to either the thinking or non-thinking mode based on complexity, leading to improved accuracy, cost-efficiency, and user experience.
Experimental results on medical datasets show that SynapseRoute enhances overall accuracy, reduces inference time by 36.8%, and decreases token consumption by 39.66% compared to using the thinking mode alone.