<ul data-eligibleForWebStory="true">
<li>Test-time compute scaling aims to improve model accuracy by spending additional computation at inference time.</li>
<li>Efficient Tree Search (ETS) is proposed to address the efficiency challenges of search methods used for inference-time scaling.</li>
<li>ETS uses a process reward model to score the candidates generated at each step of the search.</li>
<li>Increasing the diversity of trajectories during tree search promotes exploration, but it also consumes more memory because divergent trajectories share less of the KV cache.</li>
<li>ETS promotes KV cache sharing by pruning redundant trajectories while retaining the diversity the search needs.</li>
<li>A linear programming cost model in ETS penalizes the number of retained nodes, which encourages KV cache sharing.</li>
<li>ETS achieves a 1.8$\times$ reduction in average KV cache size during search, which yields a 1.4$\times$ increase in throughput.</li>
<li>ETS improves on prior state-of-the-art methods with minimal accuracy degradation.</li>
<li>ETS requires no custom kernel implementation, and the code is available on GitHub.</li>
</ul>
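<p>The trade-off behind the cost model can be illustrated with a toy sketch. This is not the ETS implementation: the function names, the scores, and the penalty value are hypothetical, and a brute-force search over subsets stands in for a linear programming solver. The sketch assumes each retained node corresponds to one unit of KV cache, so trajectories that share a prefix retain fewer nodes in total.</p>

```python
from itertools import combinations


def path_nodes(leaf, parent):
    """Return the set of nodes on the path from the root to `leaf`."""
    nodes = set()
    node = leaf
    while node is not None:
        nodes.add(node)
        node = parent[node]
    return nodes


def select_leaves(leaves, parent, score, node_penalty, keep):
    """Pick `keep` leaves maximizing total score minus a penalty per retained node.

    Hypothetical objective for illustration only; exhaustive search replaces
    the linear programming solver mentioned in the summary.
    """
    best_value, best_subset = float("-inf"), ()
    for subset in combinations(leaves, keep):
        # Nodes shared between trajectories are counted once.
        retained = set().union(*(path_nodes(leaf, parent) for leaf in subset))
        value = sum(score[leaf] for leaf in subset) - node_penalty * len(retained)
        if value > best_value:
            best_value, best_subset = value, subset
    return best_subset, best_value


# Toy tree: root -> a, b; a -> a1, a2; b -> b1 (scores are made up).
parent = {"root": None, "a": "root", "b": "root", "a1": "a", "a2": "a", "b1": "b"}
score = {"a1": 0.80, "a2": 0.75, "b1": 0.78}
leaves = ["a1", "a2", "b1"]

no_penalty, _ = select_leaves(leaves, parent, score, node_penalty=0.0, keep=2)
with_penalty, _ = select_leaves(leaves, parent, score, node_penalty=0.1, keep=2)
print(no_penalty, with_penalty)
```

<p>In this toy tree, the two highest-scoring leaves sit under different parents when no penalty is applied, so five nodes are retained. With the penalty, the selection shifts to two siblings that share a parent, retaining four nodes at a small cost in total score.</p>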