Recent advancements in large language models have improved reasoning abilities through search and backtracking techniques.
Backtracking enables sequential search: the model explores candidate solutions one after another within a single, linearized trace by generating a long chain of thought.
Parallel sampling with best-of-n selection is an alternative approach to scaling test-time compute.
A comparison on two reasoning tasks shows that sequential search outperforms parallel sampling on Sudoku but underperforms it on CountDown, indicating that backtracking is not uniformly beneficial.
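
To make the two strategies concrete, here is a minimal toy sketch on a CountDown-style puzzle: combine all the given numbers with +, -, and * to reach a target. It is not the experimental setup summarized above. No language model is involved: the "policy" is plain enumeration for the sequential case and uniform random choice for the parallel case. The function names (`moves`, `sequential_search`, `best_of_n`), the expansion budget, and the use of distance to the target as the selection score are all assumptions made for illustration, and division is omitted to keep the arithmetic in integers.

```python
import random
from itertools import combinations

# Binary operations allowed in this toy puzzle (division omitted for simplicity).
OPS = [("+", lambda a, b: a + b), ("-", lambda a, b: a - b), ("*", lambda a, b: a * b)]


def moves(nums):
    """Yield (next_state, step) for every way of combining two numbers."""
    for i, j in combinations(range(len(nums)), 2):
        rest = [n for k, n in enumerate(nums) if k not in (i, j)]
        a, b = nums[i], nums[j]
        for sym, fn in OPS:
            yield rest + [fn(a, b)], f"{a}{sym}{b}"
        # Subtraction is not commutative, so also try the reversed order.
        yield rest + [b - a], f"{b}-{a}"


def sequential_search(nums, target, budget):
    """Depth-first search with backtracking, as one linear trace of attempts.

    `budget` caps the number of expanded states (a stand-in for test-time compute).
    Returns the list of steps that reaches `target`, or None.
    """
    expanded = 0

    def dfs(state, trace):
        nonlocal expanded
        if len(state) == 1:
            return trace if state[0] == target else None
        for nxt, step in moves(state):
            if expanded >= budget:
                return None
            expanded += 1
            found = dfs(nxt, trace + [step])
            if found is not None:
                return found
        return None  # dead end: return to the caller, i.e. backtrack

    return dfs(list(nums), [])


def sample_rollout(nums, rng):
    """One independent attempt with no backtracking: random moves until one number is left."""
    state, trace = list(nums), []
    while len(state) > 1:
        state, step = rng.choice(list(moves(state)))
        trace.append(step)
    return state[0], trace


def best_of_n(nums, target, n, seed=0):
    """Draw n independent rollouts and keep the one whose result is closest to the target."""
    rng = random.Random(seed)
    rollouts = [sample_rollout(nums, rng) for _ in range(n)]
    return min(rollouts, key=lambda r: abs(r[0] - target))


if __name__ == "__main__":
    print(sequential_search([3, 5, 2], 16, budget=100))
    print(best_of_n([3, 5, 2], 16, n=20))
```

In this sketch the sequential searcher spends its budget inside one trace and recovers from dead ends, while best-of-n spends it on independent attempts and relies on the selection step. Which one does better for a given budget depends on the task, which is the kind of trade-off the comparison above is about.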