Existing benchmarks primarily assess the passive reasoning abilities of large language models (LLMs), supplying all the information needed to solve a problem up front.
A new benchmark, AR-Bench, is introduced to evaluate LLMs' active reasoning skills: models must interact with external sources to acquire the missing evidence required to solve each task.
AR-Bench comprises task families such as detective cases, situation puzzles, and guessing numbers, each probing a different style of reasoning under incomplete information.
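To make the interaction pattern concrete, below is a minimal, hypothetical sketch of an active-reasoning loop in the spirit of a guessing-numbers task: the model repeatedly proposes a query and receives only partial feedback, rather than being handed all evidence at once. The four-digit secret, the bulls-and-cows-style feedback rule, and the `ask_model` stub are illustrative assumptions, not AR-Bench's actual task interface.

```python
# Hypothetical sketch of an active-reasoning loop for a number-guessing task.
# The secret format, feedback rule, and ask_model() stub are assumptions for
# illustration; the benchmark's real protocol may differ.
import random


def judge_guess(secret: str, guess: str) -> tuple[int, int]:
    """Return (exact, partial): digits correct in place, and correct but misplaced."""
    exact = sum(s == g for s, g in zip(secret, guess))
    partial = sum(min(secret.count(d), guess.count(d)) for d in set(guess)) - exact
    return exact, partial


def ask_model(history: list) -> str:
    """Stand-in for querying an LLM with the interaction history; here, a random guess."""
    return "".join(random.sample("0123456789", 4))


def run_episode(secret: str, max_turns: int = 10) -> bool:
    history = []
    for _ in range(max_turns):
        guess = ask_model(history)             # model actively proposes a query
        feedback = judge_guess(secret, guess)  # environment returns partial evidence
        history.append((guess, feedback))
        if feedback[0] == 4:                   # all four digits in place: solved
            return True
    return False


if __name__ == "__main__":
    print("solved:", run_episode(secret="0427"))
```

The key point the sketch illustrates is that success depends on the quality of the questions the model asks across turns, not on reasoning over a fixed, fully specified context.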
Empirical evaluation on AR-Bench shows that current LLMs struggle with active reasoning, indicating the need for new methods to strengthen this capability.