Large language models face challenges in automated theorem proving due to sparse rewards and complex reasoning tasks.
A new framework called self-generated goal-conditioned MDPs (sG-MDPs) is introduced to tackle these challenges by allowing agents to generate and pursue subgoals in a structured manner.
Monte Carlo Tree Search (MCTS)-like algorithms are used to solve the sG-MDP, implemented in the Bourbaki (7B) system, which ensembles multiple LLMs for subgoal generation and tactic synthesis.
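The search loop can be sketched as standard UCT over LLM-proposed subgoals. This is a minimal illustrative sketch, not the paper's exact algorithm: the `propose_subgoals` callback stands in for the LLM subgoal generators, and the reward signal (e.g., 1 when a subtree closes the proof goal) is an assumption.

```python
import math

class Node:
    """A search-tree node over (proof state, subgoal) pairs."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0  # accumulated reward, e.g. 1.0 if a rollout closed the goal

def uct_score(child, c=1.4):
    # Standard UCT: exploit mean value, explore under-visited children.
    if child.visits == 0:
        return float("inf")
    return (child.value / child.visits
            + c * math.sqrt(math.log(child.parent.visits) / child.visits))

def select(node):
    # Descend to a leaf by repeatedly taking the highest-UCT child.
    while node.children:
        node = max(node.children, key=uct_score)
    return node

def expand(node, propose_subgoals):
    # propose_subgoals stands in for the LLM calls that suggest candidate subgoals.
    for subgoal in propose_subgoals(node.state):
        node.children.append(Node(subgoal, parent=node))

def backpropagate(node, reward):
    # Propagate the rollout reward up to the root.
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent
```

One select/expand/backpropagate pass per iteration is the usual MCTS skeleton; tactic synthesis and proof checking would sit in the rollout step, which is omitted here.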
Bourbaki (7B) achieves state-of-the-art results on PutnamBench, solving 26 problems and demonstrating the effectiveness of the approach for automated theorem proving.