Large language models face challenges in automated theorem proving due to sparse rewards and complex reasoning tasks.
A new framework called self-generated goal-conditioned MDPs (sG-MDPs) is introduced to tackle these challenges by allowing agents to generate and pursue subgoals in a structured manner.
Monte Carlo Tree Search (MCTS)-like algorithms are used to solve the sG-MDP, implemented in the Bourbaki (7B) system, which ensembles multiple LLMs for subgoal generation and tactic synthesis.
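The search loop can be sketched as standard UCT over LLM-proposed subgoals. This is a minimal illustrative sketch, not the paper's exact algorithm: the `propose_subgoals` callback stands in for the LLM subgoal generators, and the reward signal (e.g., 1 when a subtree closes the proof goal) is an assumption.

```python
import math

class Node:
    """A search-tree node over (proof state, subgoal) pairs."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0  # accumulated reward, e.g. 1.0 if a rollout closed the goal

def uct_score(child, c=1.4):
    # Standard UCT: exploit mean value, explore under-visited children.
    if child.visits == 0:
        return float("inf")
    return (child.value / child.visits
            + c * math.sqrt(math.log(child.parent.visits) / child.visits))

def select(node):
    # Descend to a leaf by repeatedly taking the highest-UCT child.
    while node.children:
        node = max(node.children, key=uct_score)
    return node

def expand(node, propose_subgoals):
    # propose_subgoals stands in for the LLM calls that suggest candidate subgoals.
    for subgoal in propose_subgoals(node.state):
        node.children.append(Node(subgoal, parent=node))

def backpropagate(node, reward):
    # Propagate the rollout reward up to the root.
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent
```

One select/expand/backpropagate pass per iteration is the usual MCTS skeleton; tactic synthesis and proof checking would sit in the rollout step, which is omitted here.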
Bourbaki (7B) achieves state-of-the-art results on PutnamBench, solving 26 problems and demonstrating the effectiveness of the approach for automated theorem proving.