Recent large language models (LLMs) such as OpenAI’s o1/o3, DeepSeek’s R1, and Anthropic’s Claude 3.7 exhibit enhanced reasoning capabilities through deep thinking with a chain-of-thought (CoT) approach.
CoT-based test-time scaling can hit a ceiling: ever-growing thought traces exceed model context windows, bury critical information in the middle of long contexts, and inflate the quadratic cost of self-attention.
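To make the self-attention point concrete, here is a rough back-of-the-envelope sketch (my own illustration, not from the article): with causal attention, every new token attends to all earlier ones, so the number of query-key interactions grows quadratically with the length of the thought trace.

```python
# Rough illustration (not from the article): causal self-attention cost grows
# quadratically with context length, so an ever-growing chain of thought gets
# expensive fast. The token counts below are arbitrary examples.
def attention_pairs(context_len: int) -> int:
    """Query-key interactions for one full causal attention pass."""
    return context_len * (context_len + 1) // 2

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} tokens -> {attention_pairs(n):>16,} query-key pairs")
```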
The article proposes PENCIL, a new reasoning paradigm that allows LLMs to both generate and erase thoughts for efficient reasoning.
PENCIL uses an erasure mechanism, inspired by rewriting rules in logic and by functional programming, to discard intermediate thoughts once they are no longer needed.
PENCIL supports a range of reasoning patterns, such as task decomposition, branching and backtracking, and summarization/tail recursion, for efficient problem-solving.
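Below is a minimal sketch of what such rewriting-rule-style erasure could look like, assuming a convention of special tokens [CALL], [SEP], and [RETURN] that bracket a subcomputation; the token names, function, and example trace are illustrative, not the authors’ implementation. Once a bracketed span has produced its answer, the thoughts before [SEP] are discarded and only the answer survives, which is also how a failed branch can collapse to a one-line verdict during backtracking.

```python
# Illustrative sketch of a PENCIL-style erasure rule (assumed token convention,
# not the authors' code): a completed span
#     ... [CALL] thoughts [SEP] answer [RETURN] ...
# is rewritten to
#     ... answer ...
CALL, SEP, RETURN = "[CALL]", "[SEP]", "[RETURN]"

def erase_once(tokens: list[str]) -> list[str]:
    """Collapse the first completed [CALL] ... [RETURN] span, keeping its answer.

    Assumes a well-formed trace where every [RETURN] has a matching [CALL]/[SEP].
    """
    if RETURN not in tokens:
        return tokens                                    # nothing finished yet
    r = tokens.index(RETURN)                             # innermost completed span
    c = max(i for i in range(r) if tokens[i] == CALL)    # matching [CALL]
    s = max(i for i in range(c, r) if tokens[i] == SEP)  # [SEP] before the answer
    return tokens[:c] + tokens[s + 1 : r] + tokens[r + 1 :]

# Branch-and-backtrack flavour: the failed branch's derivation is erased and
# only its short verdict remains in context while the search continues.
trace = ["solve(phi)",
         CALL, "assume x=True", "...long derivation...", "contradiction",
         SEP, "x=True fails", RETURN,
         "now assume x=False"]
print(erase_once(trace))   # ['solve(phi)', 'x=True fails', 'now assume x=False']
```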
On tasks such as Boolean satisfiability (SAT), PENCIL is substantially more space-efficient than traditional CoT, keeping the maximal context length small and reducing computational resource usage.
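As a toy illustration of why erasure saves space (my own made-up token costs, not the article’s measurements): an exhaustive search over n Boolean variables keeps every explored branch in a plain CoT transcript, whereas with erasure only the current path and short verdicts need to stay in context.

```python
# Toy space comparison with made-up token costs (illustration only, not the
# article's measurements): exhaustive search over n Boolean variables.
def cot_peak_context(n: int, tokens_per_branch: int = 20) -> int:
    """Plain CoT: every one of the 2**n explored branches stays in the transcript."""
    return (2 ** n) * tokens_per_branch

def pencil_peak_context(n: int, tokens_per_level: int = 20) -> int:
    """With erasure: only the current root-to-leaf path plus one verdict per ancestor."""
    return n * tokens_per_level + n

for n in (10, 20, 30):
    print(f"n={n:2d}  CoT ~{cot_peak_context(n):>15,} tokens   "
          f"erasure ~{pencil_peak_context(n):,} tokens")
```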
Experimental results reveal that PENCIL outperforms CoT in inherently hard reasoning tasks such as 3-SAT, QBF, and Einstein’s Puzzle, achieving higher accuracy and faster convergence.
Theoretical analysis shows that PENCIL is Turing-complete and can carry out arbitrary computations with optimal time and space complexity, making it efficient for solving any computable task.
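Spelling out what “optimal time and space” plausibly means here, in my own paraphrase (up to constants and possible logarithmic factors; the precise theorem statement is in the paper):

```latex
% Hedged paraphrase, not a verbatim theorem: for simulating any computation
% that runs in time $T$ and uses space $S$,
\[
  \underbrace{\#\text{tokens generated in total}}_{\text{time}} = O(T),
  \qquad
  \underbrace{\max_t \, |\text{context at step } t|}_{\text{space}} = O(S),
\]
% whereas a plain CoT transcript that never erases keeps all $O(T)$ tokens in context.
```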
The proposed paradigm opens up possibilities for fine-tuning LLMs to reason in a memory-efficient way and invites a reexamination of existing reasoning models.