Alibaba researchers have announced Marco-o1, a large reasoning model (LRM) that applies advanced reasoning techniques to open-ended problems.
Marco-o1 is a fine-tuned version of Alibaba's Qwen2-7B-Instruct and uses techniques such as Monte Carlo Tree Search (MCTS) and Reasoning Action Strategies.
The model uses MCTS to explore multiple reasoning paths as it generates response tokens, using confidence scores derived from the token probabilities to steer the search toward promising branches.
Marco-o1 also uses a flexible reasoning action strategy that adjusts the granularity of the search by varying the number of tokens generated at each MCTS node, from whole reasoning steps down to finer-grained "mini-steps".
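To make the mechanics concrete, the sketch below shows a token-granular MCTS loop in Python. It is a minimal illustration under stated assumptions, not Marco-o1's actual implementation: `sample_step`, `confidence`, `Node`, and `search` are hypothetical names, the model call is stubbed with a random sampler so the sketch runs standalone, and the paper's confidence score (a softmax over log-probabilities of the top-5 alternative tokens) is approximated here by a plain mean of token probabilities.

```python
import math
import random

def sample_step(prefix, step_tokens):
    # Hypothetical stand-in for the LLM sampler: returns `step_tokens` new
    # tokens plus a probability for each. In Marco-o1 this role is played by
    # Qwen2-7B-Instruct; it is stubbed here so the sketch runs standalone.
    tokens = [f"t{random.randint(0, 99)}" for _ in range(step_tokens)]
    probs = [random.uniform(0.2, 1.0) for _ in tokens]
    return tokens, probs

def confidence(probs):
    # Reward for a candidate step: mean token confidence. (The paper derives
    # per-token confidence from softmaxed log-probabilities of the top-5
    # alternative tokens; a plain mean stands in for that here.)
    return sum(probs) / len(probs)

class Node:
    def __init__(self, prefix, parent=None, reward=0.0):
        self.prefix, self.parent, self.reward = prefix, parent, reward
        self.children, self.visits, self.value = [], 0, 0.0

    def ucb(self, c=1.4):
        # Upper Confidence Bound: balances exploiting high-value children
        # against exploring rarely visited ones.
        if self.visits == 0:
            return float("inf")
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def search(prompt, step_tokens, n_candidates=4, n_iters=32):
    root = Node(list(prompt))
    root.visits = 1
    for _ in range(n_iters):
        # Selection: descend by UCB until reaching an unexpanded leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.ucb)
        # Expansion: each child extends the path by `step_tokens` tokens, so
        # this one parameter sets the granularity of the whole search.
        for _ in range(n_candidates):
            tokens, probs = sample_step(node.prefix, step_tokens)
            node.children.append(
                Node(node.prefix + tokens, parent=node, reward=confidence(probs)))
        # Backpropagation: push the best candidate's reward up to the root.
        best = max(node.children, key=lambda ch: ch.reward)
        cur = best
        while cur is not None:
            cur.visits += 1
            cur.value += best.reward
            cur = cur.parent
    # Follow the most-visited branch one level below the root.
    return max(root.children, key=lambda ch: ch.visits).prefix

if __name__ == "__main__":
    path = search("2 + 2 = ?", step_tokens=32)
    print(f"selected path has {len(path)} tokens")
```

The key knob is `step_tokens`: setting it to a full reasoning step gives a coarse search, while smaller values trade extra compute for finer-grained exploration of the solution space.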
A reflection mechanism prompts the model to re-examine its own reasoning mid-generation, allowing it to identify and correct potential errors in its thought process.
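The paper reports that this reflection is triggered by appending a literal cue, "Wait! Maybe I made some mistakes! I need to rethink from scratch.", to the end of the model's thought process, prompting a second pass over its own steps. A minimal sketch of that pattern, with `generate` standing in for a hypothetical text-generation callable:

```python
REFLECTION_CUE = ("Wait! Maybe I made some mistakes! "
                  "I need to rethink from scratch.")

def reflect_and_revise(generate, prompt, first_pass):
    # Append the reflection cue to the first-pass reasoning and ask the
    # model to re-examine its own steps; the second pass can then catch
    # and correct errors the first pass missed.
    return generate(f"{prompt}\n{first_pass}\n{REFLECTION_CUE}\n")

# Usage with any text-generation callable, e.g. a wrapped HF pipeline:
#   revised = reflect_and_revise(pipe, question, draft_reasoning)
```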
The researchers evaluated the model on several tasks, including multilingual grade-school math problems (MGSM) and the translation of colloquial phrases and slang expressions.
Marco-o1 significantly outperformed the base Qwen2-7B model, particularly when the MCTS component was set to single-token granularity.
The model targets scenarios that require deep contextual understanding but lack well-defined evaluation metrics or reward signals.
A partial reasoning dataset accompanies the Marco-o1 release on Hugging Face.
The open-source community is also catching up with proprietary model providers, releasing models and datasets that take advantage of inference-time scaling laws.