The AlphaZero/MuZero (A/MZ) family of algorithms combines Monte Carlo Tree Search (MCTS) with learned models and has achieved remarkable success in various domains.
Epistemic MCTS (EMCTS) is introduced to address the epistemic uncertainty that arises from learned models and to improve exploration in sparse-reward environments.
When applied to the task of writing code in the assembly language subleq, AlphaZero (AZ) with EMCTS achieves higher sample efficiency than baseline AZ.
On the hard-exploration benchmark Deep Sea, search with EMCTS significantly outperforms comparable uncertainty-estimation methods that do not use search, demonstrating the benefit of search for epistemic uncertainty estimation.
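
For readers unfamiliar with subleq, the target language of the code-writing task, the following is a minimal interpreter sketch. It assumes the common convention in which each instruction is three memory cells `a, b, c`: subtract `mem[a]` from `mem[b]`, then jump to `c` if the result is less than or equal to zero, otherwise continue with the next instruction. The halting rule (a negative program counter), the step limit, and the function name `run_subleq` are illustrative assumptions and are not taken from the experimental setup described here.

```python
from typing import List


def run_subleq(memory: List[int], max_steps: int = 10_000) -> List[int]:
    """Run a subleq program and return the final memory (hypothetical helper).

    Assumed convention: an instruction is three cells (a, b, c), meaning
    mem[b] -= mem[a]; if mem[b] <= 0, jump to c, else fall through.
    Execution halts when the program counter is negative or runs off the
    end of memory, or after max_steps instructions.
    """
    mem = list(memory)  # copy so the caller's program is not modified
    pc = 0
    for _ in range(max_steps):
        if pc < 0 or pc + 2 >= len(mem):
            break
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
    return mem


# Example program: Y += X, using a scratch cell Z that starts at 0.
# Cells 0-8 hold three instructions; X, Y, Z live at addresses 9, 10, 11.
program = [
    9, 11, 3,    # Z -= X        (Z becomes -X)
    11, 10, 6,   # Y -= Z        (Y becomes Y + X)
    11, 11, -1,  # Z -= Z        (Z becomes 0, so jump to -1 and halt)
    7, 5, 0,     # data: X = 7, Y = 5, Z = 0
]
final_memory = run_subleq(program)
print(final_memory[10])
```

Even this three-instruction addition shows why writing subleq programs is a demanding search problem: every operation has to be composed from the same subtract-and-branch primitive.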