Researchers reverse-engineered a convolutional recurrent neural network (RNN) trained using model-free reinforcement learning to play the game Sokoban.
The RNN solves more levels when given additional test-time compute, and its internal computation resembles classic bidirectional search.
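A minimal sketch of what "more test-time compute" can mean for a recurrent policy: run extra recurrent ticks on the same observation before committing to an action. The toy architecture and names here (`ConvRNNCore`, `act_with_thinking`, `extra_ticks`) are illustrative assumptions, not the paper's.

```python
import torch
import torch.nn as nn

class ConvRNNCore(nn.Module):
    """Toy convolutional recurrent core: one tick updates the hidden state."""
    def __init__(self, channels=32):
        super().__init__()
        self.update = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, obs_feats, hidden):
        # One recurrent tick: mix observation features into the hidden state.
        return torch.tanh(self.update(torch.cat([obs_feats, hidden], dim=1)))

def act_with_thinking(core, policy_head, obs_feats, hidden, extra_ticks=0):
    """Spend extra ticks on the same observation before acting.

    More ticks = more test-time compute; the hypothesis is that each tick
    lets the network extend or refine its internal plan.
    """
    for _ in range(1 + extra_ticks):
        hidden = core(obs_feats, hidden)
    logits = policy_head(hidden.mean(dim=(2, 3)))  # pool to one action choice
    return logits.argmax(dim=-1), hidden

core, policy_head = ConvRNNCore(32), nn.Linear(32, 4)
obs = torch.randn(1, 32, 10, 10)
h = torch.zeros(1, 32, 10, 10)
action, h = act_with_thinking(core, policy_head, obs, h, extra_ticks=4)
```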
The RNN plans by representing candidate moves as activations tied to a specific direction at each board square.
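A sketch of how such a representation could be read out, assuming a probe-decoded array `plan_acts[r, c, d]` holding the activation for "move in direction `d` at square `(r, c)`"; the `read_plan` helper and its threshold are hypothetical.

```python
import numpy as np

DIRS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right

def read_plan(plan_acts, start, max_len=20, threshold=0.5):
    """Trace a planned path out of per-square directional activations."""
    r, c = start
    path = [(r, c)]
    for _ in range(max_len):
        d = int(np.argmax(plan_acts[r, c]))
        if plan_acts[r, c, d] < threshold:
            break  # no confident planned move at this square
        dr, dc = DIRS[d]
        r, c = r + dr, c + dc
        if not (0 <= r < plan_acts.shape[0] and 0 <= c < plan_acts.shape[1]):
            break
        path.append((r, c))
    return path

plan_acts = np.zeros((10, 10, 4))
plan_acts[5, 5, 3] = plan_acts[5, 6, 3] = plan_acts[5, 7, 1] = 1.0
print(read_plan(plan_acts, start=(5, 5)))  # [(5, 5), (5, 6), (5, 7), (6, 7)]
```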
These state-action activations act analogously to a value function: their magnitudes determine when the network backtracks and which plans survive pruning.
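To make the value-function analogy concrete, a sketch of pruning between candidate plans, where the summed activation along a path plays the role of the value that decides survival; the candidate paths and the `plan_value` helper are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
plan_acts = rng.random((10, 10, 4))  # stand-in for probe-decoded activations

def plan_value(path):
    """Sum the state-action activations along a candidate path.

    This scalar plays the role of a value estimate: it is what decides
    whether a plan survives pruning.
    """
    return sum(plan_acts[r, c, d] for r, c, d in path)

# Two hypothetical candidate plans, each a list of (row, col, direction) steps.
plan_a = [(5, 5, 3), (5, 6, 3), (5, 7, 1)]
plan_b = [(5, 5, 1), (6, 5, 1), (7, 5, 3)]

# The weaker plan is pruned; behaviorally this looks like backtracking.
survivor = max([plan_a, plan_b], key=plan_value)
```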
Specialized convolutional kernels extend these activations forward and backward along candidate paths, effectively implementing a transition model.
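A sketch of the extension idea, emulating forward and backward kernel applications with explicit shifts; the real network uses learned convolution kernels, and `shift`/`extend_paths` are illustrative stand-ins.

```python
import numpy as np

DIRS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right

def shift(grid, dr, dc):
    """Translate a 2D activation map by (dr, dc), zero-padding the border."""
    out = np.roll(grid, shift=(dr, dc), axis=(0, 1))
    if dr > 0:   out[:dr, :] = 0
    elif dr < 0: out[dr:, :] = 0
    if dc > 0:   out[:, :dc] = 0
    elif dc < 0: out[:, dc:] = 0
    return out

def extend_paths(plan_acts, decay=0.9):
    """Grow each direction channel one square forward and one backward.

    Forward: "move right at (r, c)" spreads to (r, c+1), proposing the next
    step; backward: it also spreads to (r, c-1), proposing how the path can
    be reached. Knowing which square a move leads to (and comes from) is
    the transition knowledge such kernels encode.
    """
    out = plan_acts.copy()
    for d, (dr, dc) in DIRS.items():
        out[..., d] = np.maximum(out[..., d], decay * shift(plan_acts[..., d], dr, dc))
        out[..., d] = np.maximum(out[..., d], decay * shift(plan_acts[..., d], -dr, -dc))
    return out
```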
The RNN deviates from classical search methods in that it lacks a unified state representation: it plans for each box individually.
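One way to picture the difference, assuming per-box plan grids rather than a joint state space (the array shapes are illustrative):

```python
import numpy as np

H, W, N_DIRS, N_BOXES = 10, 10, 4, 3

# Classical search would expand a single tree over complete board states;
# here each box instead gets its own directional plan grid, maintained and
# extended independently of the others.
box_plans = np.zeros((N_BOXES, H, W, N_DIRS))
box_plans[0, 4, 2, 3] = 1.0  # "push box 0 rightward through square (4, 2)"
```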
Each layer in the network maintains its own plan representation and value function, which increases the search depth achieved in a single forward pass.
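A sketch of how stacked layers can deepen the search, assuming each layer applies one extension step to its own copy of the plan (border handling is elided via `np.roll` for brevity):

```python
import numpy as np

def layer_step(plan, decay=0.9):
    """One layer's contribution: extend every direction channel one square."""
    out = plan.copy()
    for d, (dr, dc) in enumerate([(-1, 0), (1, 0), (0, -1), (0, 1)]):
        out[..., d] = np.maximum(
            out[..., d], decay * np.roll(plan[..., d], (dr, dc), axis=(0, 1)))
    return out

plan = np.zeros((10, 10, 4))
plan[5, 5, 3] = 1.0              # seed: "move right at (5, 5)"
for _ in range(3):               # three layers, each with its own plan copy:
    plan = layer_step(plan)      # one forward pass extends the plan 3 steps
```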
This shows that a mechanism for leveraging test-time compute, learned through model-free training, can be understood in familiar planning terms.