The article benchmarks tabular reinforcement learning algorithms from the seminal book by Sutton and Barto, covering Dynamic Programming, Monte Carlo methods, Temporal-Difference learning, and model-based RL/planning.
The benchmarking framework evaluates each algorithm on progressively larger Gridworld environments to compare how efficiently and effectively they solve the task.
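The article does not reproduce its environment code here; as a rough illustration of the kind of Gridworld such a benchmark typically uses, the sketch below assumes a deterministic grid with a single goal state, a -1 step reward, and four movement actions. The class name, reward scheme, and interface are assumptions for illustration, not the article's exact setup.

```python
class Gridworld:
    """Minimal deterministic Gridworld: start at (0, 0), goal at (size-1, size-1).

    Every step costs -1 until the goal is reached, so shorter paths score higher.
    Actions: 0=up, 1=right, 2=down, 3=left. (Assumed setup for illustration.)
    """

    def __init__(self, size=10):
        self.size = size
        self.goal = (size - 1, size - 1)
        self.state = (0, 0)

    def reset(self):
        self.state = (0, 0)
        return self.state

    def step(self, action):
        moves = [(-1, 0), (0, 1), (1, 0), (0, -1)]
        dr, dc = moves[action]
        # Clamp the move so the agent stays inside the grid.
        r = min(max(self.state[0] + dr, 0), self.size - 1)
        c = min(max(self.state[1] + dc, 0), self.size - 1)
        self.state = (r, c)
        done = self.state == self.goal
        return self.state, -1.0, done
```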
Key algorithms such as Q-learning, Value Iteration, Sarsa, and Dyna-Q are discussed and benchmarked on Gridworlds of up to 50x50 states.
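For reference, here is a minimal sketch of tabular Q-learning run against the Gridworld interface assumed above; the hyperparameters (learning rate, discount, exploration rate, episode count) are illustrative defaults, not the values used in the article's experiments.

```python
import numpy as np

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration.

    Q is indexed by (row, col, action) and updated with the standard
    off-policy TD(0) rule:
        Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    Q = np.zeros((env.size, env.size, 4))
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection.
            if np.random.rand() < epsilon:
                action = np.random.randint(4)
            else:
                action = int(np.argmax(Q[state[0], state[1]]))
            next_state, reward, done = env.step(action)
            # Bootstrap from the greedy value of the next state.
            target = reward + (0.0 if done else gamma * Q[next_state[0], next_state[1]].max())
            Q[state[0], state[1], action] += alpha * (target - Q[state[0], state[1], action])
            state = next_state
    return Q
```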
In these simple, fully known, deterministic environments, the results show Value Iteration outperforming the other methods, followed by On-policy MC, Dyna-Q, Q-learning, and Sarsa-n.
Among the model-free methods, On-policy MC ranks best, while Dyna-Q improves on purely model-free learning by blending it with model-based planning.
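To make that blend concrete, below is a minimal Dyna-Q sketch against the same assumed Gridworld interface: each real step performs a standard Q-learning update and also stores the transition in a learned model, from which a few additional simulated updates are replayed. The dictionary-based model and the hyperparameters are illustrative choices, not the article's exact implementation.

```python
import random
import numpy as np

def dyna_q(env, episodes=200, planning_steps=10, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Dyna-Q: one real Q-learning update per step, followed by
    `planning_steps` simulated updates drawn from a learned deterministic model."""
    Q = np.zeros((env.size, env.size, 4))
    model = {}  # (state, action) -> (reward, next_state, done)
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            if np.random.rand() < epsilon:
                action = np.random.randint(4)
            else:
                action = int(np.argmax(Q[state[0], state[1]]))
            next_state, reward, done = env.step(action)
            # Direct (model-free) update from real experience.
            target = reward + (0.0 if done else gamma * Q[next_state[0], next_state[1]].max())
            Q[state[0], state[1], action] += alpha * (target - Q[state[0], state[1], action])
            # Record the observed transition in the model.
            model[(state, action)] = (reward, next_state, done)
            # Planning: replay randomly sampled remembered transitions.
            for _ in range(planning_steps):
                (s, a), (r, s2, d) = random.choice(list(model.items()))
                t = r + (0.0 if d else gamma * Q[s2[0], s2[1]].max())
                Q[s[0], s[1], a] += alpha * (t - Q[s[0], s[1], a])
            state = next_state
    return Q
```

The extra planning updates are what let Dyna-Q squeeze more value out of each real interaction, which matches its ranking above the purely model-free TD methods in the article's results.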
The article also discusses the trade-off between efficiency and generality in reinforcement learning algorithms, with plans to benchmark in more challenging environments and to explore the function-approximation methods covered in Part II of Sutton and Barto's book.
The detailed experiments, rankings, and discussions offer insights into the performance of tabular RL algorithms, showcasing the strengths and limitations of each method.