The 2048 AI project follows a modular design, with separate directories for environment logic, agent policies, utility functions, and entry-point scripts, ensuring a clear separation of responsibilities.
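An illustrative layout along these lines (the directory names here are assumptions for the sketch, not necessarily the repository's actual paths):

```
2048-ai/
├── env/       # environment logic, e.g. Game2048Env
├── agents/    # PPO, Beam Search, and hybrid agent policies
├── utils/     # shared helpers (logging, board heuristics)
└── scripts/   # entry points for training and evaluation
```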
The Game2048Env class is the core of the project: it simulates the 2048 game rules, managing tile movements and merges, scoring, and reward assignment.
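A minimal sketch of what such an environment might look like, following the common Gym-style reset/step convention (the actual class may differ in interface and details):

```python
import numpy as np

class Game2048Env:
    """Sketch of a 2048 environment; actions 0-3 are mapped to a
    left slide by rotating the board."""

    def __init__(self):
        self.board = np.zeros((4, 4), dtype=np.int64)

    def reset(self):
        self.board[:] = 0
        self._spawn()
        self._spawn()
        return self.board.copy()

    def step(self, action):
        rotated = np.rot90(self.board, action)   # treat every move as a left slide
        slid, gained = self._slide_left(rotated)
        moved = not np.array_equal(rotated, slid)
        self.board = np.rot90(slid, -action)
        if moved:
            self._spawn()                        # a new 2 or 4 appears after a valid move
        done = not any(self._move_changes(a) for a in range(4))
        return self.board.copy(), float(gained), done, {}

    def _slide_left(self, b):
        new, gained = np.zeros_like(b), 0
        for i, row in enumerate(b):
            tiles, merged, j = [t for t in row if t], [], 0
            while j < len(tiles):
                if j + 1 < len(tiles) and tiles[j] == tiles[j + 1]:
                    merged.append(tiles[j] * 2)  # merge equal neighbours once per move
                    gained += tiles[j] * 2       # merged value doubles as the reward
                    j += 2
                else:
                    merged.append(tiles[j])
                    j += 1
            new[i, :len(merged)] = merged
        return new, gained

    def _move_changes(self, action):
        rotated = np.rot90(self.board, action)
        return not np.array_equal(rotated, self._slide_left(rotated)[0])

    def _spawn(self):
        empty = np.argwhere(self.board == 0)
        r, c = empty[np.random.randint(len(empty))]
        self.board[r, c] = 2 if np.random.rand() < 0.9 else 4
```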
The learning agent uses Proximal Policy Optimization (PPO), which optimizes the policy directly while clipping each update to maintain training stability.
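Concretely, PPO maximizes a clipped surrogate objective. A minimal PyTorch sketch (the epsilon value of 0.2 is the default from the PPO paper, not necessarily this project's setting):

```python
import torch

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss. The probability ratio
    pi_new(a|s) / pi_old(a|s) is clipped to [1 - eps, 1 + eps]
    so a single update cannot move the policy too far from the
    one that collected the data."""
    ratio = torch.exp(new_logp - old_logp)   # importance-sampling ratio
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Take the pessimistic bound, then negate: maximizing the
    # objective equals minimizing this loss.
    return -torch.min(unclipped, clipped).mean()
```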
Training the PPO agent involves collecting batches of experience tuples and periodically updating the policy with the clipped objective described above, which prevents large policy shifts.
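The experience-collection phase might look like the following sketch, where policy.act is an assumed interface returning a sampled action together with its log-probability (the project's actual rollout code may differ):

```python
def collect_rollout(env, policy, horizon=512):
    """Gather one batch of experience tuples for a PPO update."""
    batch = []
    obs = env.reset()
    for _ in range(horizon):
        action, logp = policy.act(obs)              # sample action + log-prob under pi_old
        next_obs, reward, done, _ = env.step(action)
        batch.append((obs, action, reward, logp, done))
        obs = env.reset() if done else next_obs     # start a fresh game when one ends
    return batch
```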
After 1000 episodes of training, the PPO agent reached a maximum tile of 512, but it struggled to reach the 2048 tile consistently due to sparse rewards and high variance in outcomes.
Beam Search, a planning-based agent, simulates possible future board states and scores them with a heuristic evaluation function; it requires no training and generalizes well across board positions.
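A sketch of the core loop: only the `width` most promising states survive each depth, and the root move of the best surviving line is returned. Here simulate(state, action) and heuristic(state) are assumed helpers, with simulate treated as deterministic (e.g. ignoring random tile spawns) for simplicity:

```python
def beam_search_move(state, simulate, heuristic, width=8, depth=3):
    """Return the first move of the best line found by beam search.
    simulate(state, action) -> (next_state, valid); heuristic(state) -> float."""
    # Seed the beam with all valid first moves, remembering the root move.
    beam = []
    for action in range(4):
        nxt, valid = simulate(state, action)
        if valid:
            beam.append((heuristic(nxt), nxt, action))
    for _ in range(depth - 1):
        candidates = []
        for _, s, root in beam:
            for action in range(4):
                nxt, valid = simulate(s, action)
                if valid:
                    candidates.append((heuristic(nxt), nxt, root))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:width]                # prune to the best `width` states
    return max(beam, key=lambda c: c[0])[2] if beam else 0
```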
Beam Search scored significantly better than PPO and reached higher tiles, owing to its efficient lookahead and its use of local board information.
The PPOBeamHybridAgent combines the two approaches: it uses the PPO policy in the early game and switches to Beam Search in complex board states, showing promise in tactical scenarios.
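The dispatch logic can be sketched as follows; complexity is an assumed measure of board difficulty (for example, the fraction of occupied cells), and the threshold value is illustrative rather than taken from the project:

```python
def hybrid_act(state, ppo_policy, beam_move, complexity, threshold=0.5):
    """Hybrid dispatch sketch: cheap learned policy while the board is
    simple, deliberate search once it becomes tactical."""
    if complexity(state) < threshold:
        action, _ = ppo_policy.act(state)   # fast PPO policy in the early game
        return action
    return beam_move(state)                 # Beam Search in complex positions
```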
While PPO offers flexibility and generalization, Beam Search excels at efficiently exploiting local information, making it particularly valuable for games like 2048.
Future work could explore tighter blends of planning and learning for stronger decision-making.