Source: Medium

From Pixels to Plans: Cracking 2048 with Reinforcement Learning and Beam Search

  • The 2048 AI project has a modular layout, with separate directories for environment logic, agent policies, utility functions, and entry-point scripts, giving each component a clearly separated responsibility.
  • The Game2048Env class is central: it simulates the 2048 game rules and manages tile movements, scoring, and reward assignment (a minimal sketch of such an environment follows this list).
  • Training uses Proximal Policy Optimization (PPO), which optimizes the policy directly while preventing large updates to keep training stable.
  • The PPO agent trains by collecting experience tuples and periodically updating the policy with a clipped objective that caps how far each update can shift the policy (see the clipped-loss sketch below).
  • After 1000 training episodes, the PPO agent reached a maximum tile of 512 but struggled to reach the 2048 tile consistently, hampered by sparse rewards and high variance in outcomes.
  • Beam Search, a planning-based agent, simulates possible futures and scores them with a heuristic evaluation function, avoiding the need for extensive training while generalizing well across board states (a search sketch follows below).
  • Beam Search reached significantly higher tiles than PPO in 2048, thanks to its efficient search and its exploitation of local board information.
  • The PPOBeamHybridAgent combines the two approaches, letting PPO drive the early game and handing control to Beam Search on complex board states; it shows promise in tactical scenarios (a switching sketch closes this section).
  • While PPO offers flexibility and generalization, Beam Search excels at exploiting local information efficiently, which makes it valuable for games like 2048.
  • Future work could explore blending planning and learning more effectively for stronger decision-making.
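
The summary names a Game2048Env class that owns the game rules, tile movement, scoring, and rewards. As a rough illustration, here is a minimal gym-style sketch of such an environment; everything beyond the class name and those responsibilities (the action encoding, spawn probabilities, helper names) is an assumption, not the project's actual code.

```python
import numpy as np

class Game2048Env:
    """Minimal 2048 simulator. Actions: 0=left, 1=up, 2=right, 3=down."""

    def reset(self):
        self.board = np.zeros((4, 4), dtype=np.int64)
        self._spawn_tile()
        self._spawn_tile()
        return self.board.copy()

    def step(self, action):
        before = self.board.copy()
        reward = self._slide(action)          # reward = sum of merged tiles
        moved = not np.array_equal(before, self.board)
        if moved:
            self._spawn_tile()
        done = not any(self._would_move(a) for a in range(4))
        return self.board.copy(), reward, done, {"moved": moved}

    def _spawn_tile(self):
        # Place a 2 (90%) or 4 (10%) on a random empty cell.
        empty = np.argwhere(self.board == 0)
        r, c = empty[np.random.randint(len(empty))]
        self.board[r, c] = 2 if np.random.rand() < 0.9 else 4

    def _slide(self, action):
        # Rotate so every action becomes "slide left", merge, rotate back.
        rotated = np.rot90(self.board, k=action).copy()
        score = 0
        for i in range(4):
            row = [v for v in rotated[i] if v != 0]   # compress zeros out
            merged, j = [], 0
            while j < len(row):
                if j + 1 < len(row) and row[j] == row[j + 1]:
                    merged.append(row[j] * 2)         # merge equal neighbors
                    score += int(row[j] * 2)
                    j += 2
                else:
                    merged.append(row[j])
                    j += 1
            rotated[i] = merged + [0] * (4 - len(merged))
        self.board = np.rot90(rotated, k=-action)
        return score

    def _would_move(self, action):
        # Try the move on a copy; the game is over when nothing can move.
        probe = Game2048Env()
        probe.board = self.board.copy()
        probe._slide(action)
        return not np.array_equal(probe.board, self.board)
```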
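The clipped objective mentioned above is the heart of PPO's training stability. Below is a self-contained sketch of that loss in plain NumPy; the function name and the default clip range (0.2 is a common choice) are illustrative assumptions, not the project's API.

```python
import numpy as np

def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective (minimize this)."""
    # Probability ratio between the current policy and the policy that
    # collected the experience tuples.
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # The elementwise minimum removes any incentive to push the ratio
    # outside [1 - eps, 1 + eps], which is what prevents large policy shifts.
    return -np.minimum(unclipped, clipped).mean()
```

In a real training loop this quantity would be computed on autograd tensors (e.g. in PyTorch) so its gradient can update the policy network after each batch of collected experience.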
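The Beam Search agent can be sketched as a depth-limited search that keeps only the most promising boards at each ply. The heuristic below (weighting empty cells plus the max tile) and the `simulate(board, move)` interface are assumptions for illustration; the article's actual scoring function may differ.

```python
import numpy as np

def heuristic(board):
    # Illustrative score: open space to keep maneuvering, plus tile progress.
    return 10 * np.count_nonzero(board == 0) + float(np.log2(board.max() + 1))

def beam_search_move(board, simulate, depth=3, beam_width=8):
    """Pick a move by expanding futures and keeping the top `beam_width`.

    `simulate(board, move) -> (next_board, moved)` is assumed to apply the
    deterministic slide/merge step only; random tile spawns are ignored,
    a common simplification in 2048 planners.
    """
    # Seed the beam with each legal first move, remembering which it was.
    beam = []
    for move in range(4):
        nxt, moved = simulate(board, move)
        if moved:
            beam.append((nxt, move))
    for _ in range(depth - 1):
        candidates = []
        for b, first in beam:
            for move in range(4):
                nxt, moved = simulate(b, move)
                if moved:
                    candidates.append((nxt, first))
        if not candidates:
            break                              # every beam state is terminal
        candidates.sort(key=lambda x: heuristic(x[0]), reverse=True)
        beam = candidates[:beam_width]
    # Return the first move that leads to the best surviving board.
    return max(beam, key=lambda x: heuristic(x[0]))[1] if beam else 0
```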
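Finally, the PPOBeamHybridAgent's routing can be as simple as a state-dependent dispatch. The max-tile threshold used here is a hypothetical switching rule; the summary does not specify the article's exact criterion for "complex board states".

```python
def hybrid_move(board, ppo_policy, beam_policy, switch_tile=256):
    """Route to PPO early, then to Beam Search once the board gets complex.

    `ppo_policy` and `beam_policy` are callables board -> move; the
    `switch_tile` threshold stands in for the article's actual test.
    """
    if board.max() >= switch_tile:
        return beam_policy(board)   # late game: lookahead precision matters
    return ppo_policy(board)        # early game: the cheap learned policy
```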
