This paper introduces a quantum framework for reinforcement learning (RL) tasks, grounded in quantum principles and built on a fully quantum model of the classical Markov decision process (MDP).
Agent-environment interactions are implemented and optimized entirely within the quantum domain, eliminating reliance on classical computation.
Key contributions include quantum mechanisms for state transitions, return calculation, and trajectory search, which together demonstrate that RL processes can be realized through quantum phenomena.
Experimental results show that a quantum model can achieve quantum advantage in RL, highlighting the potential of fully quantum implementations for decision-making tasks.
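
As a rough illustration of what a quantum state transition might look like, the following sketch assumes a common amplitude encoding in which the next-state register is prepared as $\sum_{s'} \sqrt{P(s' \mid s, a)}\,|s'\rangle$, so that measurement reproduces the classical transition probabilities. This encoding, the toy MDP, and the helper names `transition_amplitudes` and `measurement_probabilities` are assumptions made for illustration; they are not necessarily the construction used in this paper, and the sketch is a classical NumPy simulation rather than a quantum circuit.

```python
import numpy as np

# Hypothetical toy MDP with 2 states and 2 actions.
# P[s, a, s_next] holds made-up transition probabilities (illustration only).
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
])


def transition_amplitudes(P, s, a):
    """Amplitudes of the assumed state sum_{s'} sqrt(P(s'|s,a)) |s'>."""
    return np.sqrt(P[s, a])


def measurement_probabilities(amplitudes):
    """Born rule: each outcome occurs with probability |amplitude|^2."""
    return np.abs(amplitudes) ** 2


# Prepare the next-state superposition for state 0 and action 1,
# then recover the outcome distribution a measurement would follow.
psi = transition_amplitudes(P, s=0, a=1)
probs = measurement_probabilities(psi)
```

Under this assumed encoding, `psi` is a unit vector because each row of `P` sums to one, and `probs` equals the classical row `P[0, 1]`.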