Recent advances in reinforcement learning (RL) have led to significant improvements in task performance.
Noise-based alternatives to backpropagation, such as reward-modulated Hebbian learning (RMHL), have been proposed, but their performance has remained limited in settings with delayed rewards.
A novel noise-based learning rule is derived that combines directional-derivative theory with Hebbian-like updates, enabling efficient, gradient-free learning in RL.
The proposed method significantly outperforms RMHL and is competitive with backpropagation-based baselines while remaining gradient-free, making it well suited to low-power and real-time applications.
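To make the noise-based, directional-derivative idea concrete, the following is a minimal sketch on a toy immediate-reward problem. The task, learning rate, and noise scale are illustrative assumptions, and the sketch does not reproduce the paper's exact rule or its handling of delayed rewards; it only shows how perturbing weights with noise and correlating the noise with the resulting reward change yields a Hebbian-like, gradient-free update.

```python
import numpy as np

# Minimal sketch (assumed setup): a single linear unit learns to map a fixed
# input to a target output; "reward" is the negative squared error.
rng = np.random.default_rng(0)
x = rng.normal(size=5)      # fixed input pattern (toy-task assumption)
target = 1.5                # desired output
w = np.zeros(5)             # weights to be learned

def reward(w):
    """Higher is better: negative squared error of the unit's output."""
    return -(w @ x - target) ** 2

lr = 0.05          # learning rate (illustrative value)
noise_scale = 0.1  # std of the weight perturbation (illustrative value)

for step in range(500):
    r0 = reward(w)                                    # unperturbed reward
    xi = rng.normal(scale=noise_scale, size=w.shape)  # random probe direction
    r1 = reward(w + xi)                               # reward after perturbation

    # (r1 - r0) / noise_scale**2 is a stochastic estimate of the directional
    # derivative of the reward along xi; multiplying by xi yields a
    # Hebbian-like update that follows the reward gradient in expectation,
    # with no backpropagation required.
    w += lr * (r1 - r0) * xi / noise_scale ** 2

print("learned output:", float(w @ x), "target:", target)
```

Estimating the update from a single noisy probe per step is what keeps such rules gradient-free and cheap; the contribution of the proposed method lies in extending this kind of noise-based estimate to the delayed-reward setting where plain RMHL struggles.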