Improving the performance of pre-trained policies through online reinforcement learning is crucial yet challenging.
Existing online RL fine-tuning methods often require continued training with an offline pre-trained Q-function to maintain stability and performance.
A new method, PORL (Policy-Only Reinforcement Learning Fine-Tuning), has been proposed that uses only the offline pre-trained policy for efficient online RL fine-tuning.
PORL initializes the Q-function from scratch and trains it rapidly during the online phase, avoiding the pessimism carried over from offline training, and achieves performance competitive with advanced offline-to-online RL algorithms.
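The following is a minimal sketch of the mechanism described above: the actor is loaded from an offline pre-trained checkpoint, while the critic (Q-function) is freshly initialized and trained only on online data. The TD3/DDPG-style update, the network sizes, the dimensions, and all hyperparameters here are illustrative assumptions, not the exact configuration of PORL.

```python
# Sketch: offline pre-trained actor + freshly initialized critic trained online.
import copy
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 17, 6  # assumed dimensions, e.g. a locomotion task


def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))


# Actor: in practice this would be loaded from the offline pre-trained
# checkpoint, e.g. policy.load_state_dict(torch.load("offline_policy.pt")).
policy = mlp(OBS_DIM, ACT_DIM)

# Critic: deliberately initialized from scratch for the online phase, so no
# pessimistic value estimates are inherited from offline training.
q_net = mlp(OBS_DIM + ACT_DIM, 1)
q_target = copy.deepcopy(q_net)

pi_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
q_opt = torch.optim.Adam(q_net.parameters(), lr=3e-4)
GAMMA, TAU = 0.99, 0.005


def update(batch):
    """One online actor-critic update on a batch of online transitions."""
    obs, act, rew, next_obs, done = batch

    # Critic: standard TD target bootstrapped with the target network.
    with torch.no_grad():
        next_act = torch.tanh(policy(next_obs))
        target_q = rew + GAMMA * (1 - done) * q_target(
            torch.cat([next_obs, next_act], dim=-1))
    q = q_net(torch.cat([obs, act], dim=-1))
    q_loss = nn.functional.mse_loss(q, target_q)
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()

    # Actor: deterministic policy-gradient step through the fresh critic.
    pi_loss = -q_net(torch.cat([obs, torch.tanh(policy(obs))], dim=-1)).mean()
    pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()

    # Polyak-average the target critic toward the online critic.
    with torch.no_grad():
        for p, tp in zip(q_net.parameters(), q_target.parameters()):
            tp.lerp_(p, TAU)


# Dummy batch standing in for samples from an online replay buffer.
B = 256
update((torch.randn(B, OBS_DIM), torch.rand(B, ACT_DIM) * 2 - 1,
        torch.randn(B, 1), torch.randn(B, OBS_DIM), torch.zeros(B, 1)))
```

The key design choice illustrated here is that only the actor carries offline knowledge into the online phase; the critic is learned entirely from online interaction, which is why it must be trained quickly at the start of fine-tuning.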