Text-to-SQL is a challenging task involving multiple reasoning-intensive subtasks, and existing approaches often rely on handcrafted reasoning paths.
To address the reward-sparsity issue in reinforcement learning (RL), a novel set of partial rewards tailored to the Text-to-SQL task is proposed.
The proposed rewards comprise schema linking, AI feedback, n-gram similarity, and syntax checking, and are designed to enhance reasoning capability and generalization.
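As a rough illustration (not the paper's exact implementation), two of the dense reward components can be sketched in Python: an n-gram similarity reward between the generated and gold SQL, and a syntax-check reward that lets SQLite parse the query against an empty in-memory database (treating "no such table" as a syntax pass). The schema-linking and AI-feedback rewards are omitted here since they require the database schema and an external judge model; the component weights are illustrative assumptions.

```python
import re
import sqlite3

def ngram_similarity(pred: str, gold: str, n: int = 3) -> float:
    """Character-level n-gram overlap (Dice coefficient) between queries."""
    def grams(s: str) -> set:
        s = re.sub(r"\s+", " ", s.lower().strip())
        return {s[i:i + n] for i in range(len(s) - n + 1)}
    a, b = grams(pred), grams(gold)
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

def syntax_ok(sql: str) -> bool:
    """Syntax-only check: parse errors fail, missing-table errors pass."""
    try:
        sqlite3.connect(":memory:").execute(sql)
        return True
    except sqlite3.OperationalError as e:
        return "syntax error" not in str(e)
    except sqlite3.Error:
        return False

def partial_reward(pred: str, gold: str) -> float:
    # Weighted sum of dense partial rewards (weights are illustrative).
    return 0.5 * ngram_similarity(pred, gold) + 0.5 * float(syntax_ok(pred))
```

Such shaped rewards give the policy a nonzero learning signal even when the generated query is not exactly executable or correct, which is the point of the partial-reward design.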
RL-only training with the proposed rewards achieves higher accuracy and better generalization than supervised fine-tuning (SFT).