DRG-Sapphire uses large-scale reinforcement learning (RL) for automated DRG coding from clinical notes to improve accuracy and explainability.
The model achieves state-of-the-art accuracy on the MIMIC-IV benchmark and provides physician-validated reasoning for DRG assignments.
RL performance on out-of-distribution (OOD) tasks such as DRG coding scales approximately with the logarithm of the number of supervised fine-tuning (SFT) examples, indicating the importance of domain knowledge in the base model.
Scaling supervised fine-tuning may be more effective and computationally efficient than scaling RL alone for knowledge-intensive OOD tasks such as DRG coding.
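The claimed log-linear relationship between SFT data size and downstream RL performance can be sketched as a simple curve fit. The data points below are purely illustrative placeholders, not results from the paper; the sketch only shows how one would estimate the scaling slope and the accuracy gain per 10x increase in SFT examples.

```python
import numpy as np

# Illustrative (non-paper) data: downstream accuracy vs. number of SFT examples.
n_sft = np.array([1e3, 5e3, 1e4, 5e4, 1e5])
acc = np.array([0.40, 0.47, 0.50, 0.57, 0.60])

# Fit the hypothesized log-linear scaling law: accuracy ~ a + b * ln(n_sft).
b, a = np.polyfit(np.log(n_sft), acc, 1)

# Under this fit, each 10x increase in SFT data adds b * ln(10) accuracy.
gain_per_decade = b * np.log(10)
print(f"slope b = {b:.4f}, gain per 10x data = {gain_per_decade:.3f}")
```

A fit like this makes the practical takeaway concrete: if accuracy grows only logarithmically in SFT data, doubling RL compute on a weak base model buys less than moving to the next decade of domain-specific SFT examples.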