DRG-Sapphire uses large-scale reinforcement learning (RL) for automated DRG coding from clinical notes to improve accuracy and explainability.
The model achieves state-of-the-art accuracy on the MIMIC-IV benchmark and provides physician-validated reasoning for DRG assignments.
RL performance on out-of-distribution (OOD) tasks such as DRG coding scales approximately with the logarithm of the number of supervised fine-tuning (SFT) examples, indicating the importance of domain knowledge in the base model.
Scaling supervised fine-tuning may be more effective and computationally efficient than scaling RL alone for knowledge-intensive OOD tasks such as DRG coding.
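The claimed log-linear relationship between SFT data size and downstream RL performance can be sketched as a simple curve fit. The data points below are purely illustrative placeholders, not results from the paper; the sketch only shows how one would estimate the scaling slope and the accuracy gain per 10x increase in SFT examples.

```python
import numpy as np

# Illustrative (non-paper) data: downstream accuracy vs. number of SFT examples.
n_sft = np.array([1e3, 5e3, 1e4, 5e4, 1e5])
acc = np.array([0.40, 0.47, 0.50, 0.57, 0.60])

# Fit the hypothesized log-linear scaling law: accuracy ~ a + b * ln(n_sft).
b, a = np.polyfit(np.log(n_sft), acc, 1)

# Under this fit, each 10x increase in SFT data adds b * ln(10) accuracy.
gain_per_decade = b * np.log(10)
print(f"slope b = {b:.4f}, gain per 10x data = {gain_per_decade:.3f}")
```

A fit like this makes the practical takeaway concrete: if accuracy grows only logarithmically in SFT data, doubling RL compute on a weak base model buys less than moving to the next decade of domain-specific SFT examples.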