Arxiv

Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding

  • DRG-Sapphire uses large-scale reinforcement learning (RL) for automated DRG coding from clinical notes to improve accuracy and explainability.
  • The model achieves state-of-the-art accuracy on the MIMIC-IV benchmark and provides physician-validated reasoning for DRG assignments.
  • RL performance on out-of-distribution (OOD) tasks such as DRG coding scales with the logarithm of the number of supervised fine-tuning (SFT) examples, which suggests that RL gains depend on the domain knowledge already present in the base model.
  • Scaling supervised fine-tuning may be more effective and computationally efficient than scaling RL alone for knowledge-intensive OOD tasks such as DRG coding.
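
To make the logarithmic scaling claim concrete, here is a minimal sketch of what such a relationship implies, assuming the simplest form: score = intercept + slope * ln(number of SFT examples). The function names and the coefficient values are hypothetical and chosen only for illustration; they are not results from the study.

```python
import math


def log_linear_score(n_sft: int, intercept: float, slope: float) -> float:
    """Hypothetical model: score = intercept + slope * ln(n_sft)."""
    if n_sft <= 0:
        raise ValueError("n_sft must be positive")
    return intercept + slope * math.log(n_sft)


def gain_from_scaling(factor: float, slope: float) -> float:
    """Score gain from multiplying the SFT set size by `factor`.

    Under the log-linear assumption the gain depends only on the
    factor, not on the starting size.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return slope * math.log(factor)


# Made-up coefficients, for illustration only (not from the paper).
INTERCEPT, SLOPE = 0.10, 0.05

# Doubling a small SFT set and doubling a large one yield the same gain.
gain_small = (log_linear_score(2_000, INTERCEPT, SLOPE)
              - log_linear_score(1_000, INTERCEPT, SLOPE))
gain_large = (log_linear_score(200_000, INTERCEPT, SLOPE)
              - log_linear_score(100_000, INTERCEPT, SLOPE))
```

Under this assumption each doubling of SFT data adds the same fixed increment, so every further fixed gain requires twice as many additional examples as the previous one.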
