Automated question answering (QA) over electronic health records (EHRs) is crucial for providing information to clinicians and patients.
Neural, a method developed for evidence-grounded clinical QA, was the runner-up in the BioNLP 2025 ArchEHR-QA shared task.
Neural decomposes the task into two stages: sentence-level evidence identification and answer synthesis with explicit citations. The method used DSPy's MIPROv2 optimizer to search the prompt space, jointly optimizing instructions and few-shot demonstrations on the development set.
A self-consistency voting scheme was employed to enhance evidence recall without compromising precision.
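The abstract does not spell out the voting rule; a minimal sketch of one plausible scheme, assuming the model is sampled several times and a sentence is kept only if enough runs agree (the sentence IDs and threshold below are hypothetical):

```python
from collections import Counter

def vote_evidence(runs, min_votes):
    """Keep sentence IDs selected in at least min_votes of the sampled runs."""
    counts = Counter(sid for run in runs for sid in set(run))
    return sorted(sid for sid, c in counts.items() if c >= min_votes)

# Hypothetical example: five sampled runs, each a set of selected sentence IDs.
runs = [
    {1, 4, 7},
    {1, 4},
    {1, 7, 9},
    {1, 4, 7},
    {4, 7},
]
print(vote_evidence(runs, min_votes=3))  # → [1, 4, 7]
```

Pooling several samples surfaces evidence sentences any single run might miss (raising recall), while the agreement threshold filters out one-off spurious picks (protecting precision).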
On the hidden test set, Neural achieved an overall score of 51.5, ranking second and surpassing standard zero-shot and few-shot prompting by clear margins.
The results suggest that data-driven prompt optimization offers a cost-effective alternative to model fine-tuning for high-stakes clinical QA, and can improve the reliability of AI assistants in healthcare settings.