ELM (Ensemble of Language Models) is a novel ensemble-based approach introduced to address the bottleneck in manually extracting data from unstructured pathology reports for tumor group assignment.
ELM combines small language models (SLMs) and large language models (LLMs), using six fine-tuned SLMs as a voting ensemble.
A tumor group is assigned when at least five of the six SLMs agree; otherwise, the disagreement is arbitrated by an LLM with a curated prompt.
In evaluation, ELM achieves an average precision and recall of 0.94, outperforming other approaches and improving operational efficiency at the British Columbia Cancer Registry.
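
The five-out-of-six voting rule with LLM arbitration can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`assign_tumor_group`, `stub_arbitrator`), the tumor group labels, and the arbitrator's signature are all assumptions made for the example, and the LLM call is replaced by a stub.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

AGREEMENT_THRESHOLD = 5  # the five-out-of-six rule described above


def assign_tumor_group(
    report_text: str,
    slm_votes: Sequence[str],
    arbitrate: Callable[[str, Sequence[str]], str],
    threshold: int = AGREEMENT_THRESHOLD,
) -> Tuple[str, bool]:
    """Return (tumor_group, used_llm) for one pathology report.

    slm_votes holds one predicted label per fine-tuned SLM.
    arbitrate is a hypothetical callback standing in for the LLM.
    """
    if not slm_votes:
        raise ValueError("at least one SLM vote is required")

    # Find the most frequent label and how many SLMs chose it.
    top_label, top_count = Counter(slm_votes).most_common(1)[0]

    if top_count >= threshold:
        # Enough agreement: accept the ensemble's label directly.
        return top_label, False

    # Otherwise hand the report and the conflicting votes to the LLM.
    return arbitrate(report_text, slm_votes), True


def stub_arbitrator(report_text: str, votes: Sequence[str]) -> str:
    # Placeholder for a prompted LLM call (assumption): here it simply
    # returns the alphabetically first of the candidate labels.
    return sorted(set(votes))[0]


# Hypothetical labels, for illustration only.
agreed = assign_tumor_group("report A", ["breast"] * 5 + ["lung"], stub_arbitrator)
split = assign_tumor_group(
    "report B",
    ["breast", "breast", "breast", "lung", "lung", "colorectal"],
    stub_arbitrator,
)
print(agreed)  # ('breast', False)
print(split)   # ('breast', True)
```

Passing the arbitrator in as a callback keeps the voting logic separate from whichever LLM and prompt are used, so the threshold can be tested without a model in the loop.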