This paper introduces RiskData, a dataset curated for enhancing embedding models in risk management.
It also introduces RiskEmbed, a fine-tuned embedding model designed to improve retrieval accuracy in financial question-answering systems.
The dataset is derived from 94 regulatory guidelines published by the Office of the Superintendent of Financial Institutions (OSFI) between 1991 and 2024.
Experimental results show that RiskEmbed outperforms general-purpose and financial embedding models, achieving significant improvements in ranking metrics.
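
The summary does not say which ranking metrics are reported. As an illustration only, the sketch below computes two metrics commonly used to evaluate retrieval, recall@k and mean reciprocal rank (MRR). The function names, document IDs, and toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Average reciprocal rank over every query that has relevance labels."""
    scores = [reciprocal_rank(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: ranked document IDs per query, and the relevant set.
runs = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9", "d4"]}
qrels = {"q1": {"d1"}, "q2": {"d2"}}

mrr = mean_reciprocal_rank(runs, qrels)
recall_q1_at_1 = recall_at_k(runs["q1"], qrels["q1"], k=1)
recall_q1_at_2 = recall_at_k(runs["q1"], qrels["q1"], k=2)
```

In this toy setup, a model that ranks the relevant passage higher for each query raises both scores, which is the kind of improvement a comparison between embedding models would measure.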