FinMaster is a new benchmark for evaluating large language models (LLMs) on financial tasks spanning financial literacy, accounting, auditing, and consulting.
FinMaster consists of three main modules: FinSim, which generates synthetic financial data; FinSuite, which provides tasks across financial domains; and FinEval, which handles evaluation.
Experiments with state-of-the-art LLMs on FinMaster reveal critical gaps in financial reasoning: accuracy drops significantly as tasks move from basic to complex.
FinMaster aims to bridge the gap between researchers and industry practitioners, supporting the adoption of LLMs in real-world financial workflows to improve efficiency and accuracy.