EDINET-Bench is a new open-source Japanese financial benchmark for evaluating large language models (LLMs) on complex financial tasks: accounting fraud detection, earnings forecasting, and industry prediction.
EDINET-Bench is constructed by collecting the past 10 years of annual reports from Japan's Electronic Disclosure for Investors' NETwork (EDINET) and automatically assigning labels for the evaluation tasks.
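To make the idea of automatic labeling concrete, here is a minimal sketch for the earnings forecasting task. The labeling rule used here (label 1 if the same company's net income rises in the following fiscal year, else 0) is an assumption for illustration, since the summary does not state the rule. The `AnnualReport` class, its fields, and `label_earnings_direction` are likewise hypothetical names, not part of the released benchmark code.

```python
from dataclasses import dataclass


@dataclass
class AnnualReport:
    # Hypothetical record; field names are illustrative, not the benchmark's schema.
    company_id: str
    fiscal_year: int
    net_income: float


def label_earnings_direction(reports):
    """Pair each report with 1 if next year's net income is higher, else 0.

    Assumed rule for illustration only. Reports without a report for the
    immediately following fiscal year are left unlabeled.
    """
    by_company = {}
    for report in reports:
        by_company.setdefault(report.company_id, []).append(report)

    labeled = []
    for company_reports in by_company.values():
        company_reports.sort(key=lambda r: r.fiscal_year)
        for current, following in zip(company_reports, company_reports[1:]):
            if following.fiscal_year != current.fiscal_year + 1:
                continue  # skip gaps in the filing history
            label = int(following.net_income > current.net_income)
            labeled.append((current, label))
    return labeled
```

Because the label comes only from a later filing by the same company, no manual annotation is needed, which is the property the construction described above relies on.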
Experiments indicate that even the best LLMs struggle to outperform logistic regression on the binary classification tasks of fraud detection and earnings forecasting in EDINET-Bench.
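For readers unfamiliar with the baseline, the following is a minimal sketch of logistic regression for binary classification, written in plain Python with batch gradient descent. It is not the study's implementation. The feature vectors, the hyperparameters, and the choice of ROC-AUC as the score are assumptions made for illustration, and the summary does not name the evaluation metric.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one sample: p(y=1 | x) - y
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n_samples for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n_samples
    return w, b


def predict_proba(X, w, b):
    """Return the predicted probability of the positive class per sample."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Requires at least one sample per class.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Under these assumptions, a model's predicted probabilities for the same examples could be passed to `roc_auc` in place of the baseline's, so that both are scored on identical labels.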
The study highlights the challenges of applying LLMs to practical financial applications and suggests the need for domain-specific adaptation. The dataset, benchmark construction code, and evaluation code are publicly available for further research.