<ul><li>EDINET-Bench is a new open-source Japanese financial benchmark for evaluating large language models (LLMs) on complex financial tasks: accounting fraud detection, earnings forecasting, and industry prediction.</li><li>The benchmark is built by collecting the past 10 years of annual reports from Japan's Electronic Disclosure for Investors' NETwork (EDINET) and automatically assigning labels for each evaluation task.</li><li>Experiments show that even the best LLMs struggle to outperform logistic regression on the two binary classification tasks, fraud detection and earnings forecasting.</li><li>The study highlights the difficulty of applying LLMs to practical financial applications and suggests that domain-specific adaptation is needed. The dataset, benchmark construction code, and evaluation code are publicly available for further research.</li></ul>

EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements
