DynamicBench is a new benchmark that evaluates the ability of large language models (LLMs) to store and process up-to-the-minute information, a prerequisite for real-time applications.
The benchmark is built on a dual-path retrieval pipeline that combines web search with a local report database, and its tasks demand domain-specific knowledge for accurate answers in specialized fields.
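To make the dual-path idea concrete, here is a minimal sketch of how evidence from the two paths could be gathered and merged. The function names (`search_web`, `search_report_db`, `dual_path_retrieve`) and the simple interleaving policy are illustrative assumptions, not the paper's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Document:
    source: str  # "web" or "report_db"
    text: str

# Hypothetical stand-ins: the paper does not publish its retrieval APIs,
# so these stubs only illustrate the two retrieval paths.
def search_web(query: str, k: int = 3) -> list[Document]:
    # In practice this would call a live search API for fresh results.
    return [Document("web", f"web hit {i} for {query!r}") for i in range(k)]

def search_report_db(query: str, k: int = 3) -> list[Document]:
    # In practice this would query a curated local report database.
    return [Document("report_db", f"report {i} for {query!r}") for i in range(k)]

def dual_path_retrieve(query: str, k: int = 3) -> list[Document]:
    """Combine fresh web evidence with curated domain reports."""
    web_docs = search_web(query, k)
    report_docs = search_report_db(query, k)
    merged: list[Document] = []
    # Simple interleave; a real pipeline would rerank by relevance and recency.
    for web_doc, report_doc in zip(web_docs, report_docs):
        merged.extend((web_doc, report_doc))
    return merged

if __name__ == "__main__":
    for doc in dual_path_retrieve("latest earnings guidance"):
        print(doc.source, "->", doc.text)
```

A production pipeline would additionally deduplicate and rerank the merged results, but the core design point survives in the sketch: web search supplies freshness while the report database supplies curated domain depth.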
DynamicBench assesses LLMs in scenarios with and without external documents, measuring whether they can process recent information on their own or make effective use of supplied context.
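The difference between the two evaluation scenarios can be pictured as a difference in prompt construction; the following sketch assumes a hypothetical `build_prompt` helper and a generic template, since the paper's exact prompt format is not given here:

```python
def build_prompt(question: str, documents: list[str] | None = None) -> str:
    # Document-free mode: the model must answer from its own knowledge.
    if not documents:
        return question
    # Document-assisted mode: retrieved context is prepended to the question.
    context = "\n\n".join(documents)
    return f"Answer using the documents below.\n\n{context}\n\nQuestion: {question}"

# Document-free evaluation
print(build_prompt("Who won yesterday's match?"))
# Document-assisted evaluation
print(build_prompt("Who won yesterday's match?", ["Report: Team A won 2-1."]))
```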
In experiments on DynamicBench, the accompanying report generation system, which manages dynamic information synthesis, outperforms GPT-4o by 7.0% in document-free scenarios and 5.8% in document-assisted scenarios.