Epoch AI has released FrontierMath, a mathematical benchmark designed to evaluate advanced reasoning capabilities in AI systems.
Current AI models solve less than 2% of FrontierMath problems, indicating a substantial gap between AI capabilities and the expertise of human mathematicians.
FrontierMath problems are extremely challenging and require extended chains of precise reasoning across a range of mathematical domains.
While AI models are not yet on par with human mathematicians, benchmarks like FrontierMath provide a way to measure, and help drive, improvements in AI reasoning abilities.