<ul><li>Large language models, particularly reasoning models, have shown improved abilities in advanced problem-solving domains such as mathematics and software engineering.</li><li>A new benchmark, ChemIQ, was created to assess whether reasoning models can perform chemistry tasks directly, without external assistance.</li><li>Reasoning models such as OpenAI's o3-mini correctly answered 28% to 59% of ChemIQ questions, with higher reasoning levels yielding higher accuracy.</li><li>These models outperformed the non-reasoning model GPT-4o and demonstrated advanced chemical reasoning, such as converting SMILES strings to IUPAC names and elucidating structures from NMR data.</li></ul>

Assessing the Chemical Intelligence of Large Language Models
