<ul><li>Large Language Models (LLMs) struggle with systematic reasoning, even when they perform well on certain tasks.</li><li>Post-training strategies based on reinforcement learning and chain-of-thought prompting have been seen as a step forward.</li><li>Little is known about the potential of Large Reasoning Models (LRMs) beyond mathematics and programming.</li><li>Overall, LLMs and LRMs still perform poorly, albeit better than random chance.</li></ul>

Benchmarking Systematic Relational Reasoning with Large Language and Reasoning Models
