Large Language Models (LLMs) struggle with systematic reasoning, even when they perform well on certain tasks. Post-training strategies based on reinforcement learning and chain-of-thought prompting have been seen as an improvement. However, little is known about the potential of Large Reasoning Models (LRMs) beyond mathematics and programming. Overall, LLMs and LRMs still perform poorly, albeit better than random chance.