SAGE-nano introduces text-based inverse reasoning, enabling Large Language Models (LLMs) to explain their own reasoning chains post hoc.
The approach uses a metacognitive structure that reflects back over the model's attention processes to generate explanations of its reasoning choices.
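The section gives no implementation detail, but a minimal sketch may help fix the idea of "reflecting back via attention." Everything below is hypothetical and not SAGE-nano's actual code: a single-head self-attention pass over toy embeddings is followed by an introspection step that ranks which input tokens the final position attended to most, a crude stand-in for attention-based explanation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.
    Returns the outputs AND the attention matrix so a
    post-hoc pass can inspect where the model attended."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    A = softmax(scores, axis=-1)           # (seq, seq) attention weights
    return A @ V, A

def explain_last_step(tokens, A, top_k=3):
    """Hypothetical introspection step (not SAGE-nano's method):
    rank which tokens the final position attended to most,
    as a crude proxy for explaining a reasoning choice."""
    weights = A[-1]                         # attention row of the last token
    order = np.argsort(weights)[::-1][:top_k]
    return [(tokens[i], float(weights[i])) for i in order]

rng = np.random.default_rng(0)
tokens = ["All", "men", "are", "mortal", ";", "Socrates", "is", "a", "man"]
d = 16
X = rng.normal(size=(len(tokens), d))       # toy embeddings, random for illustration
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

_, A = self_attention(X, Wq, Wk, Wv)
print(explain_last_step(tokens, A))         # e.g. [('mortal', 0.21), ...]
```

In a real system the attention matrices would come from the trained LLM's forward pass rather than random projections, and the ranked tokens would feed a generation step that verbalizes the explanation.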
In evaluations on logical reasoning puzzles, math problems, and ethical dilemmas, SAGE-nano achieves strong reasoning accuracy (74.6% on AQUA-RAT) and explanation quality (92.1% human preference score).
This work on inverse reasoning aims to improve both interpretability and reasoning performance in AI systems, contributing to AI safety, education, and scientific discovery.