Large Language Models have shown impressive performance in mathematical reasoning tasks when guided by Chain-of-Thought prompting.
This work proposes a structured framework that models stepwise confidence over a chain-of-thought as a temporal signal and evaluates it with Signal Temporal Logic (STL).
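To make the signal view concrete, the sketch below builds a discrete-time confidence signal from per-step token log-probabilities, using the mean token probability of each reasoning step as a stand-in confidence score. The `step_confidence` helper, the scoring rule, and the example log-probabilities are illustrative assumptions, not the paper's exact procedure.

```python
from typing import List
import math

def step_confidence(token_logprobs: List[float]) -> float:
    """Heuristic stand-in: mean token probability of one reasoning step.
    (Assumed for illustration; the actual per-step confidence measure may differ.)"""
    probs = [math.exp(lp) for lp in token_logprobs]
    return sum(probs) / len(probs)

def confidence_signal(steps_logprobs: List[List[float]]) -> List[float]:
    """Map a chain-of-thought (one log-prob list per step) to a discrete-time
    confidence signal c[0..T-1], indexed by reasoning step."""
    return [step_confidence(lps) for lps in steps_logprobs]

# Example: three reasoning steps with made-up token log-probs.
cot_logprobs = [
    [-0.1, -0.3, -0.2],   # step 1
    [-0.6, -0.9, -0.4],   # step 2
    [-0.2, -0.1, -0.15],  # step 3
]
signal = confidence_signal(cot_logprobs)
print([round(c, 3) for c in signal])  # [0.821, 0.542, 0.861]
```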
Formal STL constraints are defined to capture desirable temporal properties of the confidence signal, and their robustness scores yield structured, interpretable confidence estimates.
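For concreteness, the sketch below computes robustness scores for two illustrative temporal properties over such a signal, using the standard min/max quantitative semantics of the "globally" (G) and "eventually" (F) operators. The specific properties and thresholds (0.5 and 0.8) are assumptions for illustration, not necessarily the constraints defined in the paper.

```python
from typing import Callable, List

def robustness_globally(signal: List[float], margin: Callable[[float], float]) -> float:
    """Robustness of G(margin >= 0): worst-case (minimum) margin over all steps."""
    return min(margin(c) for c in signal)

def robustness_eventually(signal: List[float], margin: Callable[[float], float]) -> float:
    """Robustness of F(margin >= 0): best-case (maximum) margin over all steps."""
    return max(margin(c) for c in signal)

# Example confidence signal (one value per reasoning step).
signal = [0.82, 0.54, 0.86]

# Property 1: confidence never drops below 0.5  ->  G(c_t - 0.5 >= 0).
rho_floor = robustness_globally(signal, lambda c: c - 0.5)

# Property 2: some step is highly confident     ->  F(c_t - 0.8 >= 0).
rho_peak = robustness_eventually(signal, lambda c: c - 0.8)

# Positive robustness means the property is satisfied, with the
# magnitude acting as an interpretable margin of satisfaction.
print(round(rho_floor, 2), round(rho_peak, 2))  # 0.04 0.06
```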
Experiments show that this approach consistently improves calibration metrics and provides more reliable uncertainty estimates than conventional methods.