- Large language models (LLMs) can make mistakes yet often fail to correct errors in their own outputs, a limitation the researchers term the "Self-Correction Blind Spot."
- The researchers introduce Self-Correction Bench, a framework that measures this blind spot by injecting controlled errors into model outputs at varying levels of complexity (see the sketch below).
- Across the 14 models tested, the average blind spot rate was 64.5%, and the composition of training data appears to play a crucial role in this limitation.
- Simply appending the word "Wait" reduced blind spots by 89.3%, pointing to a practical route toward more reliable and trustworthy LLMs.
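To make the setup concrete, here is a minimal Python sketch of the measurement loop, assuming a hypothetical `generate(prompt) -> str` model call and a simplified string-match grader; it illustrates the idea of injecting a controlled error and testing the "Wait" intervention, and is not the benchmark's actual code.

```python
"""Minimal sketch of the blind-spot measurement idea behind Self-Correction
Bench. All names here (generate, the grading rule) are hypothetical
stand-ins, not the paper's released implementation."""

from typing import Callable, Iterable, Tuple


def blind_spot_trial(
    generate: Callable[[str], str],  # your LLM call: prompt -> continuation
    question: str,
    wrong_step: str,      # the controlled error injected into the transcript
    correct_answer: str,  # ground truth used by the (simplified) grader
    append_wait: bool = False,
) -> bool:
    """One trial: seed the transcript with an injected error, let the model
    continue, and report True if the error goes UNCORRECTED (blind spot)."""
    # Present the injected error as if the model had produced it itself.
    transcript = f"Q: {question}\nA: {wrong_step}"
    if append_wait:
        # The one-word intervention reported in the paper: appending
        # "Wait" nudges the model to re-examine its own output.
        transcript += " Wait,"
    continuation = generate(transcript)
    # Simplified grading: the error counts as corrected only if the
    # continuation reaches the known answer; a real benchmark would
    # grade more carefully.
    return correct_answer not in continuation


def blind_spot_rate(
    generate: Callable[[str], str],
    trials: Iterable[Tuple[str, str, str]],
    append_wait: bool = False,
) -> float:
    """Fraction of trials in which the injected error is never corrected."""
    results = [
        blind_spot_trial(generate, q, wrong, ans, append_wait=append_wait)
        for q, wrong, ans in trials
    ]
    return sum(results) / len(results)
```

Comparing `blind_spot_rate(generate, trials)` against `blind_spot_rate(generate, trials, append_wait=True)` reproduces, in miniature, the before/after comparison behind the reported 89.3% reduction.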