Chain of Thought (CoT) reasoning has been widely used in large language models (LLMs) to improve output quality.
Recent studies have shown that transformers have limited expressive power on their own but can solve complex problems effectively when equipped with CoT.
Existing work on CoT typically relies on assumptions such as identical training and test distributions and corruption-free training data, which may not hold in real-world scenarios.
A new study is the first to rigorously examine the negative effects of distribution shifts on CoT performance, focusing in particular on the $k$-parity problem.
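For reference, the $k$-parity problem is commonly formulated as follows (a standard formulation, which may differ in details from the study's exact setup): given an input $x \in \{0,1\}^d$ and an unknown subset $S \subseteq [d]$ with $|S| = k$, the label is

$$f_S(x) = \bigoplus_{i \in S} x_i,$$

i.e., the XOR of the $k$ relevant coordinates, and the learner must identify $S$ from labeled examples.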
The study explores the joint impact of distribution shifts and data poisoning on models trained using CoT decomposition.
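As a minimal sketch of what such a setup could look like, the snippet below implements the $k$-parity target, a CoT-style decomposition into cumulative partial parities, and toy stand-ins for distribution shift and label poisoning; all names and parameters here are illustrative assumptions, not details taken from the study.

```python
# Illustrative sketch only: k-parity target, CoT-style stepwise decomposition,
# and toy models of distribution shift and label poisoning.
import numpy as np

rng = np.random.default_rng(0)

d, k = 10, 3                                # input dimension and parity size
S = rng.choice(d, size=k, replace=False)    # unknown relevant coordinates

def parity(x):
    """Target label: XOR of the k relevant bits."""
    return int(np.bitwise_xor.reduce(x[S]))

def cot_steps(x):
    """CoT-style decomposition: cumulative partial parities over the
    relevant coordinates; the last step is the final answer."""
    steps, acc = [], 0
    for i in S:
        acc ^= int(x[i])
        steps.append(acc)
    return steps

def sample(n, p=0.5, flip_rate=0.0):
    """Draw n examples with Bernoulli(p) bits (p != 0.5 models a
    distribution shift) and flip a fraction of labels (poisoning)."""
    X = (rng.random((n, d)) < p).astype(int)
    y = np.array([parity(x) for x in X])
    flips = rng.random(n) < flip_rate
    y[flips] ^= 1
    return X, y

# Clean i.i.d. data vs. shifted, partially poisoned data.
X_clean, y_clean = sample(1000)                        # identical distribution, no corruption
X_hard, y_hard = sample(1000, p=0.7, flip_rate=0.1)    # shifted inputs + poisoned labels
print(cot_steps(X_clean[0]), y_clean[0])
```

A model trained on CoT supervision would be asked to produce the intermediate partial parities before the label, whereas direct prediction would map the input straight to the final bit; the study contrasts these two regimes under the shifted and poisoned data illustrated above.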
Surprisingly, the research indicates that CoT can degrade performance on learning parity compared to generating the prediction directly.
The technical findings provide a detailed explanation of why this degradation occurs.