<ul data-eligibleForWebStory="false">
<li>Large language models see a 39% accuracy drop in multi-turn conversations.</li>
<li>The drop is due to prompt contradictions, artificial behavior, and context handling.</li>
<li>Simple fixes can recover lost accuracy without fine-tuning the model.</li>
<li>In-depth analysis reveals flaws in prompt design and simulation environment.</li>
<li>Improvements in prompt clarity and context management enhance model performance.</li>