- Large language models (LLMs) are surprisingly robust to structural interventions such as deleting or swapping adjacent layers during inference (see the sketch after this list).
- Without any fine-tuning, intervened models retain 72-95% of the original model's top-1 prediction accuracy.
- Performance degradation is depth-dependent: interventions on early and final layers cause the most degradation, while dropping middle layers has minimal impact.
- The results point to four stages of inference in LLMs: detokenization, feature engineering, prediction ensembling, and residual sharpening.
- These stages reflect depth-dependent computation and appear across different model families and sizes.
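
To make the interventions concrete, here is a minimal sketch of layer deletion and adjacent-layer swapping on a GPT-2-style model via HuggingFace `transformers`. The model choice, the layer index, and the per-position top-1 agreement metric are illustrative assumptions, not the study's exact protocol.

```python
# Minimal sketch of layer-deletion and layer-swap interventions on GPT-2.
# Layer index 6 and the agreement metric are illustrative choices.
import copy
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def top1_predictions(m, input_ids):
    """Greedy next-token prediction at every position."""
    with torch.no_grad():
        logits = m(input_ids).logits
    return logits.argmax(dim=-1)

def delete_layer(m, idx):
    """Return a copy of the model with transformer block `idx` removed."""
    m2 = copy.deepcopy(m)
    blocks = list(m2.transformer.h)
    del blocks[idx]
    m2.transformer.h = torch.nn.ModuleList(blocks)
    m2.config.n_layer = len(blocks)
    return m2

def swap_adjacent_layers(m, idx):
    """Return a copy of the model with blocks `idx` and `idx + 1` swapped."""
    m2 = copy.deepcopy(m)
    blocks = list(m2.transformer.h)
    blocks[idx], blocks[idx + 1] = blocks[idx + 1], blocks[idx]
    m2.transformer.h = torch.nn.ModuleList(blocks)
    return m2

text = "The capital of France is"
input_ids = tokenizer(text, return_tensors="pt").input_ids

baseline = top1_predictions(model, input_ids)
for name, intervened in [
    ("delete middle layer", delete_layer(model, 6)),
    ("swap adjacent layers", swap_adjacent_layers(model, 6)),
]:
    preds = top1_predictions(intervened, input_ids)
    agreement = (preds == baseline).float().mean().item()
    print(f"{name}: top-1 agreement with baseline = {agreement:.2%}")
```

Measuring agreement against the unmodified model's own top-1 predictions, rather than against ground-truth labels, isolates how much the intervention changes the model's behavior; repeating this over each layer index would surface the depth-dependent pattern described above.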