<ul><li>Language models often struggle with cross-mode knowledge retrieval, i.e., accessing knowledge learned in one format (mode) when queried in another.</li><li>Models trained on multiple data sources retrieve knowledge less accurately when the query format differs from the format in which that knowledge was learned.</li><li>A controlled study, in which models memorize random token sequences across different modes, quantifies this limitation.</li><li>CASCADE, a novel pretraining algorithm that uses cascading datasets with varying sequence lengths, outperforms dataset rewriting approaches and improves language models' cross-mode knowledge retrieval.</li></ul>

CASCADE Your Datasets for Cross-Mode Knowledge Retrieval of Language Models
