Large Language Models (LLMs) increasingly generate content across the web, risking the dilution of human-authored text.
Training models on these synthetic samples can lead to model collapse, in which LLMs reinforce their own errors and performance steadily declines.
A study examines how the choice of decoding strategy affects model collapse, analyzing the characteristics of the generated text, its similarity to human references, and the performance of models trained on it.
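To make the comparison concrete, below is a minimal sketch of the standard decoding strategies such a study would vary. It assumes the Hugging Face `transformers` library and the public `gpt2` checkpoint; the prompt and generation settings are illustrative choices, not the study's exact configuration.

```python
# Sketch: contrasting common decoding strategies (illustrative settings only).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The web is increasingly", return_tensors="pt")

# Greedy decoding: deterministic; tends toward repetitive, low-diversity text.
greedy = model.generate(**inputs, do_sample=False, max_new_tokens=40,
                        pad_token_id=tokenizer.eos_token_id)

# Temperature sampling: sharpens (<1) or flattens (>1) the next-token distribution.
temp = model.generate(**inputs, do_sample=True, temperature=0.8, max_new_tokens=40,
                      pad_token_id=tokenizer.eos_token_id)

# Nucleus (top-p) sampling: samples from the smallest token set whose
# cumulative probability exceeds p.
nucleus = model.generate(**inputs, do_sample=True, top_p=0.9, max_new_tokens=40,
                         pad_token_id=tokenizer.eos_token_id)

for name, out in [("greedy", greedy), ("temperature", temp), ("nucleus", nucleus)]:
    print(name, "->", tokenizer.decode(out[0], skip_special_tokens=True))
```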
A proposed machine-generated-text detector, combined with an importance sampling approach, can prevent model collapse and even improve performance for LLMs such as GPT-2 and SmolLM2.
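The following sketch shows one way detector-guided importance sampling could filter a synthetic training pool. Here `detector_prob_human` is a hypothetical stand-in for the proposed detector (a crude lexical-diversity heuristic in place of a trained classifier), and resampling proportionally to detector scores is an assumption, not necessarily the paper's exact weighting scheme.

```python
# Sketch: detector-guided importance sampling over synthetic training text.
# The detector and weighting scheme are illustrative assumptions.
import numpy as np

def detector_prob_human(text: str) -> float:
    # Placeholder score: type-token ratio as a crude proxy for "human-like";
    # the actual detector would be a trained human-vs-machine classifier.
    tokens = text.split()
    return len(set(tokens)) / max(len(tokens), 1)

def importance_resample(synthetic_texts: list[str], seed: int = 0) -> list[str]:
    # Weight each synthetic sample by how human-like the detector finds it,
    # then resample the training pool proportionally to those weights, so
    # low-quality machine text is less likely to be trained on again.
    rng = np.random.default_rng(seed)
    scores = np.array([detector_prob_human(t) for t in synthetic_texts])
    weights = scores / scores.sum()
    idx = rng.choice(len(synthetic_texts), size=len(synthetic_texts),
                     replace=True, p=weights)
    return [synthetic_texts[i] for i in idx]

pool = ["the cat sat on the mat",
        "the the the the the",
        "model collapse degrades output diversity"]
print(importance_resample(pool))
```

Resampling with replacement keeps the training-set size fixed while shifting its distribution toward samples the detector judges human-like, which is the mechanism by which such filtering can counteract collapse.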