The study focuses on detecting under-trained tokens in large language models through various indicators and verification techniques. The indicators proved highly predictive in identifying under-trained tokens. Verification statistics and example verified tokens for different model families and tokenizer vocabulary sizes are presented in Table 1. The study's authors are Sander Land and Max Bartolo of Cohere, and the paper is available on arXiv under the CC BY-SA 4.0 DEED license.
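As an illustrative sketch (not a reproduction of the paper's exact method), one simple embedding-based indicator is the L2 norm of each token's embedding row: tokens that rarely or never appear in training data tend to keep small, barely-updated embeddings. The helper below, `flag_undertrained_tokens`, and the synthetic data it runs on are hypothetical, shown only to convey the idea of norm-based candidate detection.

```python
import numpy as np

def flag_undertrained_tokens(embeddings: np.ndarray, quantile: float = 0.01) -> np.ndarray:
    """Return indices of tokens whose embedding L2 norm falls below the
    given quantile of all norms.

    A very small norm is one heuristic signal that a token's embedding
    was seldom (or never) updated during training.
    """
    norms = np.linalg.norm(embeddings, axis=1)
    threshold = np.quantile(norms, quantile)
    return np.where(norms <= threshold)[0]

# Toy example: 1000 tokens with 64-dim embeddings; shrink a few rows
# toward zero to simulate under-trained tokens.
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 64))
emb[[3, 7, 42]] *= 1e-4  # planted under-trained rows
candidates = flag_undertrained_tokens(emb, quantile=0.01)
```

In practice such norm-based candidates would then be verified behaviorally, e.g. by prompting the model with the flagged tokens and checking whether it can repeat them.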