Large language models (LLMs) learn during the pre-training phase by processing enormous amounts of text, from which they absorb the rules and statistical structure of language. Common Crawl, a public corpus of more than 250 billion web pages, is a common source of pre-training data, but careful preprocessing to remove noise is crucial. Tokenization then breaks the text into manageable tokens so it can be processed numerically; Byte Pair Encoding (BPE) is among the most common methods, and models like GPT-4o use subword tokenizers to handle large vocabularies efficiently (a minimal BPE sketch is given below).

Training relies on objectives such as Next Token Prediction and Masked Language Modeling, which teach the model language structure and the relationships between tokens (see the loss sketch after the tokenizer example). The result is a base model that generates text one token at a time and serves as the starting point for later fine-tuning.

Base models can memorize text patterns, but they often struggle with reasoning tasks because their understanding is not yet structured for problem solving. In-context learning (sometimes described as in-context memory) lets a base model adjust its responses to whatever is provided in the prompt, demonstrating real versatility without any fine-tuning. Still, base models mostly excel at reproducing text that resembles what they have memorized and may lack originality and deep reasoning ability. In short, pre-training gives an LLM its foundational skills from raw data; post-training techniques are applied afterward to refine them.
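To make the tokenization step concrete, here is a minimal, self-contained sketch of the core BPE training loop: repeatedly find the most frequent adjacent pair of symbols in the corpus and merge it into a new subword unit until a merge budget is exhausted. The toy corpus, merge count, and function name are illustrative assumptions; production tokenizers like GPT-4o's operate at the byte level with vocabularies of well over a hundred thousand tokens.

```python
from collections import Counter

def learn_bpe_merges(corpus_words, num_merges):
    """Learn BPE merge rules from a toy corpus (a list of words).

    Each word starts as a sequence of characters; on every iteration the
    most frequent adjacent symbol pair is merged into a single symbol.
    """
    # Represent each word as a tuple of symbols, weighted by its frequency.
    vocab = Counter(tuple(word) for word in corpus_words)
    merges = []

    for _ in range(num_merges):
        # Count all adjacent symbol pairs across the weighted corpus.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pair_counts[(a, b)] += freq
        if not pair_counts:
            break

        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)

        # Apply the merge: replace every occurrence of the winning pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab

    return merges

# Toy example: frequent pairs such as ("l", "o") and ("lo", "w") are merged
# first, producing subword units like "low" shared across several words.
corpus = ["low", "low", "lower", "lowest", "new", "newer"]
print(learn_bpe_merges(corpus, num_merges=5))
```

Each learned merge becomes a new vocabulary entry, which is how subword tokenizers keep the vocabulary compact while still covering rare words as combinations of pieces.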
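The Next Token Prediction objective can be summarized as a cross-entropy loss over shifted sequences: the model predicts token t+1 from the tokens up to position t. The sketch below is an assumption-laden illustration in PyTorch, not a specific model's training recipe; `toy_model`, the tensor shapes, and the hyperparameters are placeholders for any model that maps token IDs to per-position logits over the vocabulary.

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, token_ids):
    """Cross-entropy loss for next-token prediction.

    token_ids: LongTensor of shape (batch, seq_len).
    Inputs are tokens [0 .. seq_len-2]; targets are the same sequence
    shifted one position to the left, i.e. tokens [1 .. seq_len-1].
    """
    inputs = token_ids[:, :-1]           # what the model sees
    targets = token_ids[:, 1:]           # what it should predict next
    logits = model(inputs)               # (batch, seq_len-1, vocab_size)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten batch and time steps
        targets.reshape(-1),                  # matching flat target tokens
    )

# Usage with a deliberately tiny stand-in "model": embedding + linear head.
vocab_size, d_model = 100, 32
toy_model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, d_model),
    torch.nn.Linear(d_model, vocab_size),
)
batch = torch.randint(0, vocab_size, (4, 16))  # 4 sequences of 16 tokens
loss = next_token_loss(toy_model, batch)
loss.backward()                                # gradients for one training step
print(float(loss))
```

Masked Language Modeling uses the same cross-entropy machinery, except the loss is computed only at randomly masked positions rather than at every next-token position.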