Tokenized words can be further processed with steps such as converting all words to lowercase, removing stop words, and stemming or lemmatization. Tokenized text is usually converted into numerical form (token IDs) so that machine learning models can consume it. In subword tokenizers such as the Unigram model, optimization via Expectation-Maximization (EM) is used to fit token probabilities, from which the probability of each possible tokenization of a given sentence can be computed. The tokenizer can be trained on a custom dataset and saved as a JSON file for later use.
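To make the preprocessing step concrete, here is a minimal sketch using NLTK for lowercasing, stop-word removal, and lemmatization. The example sentence and the choice of NLTK are illustrative, and the code assumes the `stopwords` and `wordnet` data packages have been downloaded:

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# One-time downloads of the required NLTK data packages.
nltk.download("stopwords")
nltk.download("wordnet")

def preprocess(tokens):
    """Lowercase tokens, drop English stop words, and lemmatize the rest."""
    stop_words = set(stopwords.words("english"))
    lemmatizer = WordNetLemmatizer()
    lowered = [t.lower() for t in tokens]
    filtered = [t for t in lowered if t not in stop_words]
    # Note: the lemmatizer assumes noun POS by default, so verb forms
    # like "running" pass through unchanged unless a POS tag is given.
    return [lemmatizer.lemmatize(t) for t in filtered]

print(preprocess(["The", "cats", "are", "running", "quickly"]))
# -> ['cat', 'running', 'quickly']  ("the"/"are" removed as stop words)
```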
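The conversion to numerical form typically means mapping each token to an integer ID through a vocabulary. A small self-contained sketch, with hypothetical special tokens `<pad>` and `<unk>`:

```python
def build_vocab(token_lists, specials=("<pad>", "<unk>")):
    """Map each distinct token to an integer ID, reserving IDs for specials."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(tokens, vocab):
    """Replace tokens with their IDs, falling back to <unk> for unseen ones."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
vocab = build_vocab(corpus)
print(encode(["the", "bird", "sat"], vocab))
# -> [2, 1, 4]  ("bird" is unseen, so it maps to <unk> = 1)
```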
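The EM step itself is what estimates the unigram token probabilities; once those are fixed, scoring every possible tokenization of a string is a simple product of token probabilities. The sketch below assumes a toy vocabulary with made-up log-probabilities (standing in for EM-trained values) and enumerates all segmentations of one word; it does not implement the EM loop itself:

```python
import math

# Hypothetical unigram log-probabilities, standing in for EM-trained values.
log_probs = {
    "un": math.log(0.10), "hug": math.log(0.15),
    "s": math.log(0.05), "hugs": math.log(0.08), "unhug": math.log(0.02),
}

def tokenizations(word):
    """Yield every way to split `word` into pieces from the vocabulary."""
    if not word:
        yield []
        return
    for end in range(1, len(word) + 1):
        piece = word[:end]
        if piece in log_probs:
            for rest in tokenizations(word[end:]):
                yield [piece] + rest

for toks in tokenizations("unhugs"):
    score = sum(log_probs[t] for t in toks)
    print(toks, f"p = {math.exp(score):.5f}")
# ['un', 'hug', 's'] p = 0.00075
# ['un', 'hugs'] p = 0.00800
# ['unhug', 's'] p = 0.00100
```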
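Training on a custom dataset and saving to JSON can be done with the Hugging Face `tokenizers` library. This sketch assumes a plain-text corpus file named `corpus.txt`; the vocabulary size and special tokens are illustrative choices:

```python
from tokenizers import Tokenizer
from tokenizers.models import Unigram
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import UnigramTrainer

# Build an empty Unigram tokenizer and train it on a local text corpus.
tokenizer = Tokenizer(Unigram())
tokenizer.pre_tokenizer = Whitespace()
trainer = UnigramTrainer(vocab_size=8000, special_tokens=["<unk>"], unk_token="<unk>")
tokenizer.train(files=["corpus.txt"], trainer=trainer)

# Persist the trained tokenizer to JSON and reload it later.
tokenizer.save("tokenizer.json")
tokenizer = Tokenizer.from_file("tokenizer.json")
print(tokenizer.encode("Hello world").tokens)
```

The saved `tokenizer.json` bundles the model, vocabulary, and pre-tokenization settings in one file, so reloading it reproduces the exact same tokenization behavior.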