tokenizations
Medium


  • After tokenization, words can be normalized further: converting them to lowercase, removing stop words, and applying stemming or lemmatization.
  • Tokenized text is usually converted into numerical form, such as integer IDs, so that machine learning models can process it.
  • Expectation-Maximization (EM) is used to optimize the token probabilities, from which the probability of each possible tokenization of a given sentence is computed.
  • The tokenizer can be trained on a custom dataset and saved as a JSON file for later use.
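
The points above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the method used in the original article: the stop-word list, the `<unk>` entry, and the helper names (`tokenize`, `remove_stop_words`, `build_vocab`, `encode`) are all hypothetical. Stemming, lemmatization, and EM-based training are left out, and a simple word-to-ID vocabulary serialized as JSON stands in for a trained tokenizer.

```python
import json
import re

# Hypothetical stop-word list for illustration; real pipelines use larger lists.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def build_vocab(token_lists):
    """Map each distinct token to an integer ID, in order of first appearance.

    ID 0 is reserved for unknown tokens (an assumption of this sketch).
    """
    vocab = {"<unk>": 0}
    for tokens in token_lists:
        for t in tokens:
            vocab.setdefault(t, len(vocab))
    return vocab


def encode(tokens, vocab):
    """Convert tokens to integer IDs, using the unknown ID for unseen tokens."""
    return [vocab.get(t, vocab["<unk>"]) for t in tokens]


corpus = ["The cat sat on the mat.", "The dog sat."]
cleaned = [remove_stop_words(tokenize(s)) for s in corpus]
vocab = build_vocab(cleaned)
ids = [encode(tokens, vocab) for tokens in cleaned]

# Serialize the vocabulary as JSON so it can be stored and reloaded later.
vocab_json = json.dumps(vocab)
```

Here `ids` holds one list of integers per sentence, and `json.loads(vocab_json)` restores the same vocabulary, so new text can be encoded consistently after reloading.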
