Source: Medium

TF-IDF Vectorization

  • Term Frequency (TF) indicates how often a word appears in a document relative to the total number of words in that document. Example: If a document containing 100 words mentions “apple” 5 times, then the TF for “apple” in that document is 5/100 = 0.05.
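  • Inverse Document Frequency (IDF) measures how rare a word is across all the documents in your collection: IDF = log(N / df), where N is the total number of documents and df is the number of documents that contain the word. Words that appear in many documents get a low IDF, and rare words get a high one. Example: If “apple” appears in 100 out of 1,000 documents, then, using a base-10 logarithm, the IDF for “apple” is log(1000/100) = log(10) = 1.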
  • TF-IDF is the product of TF and IDF. It is high when a word occurs often in a document but rarely across the collection, so it indicates how distinctive the word is for that document. Example: If the TF for “apple” in a document is 0.05 and the IDF for “apple” is 1, then the TF-IDF score for “apple” in that document is 0.05 × 1 = 0.05.
  • TF-IDF vectorization converts each document into a vector with one number per vocabulary word, where each number reflects how important that word is in that document. Example: suppose we have three documents:
      1. Document 1 discusses apples and bananas.
      2. Document 2 talks about apples and oranges.
      3. Document 3 focuses on bananas and oranges.
    If the vocabulary contains only “apple”, “banana”, and “orange” (in that order), the TF-IDF vectors might look like this:
      ◦ Document 1: [0.05, 0.05, 0], since it covers apples and bananas but not oranges.
      ◦ Document 2: [0.05, 0, 0.05], since it covers apples and oranges but not bananas.
      ◦ Document 3: [0, 0.05, 0.05], since it covers bananas and oranges but not apples.
    A word that does not occur in a document gets a score of 0 there. The 0.05 values are illustrative; the actual scores depend on the word counts and on the IDF formula used.
  • TF-IDF vectors show which words matter most in each document. This makes them useful for document classification, information retrieval (ranking documents by relevance to a query), and keyword extraction.
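The following Python code checks the arithmetic in the TF, IDF, and TF-IDF bullets. It is a minimal sketch that assumes a base-10 logarithm, which is what makes log(10) = 1 in the IDF example. The helper names are hypothetical and do not come from the original article.

```python
import math

def term_frequency(term_count: int, total_terms: int) -> float:
    """Fraction of a document's words that are the given term."""
    return term_count / total_terms

def inverse_document_frequency(total_docs: int, docs_with_term: int) -> float:
    """log10(N / df); base 10 is assumed so that log(10) = 1 as in the example."""
    return math.log10(total_docs / docs_with_term)

tf = term_frequency(5, 100)                  # "apple": 5 of 100 words
idf = inverse_document_frequency(1000, 100)  # "apple" in 100 of 1,000 documents
tf_idf = tf * idf                            # TF-IDF is the product of the two

print(f"TF = {tf}, IDF = {idf}, TF-IDF = {tf_idf}")
```

With the natural logarithm, the same IDF would be ln(10) ≈ 2.303. Some libraries, such as scikit-learn's `TfidfVectorizer`, use the natural logarithm plus smoothing terms, so their scores will not match these hand-computed values.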
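The three-document example can be reproduced from scratch. This sketch makes three assumptions: hypothetical document texts with one token per fruit, a base-10 IDF, and no smoothing. The function name `tfidf_vectors` is illustrative only. Each word appears in two of the three documents, so every nonzero entry comes out to 0.5 × log10(3/2) ≈ 0.088 rather than the illustrative 0.05. The pattern of zeros is the same.

```python
import math
from collections import Counter

# Hypothetical stand-ins for the three documents described above.
documents = [
    "apple banana",   # Document 1: apples and bananas
    "apple orange",   # Document 2: apples and oranges
    "banana orange",  # Document 3: bananas and oranges
]
vocabulary = ["apple", "banana", "orange"]

def tfidf_vectors(docs, vocab):
    """Return one TF-IDF vector per document, ordered by `vocab`."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each vocabulary term.
    df = {term: sum(term in tokens for tokens in tokenized) for term in vocab}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        row = []
        for term in vocab:
            tf = counts[term] / total
            # Guard against terms that appear in no document (df = 0).
            idf = math.log10(n_docs / df[term]) if df[term] else 0.0
            row.append(tf * idf)
        vectors.append(row)
    return vectors

for i, vec in enumerate(tfidf_vectors(documents, vocabulary), start=1):
    print(f"Document {i}:", [round(x, 3) for x in vec])
```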
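The summary does not describe how TF-IDF vectors are used for retrieval and keyword extraction. One common approach is to compare vectors with cosine similarity and to read off each document's highest-scoring terms. The sketch below assumes a hypothetical three-document corpus, lowercase whitespace tokenization, and a base-10 IDF. The names `rank` and `top_keywords` are illustrative and do not refer to any library API.

```python
import math
from collections import Counter

def build_idf(tokenized_docs):
    """Base-10 IDF, log10(N / df), for every term seen in the corpus."""
    n_docs = len(tokenized_docs)
    df = Counter(term for tokens in tokenized_docs for term in set(tokens))
    return {term: math.log10(n_docs / count) for term, count in df.items()}

def tfidf(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms missing from the corpus IDF are dropped."""
    counts = Counter(tokens)
    total = len(tokens)
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(query, corpus):
    """Return document indices sorted by similarity to the query, plus the raw scores."""
    tokenized = [doc.lower().split() for doc in corpus]
    idf = build_idf(tokenized)
    query_vec = tfidf(query.lower().split(), idf)
    scores = [cosine(query_vec, tfidf(tokens, idf)) for tokens in tokenized]
    order = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return order, scores

def top_keywords(doc_index, corpus, k=3):
    """Highest-scoring TF-IDF terms of one document (a simple keyword extractor)."""
    tokenized = [doc.lower().split() for doc in corpus]
    vec = tfidf(tokenized[doc_index], build_idf(tokenized))
    return sorted(vec, key=vec.get, reverse=True)[:k]

# Hypothetical corpus for illustration.
corpus = [
    "apple pie recipe with fresh apple",
    "banana bread recipe",
    "orange juice and banana smoothie",
]
order, scores = rank("apple recipe", corpus)
print(order, [round(s, 2) for s in scores])
print(top_keywords(0, corpus, k=1))
```

The query "apple recipe" produces the following ranking:
1. The apple-pie document ranks first.
2. The banana-bread document ranks second, because it shares only "recipe" with the query.
3. The smoothie document ranks last, with a score of 0.

In a corpus this small, a filler word such as "with" scores as high as "pie" or "fresh". Real pipelines usually remove such stop words before vectorizing.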
