Source: Medium

From Jaccard to SBERT: A Comprehensive Guide to Vector Similarity Search Techniques

  • Similarity search methods have been researched since the 1970s in the field of NLP.
  • Jaccard similarity and w-shingling are two traditional methods for text similarity search.
  • TF-IDF, BM25, and Sentence BERT are popular vector-based methods for similarity search.
  • TF-IDF weighs how important each word is to a particular document relative to how often that word appears across a large set of documents.
  • BM25 refines TF-IDF with adjustable hyperparameters (commonly k1 and b) that control term-frequency saturation and document-length normalization.
  • Sentence BERT (SBERT) uses BERT to generate fixed-size vector embeddings of sentences, which excel at semantic search.
  • All of these methods can be applied to real problems using libraries like FAISS or vector databases like PostgreSQL with pgvector, Qdrant, ChromaDB, and Pinecone.
  • Cosine similarity compares the vector embeddings of two sentences and yields a similarity score ranging from -1 to 1.
  • The article gives an extensive overview of different similarity search methods.
  • The field of NLP has come a long way in text similarity search since the 1970s.
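
The two traditional methods named above, Jaccard similarity and w-shingling, can be sketched in a few lines of Python. This is a minimal illustration and not the article's own code: the function names `shingles` and `jaccard`, the whitespace tokenization, and the sample sentences are all assumptions made for the example.

```python
def shingles(text: str, w: int = 2) -> set:
    """Return the set of w-word shingles (contiguous word windows) in text."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    if len(tokens) < w:
        # Text shorter than the window: treat the whole text as one shingle.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


s1 = "the quick brown fox jumps"
s2 = "the quick brown dog jumps"

# Plain Jaccard over the sets of individual words.
word_score = jaccard(set(s1.split()), set(s2.split()))

# Jaccard over 2-word shingles, which also takes word order into account.
shingle_score = jaccard(shingles(s1), shingles(s2))

print(f"word-level: {word_score:.3f}, shingle-level: {shingle_score:.3f}")
```

With these sample sentences the shingle-level score comes out lower than the word-level score, because a single changed word breaks every shingle that contains it.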

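As a rough sketch of the vector-based side, the snippet below builds TF-IDF vectors by hand and compares them with cosine similarity. It uses only the standard library. The helper names `tfidf_vectors` and `cosine`, the smoothed IDF formula, and the sample documents are assumptions chosen for illustration; real libraries may weight terms differently.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of documents."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    # Smoothed IDF (one common variant, assumed here): log((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply today",
]
vocab, vecs = tfidf_vectors(docs)

sim_related = cosine(vecs[0], vecs[1])    # documents sharing several words
sim_unrelated = cosine(vecs[0], vecs[2])  # documents sharing no words

print(f"related: {sim_related:.3f}, unrelated: {sim_unrelated:.3f}")
```

TF-IDF weights are never negative, so these scores fall between 0 and 1. The full -1 to 1 range applies to dense embeddings such as those from Sentence BERT, whose components can be negative.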