This article provides an overview of embeddings with transformers, BERT, and Sentence BERT (SBERT) for LLMs and RAG pipelines.
Transformers, composed of the encoder and decoder blocks, capture the context of each token with respect to the entire sequence.
However, the decoder's attention layers attend only to past tokens; this works well for generation but is not sufficient for tasks such as question answering, which benefit from context on both sides of a token.
BERT, built on the transformer encoder, uses bidirectional self-attention so that each token's representation incorporates both forward and backward context.
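To make this concrete, here is a minimal sketch using the Hugging Face transformers library (the bert-base-uncased checkpoint is only an illustrative choice) that extracts BERT's contextual token embeddings, where every token attends to the full sentence:

```python
from transformers import AutoModel, AutoTokenizer
import torch

# Illustrative checkpoint; any BERT-style encoder works the same way
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The bank raised interest rates.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per token, shaped (batch, num_tokens, hidden_size);
# each vector reflects context from both the left and the right of the token
token_embeddings = outputs.last_hidden_state
print(token_embeddings.shape)
```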
Sentence BERT (SBERT) encodes each sentence independently into a fixed-size vector, which allows embeddings to be precomputed and similarities to be calculated efficiently whenever they are needed. SBERT adds a pooling layer on top of BERT to derive that fixed-size sentence embedding from the token embeddings, and it is fine-tuned with a classification objective on NLI data as well as regression (cosine-similarity) and triplet objectives.
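As an illustration of what the pooling step does (a minimal sketch, not the library's internal implementation), mean pooling averages the token embeddings while ignoring padding, producing one fixed-size vector per sentence:

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings into a single sentence embedding, ignoring padded positions."""
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)    # (batch, hidden_size)
    counts = mask.sum(dim=1).clamp(min=1e-9)         # avoid division by zero
    return summed / counts                           # (batch, hidden_size)
```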
The official library for SBERT is sentence-transformers. Embedding is a crucial and fundamental step in getting a RAG pipeline to work at its best.
The article concludes with a simple hands-on example that shows how to get the embedding of any sentence using SBERT.
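For reference, a minimal example along those lines with the sentence-transformers library (the checkpoint name below is only an illustrative choice): encode the sentences once, then compare the precomputed embeddings with cosine similarity.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative checkpoint; any SBERT model from the Hugging Face Hub can be used
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "How do I reset my password?",
    "Steps to recover a forgotten password",
]
embeddings = model.encode(sentences, convert_to_tensor=True)  # one vector per sentence

# Cosine similarity between the two precomputed sentence embeddings
score = util.cos_sim(embeddings[0], embeddings[1])
print(float(score))
```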
Stay tuned for upcoming articles on RAG and its inner workings coupled with hands-on tutorials.