The word2vec model, proposed in a 2013 paper by Google researchers, introduced embeddings: dense vector representations of words for language models. This article focuses on reproducing word2vec results using JAX, and also discusses the original C implementation of word2vec. Embeddings are dense vectors of floats that encode the meaning of words in a high-dimensional space. The word2vec architecture comes in two variants, CBOW (Continuous Bag Of Words) and continuous skip-gram; this post focuses on CBOW. The CBOW model learns to predict a word from the words surrounding it within a fixed window size. The JAX implementation of the model covers the forward pass and the loss computation. Training involves subsampling common words, building a vocabulary, and running the model over the dataset for multiple epochs. After training, the learned weights are extracted as embeddings and used to find word similarities. Modern LLMs train their embedding matrices as part of a larger model rather than as a standalone step like word2vec, which makes the embeddings better tuned to the task and more efficient to use. The article provides a thorough explanation of word2vec, the JAX implementation, the training process, and a comparison with modern text embeddings.
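
To make the CBOW description above concrete, here is a minimal sketch of what a forward pass and loss could look like in JAX: average the embeddings of the context words, project to vocabulary logits, and take the softmax cross-entropy against the center word. The parameter layout, function names, and initialization scale here are assumptions for illustration, not the article's actual code.

```python
import jax
import jax.numpy as jnp

def init_params(key, vocab_size, embed_dim):
    # Two matrices, as in word2vec: an input embedding table and an
    # output projection back to the vocabulary. (Layout is an assumption.)
    k1, k2 = jax.random.split(key)
    return {
        "embed": jax.random.normal(k1, (vocab_size, embed_dim)) * 0.01,
        "proj": jax.random.normal(k2, (embed_dim, vocab_size)) * 0.01,
    }

def cbow_loss(params, context_ids, target_id):
    # Look up and average the context-word embeddings, project to logits,
    # and compute softmax cross-entropy against the target word.
    ctx = params["embed"][context_ids]      # (window, embed_dim)
    hidden = ctx.mean(axis=0)               # (embed_dim,)
    logits = hidden @ params["proj"]        # (vocab_size,)
    log_probs = jax.nn.log_softmax(logits)
    return -log_probs[target_id]

# Gradients w.r.t. both matrices, usable in a plain SGD training loop.
grad_fn = jax.jit(jax.grad(cbow_loss))

key = jax.random.PRNGKey(0)
params = init_params(key, vocab_size=10_000, embed_dim=128)
grads = grad_fn(params, jnp.array([4, 17, 23, 8]), 42)
```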
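
Likewise, once the embedding matrix has been extracted from the trained weights, finding similar words typically reduces to a cosine-similarity lookup. The sketch below assumes hypothetical `word_to_id` / `id_to_word` mappings built from the vocabulary; the details may differ from the article's implementation.

```python
def most_similar(embed, word_to_id, id_to_word, query, top_k=5):
    # Normalize rows so a dot product equals cosine similarity,
    # then rank every word against the query word's vector.
    norms = embed / jnp.linalg.norm(embed, axis=1, keepdims=True)
    sims = norms @ norms[word_to_id[query]]
    best = jnp.argsort(-sims)[1 : top_k + 1]  # skip the query word itself
    return [(id_to_word[int(i)], float(sims[i])) for i in best]
```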