The project aims to create a seamless bilingual reading experience by semantically aligning two editions of the same book.
The plain-text files of the Russian and Spanish editions are split into readable, semantically matched fragments, with spaCy providing more robust sentence segmentation than naive splitting.
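A minimal sketch of the segmentation step, assuming spaCy v3. It builds a blank pipeline and adds spaCy's rule-based `sentencizer` component, so no trained model has to be downloaded. The function name `split_sentences` and the regex fallback (used only when spaCy is not installed) are illustrative assumptions, not the project's code.

```python
import re
from functools import lru_cache

try:
    import spacy
except ImportError:  # spaCy not installed: split_sentences uses the regex fallback
    spacy = None


@lru_cache(maxsize=None)
def _sentencizer(lang: str):
    """Blank spaCy pipeline for `lang` with the rule-based sentencizer added."""
    nlp = spacy.blank(lang)  # e.g. "ru" or "es"; no trained model download needed
    nlp.add_pipe("sentencizer")
    return nlp


def split_sentences(text: str, lang: str) -> list[str]:
    """Split `text` into sentences, preferring spaCy when it is available."""
    if spacy is not None:
        doc = _sentencizer(lang)(text)
        return [s.text.strip() for s in doc.sents if s.text.strip()]
    # Fallback: break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]
```

Calling `split_sentences(text, "ru")` and `split_sentences(text, "es")` on the two editions yields the sentence lists that later steps group into fragments. The sentencizer only looks at punctuation, so a trained spaCy pipeline with a parser may handle dialogue and abbreviations better, at the cost of downloading a model.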
For the Spanish edition, a greedy, pointer-based alignment strategy chooses how large each chunk should be, and long sentences are broken into smaller parts to improve alignment accuracy.
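The greedy pointer strategy might look roughly like this. It is an illustration under stated assumptions, not the project's implementation: `split_long` breaks sentences longer than a hypothetical `max_chars` limit at commas, semicolons, or colons; `greedy_align` keeps a pointer into the Spanish pieces, tries grouping 1 to `max_group` consecutive pieces for each Russian fragment, keeps the best-scoring group, and advances the pointer. The default `length_similarity` (a character-length ratio) is only a placeholder; any `score(ru, es)` function, such as an embedding-based one, can be passed in.

```python
import re
from typing import Callable


def split_long(sentence: str, max_chars: int = 200) -> list[str]:
    """Break a sentence longer than max_chars at clause punctuation (, ; :).

    A single clause longer than max_chars is kept intact.
    """
    if len(sentence) <= max_chars:
        return [sentence]
    clauses = re.split(r"(?<=[,;:])\s+", sentence)
    parts, current = [], ""
    for clause in clauses:
        candidate = f"{current} {clause}".strip()
        if current and len(candidate) > max_chars:
            parts.append(current)  # close the current part and start a new one
            current = clause
        else:
            current = candidate
    if current:
        parts.append(current)
    return parts


def length_similarity(a: str, b: str) -> float:
    """Placeholder score in [0, 1]: shorter length divided by longer length."""
    la, lb = len(a), len(b)
    return min(la, lb) / max(la, lb) if max(la, lb) else 1.0


def greedy_align(
    src: list[str],
    tgt: list[str],
    score: Callable[[str, str], float] = length_similarity,
    max_group: int = 3,
) -> list[tuple[str, str]]:
    """Pair each source fragment with 1..max_group consecutive target pieces."""
    pairs: list[tuple[str, str]] = []
    j = 0  # pointer into tgt
    for s in src:
        if j >= len(tgt):
            break  # target exhausted; remaining source fragments stay unpaired
        best_k, best_score = 1, float("-inf")
        for k in range(1, min(max_group, len(tgt) - j) + 1):
            sc = score(s, " ".join(tgt[j : j + k]))
            if sc > best_score:
                best_k, best_score = k, sc
        pairs.append((s, " ".join(tgt[j : j + best_k])))
        j += best_k  # advance past the consumed target pieces
    return pairs
```

In practice the Spanish sentences would first be passed through `split_long` and flattened before calling `greedy_align`. Because each choice is made locally and never revisited, one early mismatch can shift every later pair; a dynamic-programming aligner avoids that at higher cost.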
Alignment is further improved by representing fragments as embeddings and comparing them with cosine similarity; translation is also used to bridge the gap between the two languages, which yields better alignment scores.
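A sketch of an embedding-based score, under stated assumptions: `embed` stands for any sentence-embedding model and `translate` for any Russian-to-Spanish translator. Both are hypothetical hooks, and translating the Russian side (rather than the Spanish side) is itself an assumption. `toy_embed` and `toy_translate` are deliberately trivial stand-ins that only demonstrate why comparing both sides in one language can raise the score.

```python
import math
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """cos(u, v) = (u . v) / (|u| |v|); returns 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def make_scorer(
    embed: Callable[[str], Vector],
    translate: Optional[Callable[[str], str]] = None,
) -> Callable[[str, str], float]:
    """Build score(ru, es); if `translate` is given, ru is translated first."""
    def score(ru: str, es: str) -> float:
        if translate is not None:
            ru = translate(ru)  # compare both sides in Spanish
        return cosine_similarity(embed(ru), embed(es))
    return score


# Toy stand-ins for demonstration only (not real models).
VOCAB = ["gato", "perro", "casa"]
TOY_DICT = {"кот": "gato", "собака": "perro", "дом": "casa"}


def toy_embed(text: str) -> list[float]:
    """Count occurrences of each VOCAB word (a bag-of-words vector)."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def toy_translate(text: str) -> str:
    """Word-by-word lookup in TOY_DICT; unknown words pass through unchanged."""
    return " ".join(TOY_DICT.get(w, w) for w in text.lower().split())
```

With the toy stand-ins, `make_scorer(toy_embed)("кот дом", "gato casa")` returns 0.0 because the Russian words fall outside the Spanish vocabulary, while `make_scorer(toy_embed, toy_translate)` scores the same pair at about 1.0. Real multilingual embeddings are far less extreme; the toy only illustrates the intuition behind translating before comparison.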