Late chunking is a query-driven technique that defers document segmentation to retrieval time, so chunk boundaries are chosen dynamically for each query rather than fixed at indexing time.
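To make the idea concrete, here is a minimal sketch of query-time segmentation. It is an illustration under stated assumptions, not a reference implementation: the function names (`late_chunk`, `split_sentences`, and so on) are hypothetical, sentences are split with a naive regular expression, and a bag-of-words cosine similarity stands in for the embedding model a real system would use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def split_sentences(document):
    """Naive sentence splitter (assumption: ., ! or ? ends a sentence)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def late_chunk(document, query, window=1):
    """Build a chunk at query time: find the sentence most similar to the
    query and return it with `window` neighbouring sentences on each side."""
    sentences = split_sentences(document)
    if not sentences:
        return ""
    q = Counter(tokenize(query))
    scores = [cosine(Counter(tokenize(s)), q) for s in sentences]
    best = max(range(len(sentences)), key=scores.__getitem__)
    lo = max(0, best - window)
    hi = min(len(sentences), best + window + 1)
    return " ".join(sentences[lo:hi])


doc = (
    "Cats sleep a lot. The warranty covers battery defects for two years. "
    "Claims require a receipt. Shipping is free."
)
print(late_chunk(doc, "battery warranty", window=1))
```

The document is stored whole, and the chunk boundaries depend on the query: a different query over the same text yields a different chunk.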
Compared with traditional early chunking, late chunking offers better contextual awareness, lower indexing overhead, better adaptability to queries, and improved performance of large language models (LLMs).
Optimizations to enhance the efficiency of late chunking include efficient embedding retrieval, adaptive windowing, vector pruning, parallelized late chunking, and re-ranking with LLMs.
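Two of these optimizations can be sketched in a few lines. The summary does not define them precisely, so the code below assumes one plausible reading: vector pruning discards candidate segments whose similarity score falls below a cutoff, and adaptive windowing grows the chunk outward from the best-scoring segment while neighbours remain reasonably relevant. The function names, the `ratio` parameter, and the example scores are all hypothetical.

```python
def prune(scores, min_score):
    """Vector pruning (assumed form): keep only the indices of segments
    whose similarity score is at least `min_score`."""
    return [i for i, s in enumerate(scores) if s >= min_score]


def adaptive_window(scores, ratio=0.5):
    """Adaptive windowing (assumed form): start at the best-scoring segment
    and extend left and right while each neighbour scores at least
    `ratio` times the best score. Returns a half-open (start, end) range."""
    if not scores:
        return (0, 0)
    best = max(range(len(scores)), key=scores.__getitem__)
    threshold = scores[best] * ratio
    lo = hi = best
    while lo > 0 and scores[lo - 1] >= threshold:
        lo -= 1
    while hi < len(scores) - 1 and scores[hi + 1] >= threshold:
        hi += 1
    return (lo, hi + 1)


# Illustrative per-segment similarity scores for one query (made-up values).
segment_scores = [0.1, 0.6, 0.9, 0.5, 0.05]
print(prune(segment_scores, 0.5))
print(adaptive_window(segment_scores, ratio=0.5))
```

With this reading, the window size is not a fixed parameter: it widens when relevant material is spread over adjacent segments and stays narrow when only one segment matches.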
Late chunking is particularly effective in domains such as enterprise knowledge management, legal document search, medical Q&A systems, technical support chatbots, and scientific research assistants.