I've built an open-source ETL framework called CocoIndex to prepare data for RAG.
CocoIndex simplifies the creation and maintenance of data indexing pipelines for AI applications such as semantic search and retrieval-augmented generation.
Key features include data flow programming, support for custom logic, and incremental updates.
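To make the idea of incremental updates concrete, here is a minimal Python sketch of the general technique: keep a content hash per source and re-run the transformation only for sources whose content changed. This is not CocoIndex's API or its actual implementation; the names `IncrementalIndex`, `content_hash`, and `split_paragraphs` are hypothetical and exist only for this illustration.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


def content_hash(text: str) -> str:
    """Return a stable fingerprint of the source content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def split_paragraphs(text: str) -> List[str]:
    """Toy transformation (stand-in for custom logic): split text into paragraphs."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


class IncrementalIndex:
    """Hypothetical index that reprocesses only new or changed sources."""

    def __init__(self, transform: Callable[[str], List[str]]) -> None:
        self.transform = transform
        # key -> (content hash, transformed output)
        self._entries: Dict[str, Tuple[str, List[str]]] = {}

    def update(self, sources: Dict[str, str]) -> List[str]:
        """Sync the index with `sources` and return the keys that were reprocessed."""
        reprocessed: List[str] = []
        for key, text in sources.items():
            digest = content_hash(text)
            cached = self._entries.get(key)
            if cached is not None and cached[0] == digest:
                continue  # unchanged source: reuse the cached output
            self._entries[key] = (digest, self.transform(text))
            reprocessed.append(key)
        # Drop entries whose source no longer exists.
        for key in list(self._entries):
            if key not in sources:
                del self._entries[key]
        return reprocessed

    def get(self, key: str) -> List[str]:
        return self._entries[key][1]

    def __contains__(self, key: str) -> bool:
        return key in self._entries
```

Calling `update` a second time with one changed file reprocesses only that file, which is the behavior that keeps an index cheap to maintain as the source data evolves.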
The CocoIndex GitHub repository and a video tutorial are available as a reference and a starting point.