Modern Large Language Model (LLM) systems rely on Retrieval-Augmented Generation (RAG) to gather useful context for response generation.
Maximizing context relevance alone, however, can degrade downstream response quality. Evaluation of existing RAG methods further shows that they scale poorly with inference-time compute.
We introduce RErank BEyond reLevance (REBEL), which enables RAG systems to scale with inference-time compute by reranking retrieved context with multi-criteria optimization, yielding both higher relevance and superior answer quality.
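To make the idea of multi-criteria reranking concrete, here is a minimal sketch, not REBEL's actual algorithm: each retrieved passage carries scores for relevance plus secondary quality criteria, and passages are reordered by a weighted combination. The criterion names (`clarity`, `depth`), the weights, and the scores are illustrative assumptions.

```python
# Hypothetical multi-criteria reranker: sort passages by a weighted
# sum of per-criterion scores rather than by relevance alone.
# Criteria, weights, and scores below are illustrative, not REBEL's.

def multi_criteria_rerank(passages, weights):
    """Return passages sorted by their weighted combined score."""
    def combined(p):
        # Weighted sum over every criterion we care about.
        return sum(weights[c] * p["scores"][c] for c in weights)
    return sorted(passages, key=combined, reverse=True)

# Toy retrieved passages with pre-computed per-criterion scores.
passages = [
    {"id": "a", "scores": {"relevance": 0.9, "clarity": 0.2, "depth": 0.3}},
    {"id": "b", "scores": {"relevance": 0.7, "clarity": 0.9, "depth": 0.8}},
]
# Relevance still dominates, but quality criteria share the weight.
weights = {"relevance": 0.5, "clarity": 0.25, "depth": 0.25}

ranked = multi_criteria_rerank(passages, weights)
# Passage "b" wins (0.775 vs 0.575): lower relevance is offset by
# higher clarity and depth under this weighting.
```

A relevance-only reranker would have ranked "a" first; the weighted objective trades a little relevance for passages that are clearer and deeper, which is the kind of multi-criteria trade-off described above.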