Modern Large Language Model (LLM) systems rely on Retrieval-Augmented Generation (RAG) to gather useful context for response generation.
Maximizing context relevance alone, however, can degrade downstream response quality. Evaluation of existing RAG methods further shows that they scale poorly with inference-time compute.
We introduce RErank BEyond reLevance (REBEL), which enables RAG systems to scale with inference-time compute by reranking retrieved context with multi-criteria optimization, yielding both higher relevance and superior answer quality.
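To make the idea of multi-criteria reranking concrete, here is a minimal sketch, not REBEL's actual algorithm: each retrieved passage carries scores for relevance plus secondary quality criteria, and passages are reordered by a weighted combination. The criterion names (`clarity`, `depth`), the weights, and the scores are illustrative assumptions.

```python
# Hypothetical multi-criteria reranker: sort passages by a weighted
# sum of per-criterion scores rather than by relevance alone.
# Criteria, weights, and scores below are illustrative, not REBEL's.

def multi_criteria_rerank(passages, weights):
    """Return passages sorted by their weighted combined score."""
    def combined(p):
        # Weighted sum over every criterion we care about.
        return sum(weights[c] * p["scores"][c] for c in weights)
    return sorted(passages, key=combined, reverse=True)

# Toy retrieved passages with pre-computed per-criterion scores.
passages = [
    {"id": "a", "scores": {"relevance": 0.9, "clarity": 0.2, "depth": 0.3}},
    {"id": "b", "scores": {"relevance": 0.7, "clarity": 0.9, "depth": 0.8}},
]
# Relevance still dominates, but quality criteria share the weight.
weights = {"relevance": 0.5, "clarity": 0.25, "depth": 0.25}

ranked = multi_criteria_rerank(passages, weights)
# Passage "b" wins (0.775 vs 0.575): lower relevance is offset by
# higher clarity and depth under this weighting.
```

A relevance-only reranker would have ranked "a" first; the weighted objective trades a little relevance for passages that are clearer and deeper, which is the kind of multi-criteria trade-off described above.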