Large Language Models (LLMs) achieve strong performance as rerankers in information retrieval, but their heavy computational demands make deployment challenging.
Existing studies gauge the efficiency of LLM-based rerankers with proxy metrics such as latency and token counts, but these metrics fail to account for model size and vary with the underlying hardware.
A new metric suite called E^2R-FLOPs is proposed to evaluate LLM-based rerankers: ranking metrics per PetaFLOP (RPP), which captures relevance per unit of compute, and queries per PetaFLOP (QPP), a hardware-agnostic measure of throughput.
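To make the metrics concrete, the sketch below shows how RPP and QPP might be computed from an estimated FLOP count. The 2 × parameters × tokens FLOPs approximation and all helper names are illustrative assumptions, not the paper's exact estimator.

```python
# Minimal sketch of RPP and QPP (assumed formulations, not the
# paper's exact estimator). FLOPs are approximated with the common
# ~2 * parameters * tokens rule for one forward pass of a decoder-only LLM.

def estimate_flops(num_params: float, total_tokens: int) -> float:
    """Approximate inference FLOPs for processing `total_tokens` tokens."""
    return 2.0 * num_params * total_tokens

def rpp(relevance_metric: float, flops: float) -> float:
    """Ranking metric (e.g., NDCG@10) achieved per PetaFLOP of compute."""
    return relevance_metric / (flops / 1e15)

def qpp(num_queries: int, flops: float) -> float:
    """Queries served per PetaFLOP: a hardware-agnostic throughput measure."""
    return num_queries / (flops / 1e15)

# Example: a hypothetical 7B-parameter reranker scoring 100 queries,
# consuming ~2,000 tokens per query on average.
flops = estimate_flops(num_params=7e9, total_tokens=100 * 2_000)
print(f"RPP: {rpp(0.72, flops):.3f} NDCG@10 per PetaFLOP")
print(f"QPP: {qpp(100, flops):.1f} queries per PetaFLOP")
```

Because both metrics are normalized by FLOPs rather than wall-clock time, they stay comparable across model sizes and hardware configurations.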
Comprehensive experiments with the proposed metrics evaluate the efficiency-effectiveness trade-off across a wide range of LLM-based rerankers, bringing this trade-off to the attention of the research community.