Large Language Models (LLMs) achieve strong performance as rerankers in information retrieval, but their heavy computational demands make deployment challenging.
Existing studies gauge the efficiency of LLM-based rerankers with proxy metrics such as latency and token counts, but these metrics fail to account for model size and vary with the underlying hardware.
A new metric suite called E^2R-FLOPs is proposed to evaluate LLM-based rerankers: ranking metrics per PetaFLOP (RPP), which captures relevance per unit of compute, and queries per PetaFLOP (QPP), a hardware-agnostic measure of throughput.
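To make the metrics concrete, the sketch below shows how RPP and QPP might be computed from an estimated FLOP count. The 2 × parameters × tokens FLOPs approximation and all helper names are illustrative assumptions, not the paper's exact estimator.

```python
# Minimal sketch of RPP and QPP (assumed formulations, not the
# paper's exact estimator). FLOPs are approximated with the common
# ~2 * parameters * tokens rule for one forward pass of a decoder-only LLM.

def estimate_flops(num_params: float, total_tokens: int) -> float:
    """Approximate inference FLOPs for processing `total_tokens` tokens."""
    return 2.0 * num_params * total_tokens

def rpp(relevance_metric: float, flops: float) -> float:
    """Ranking metric (e.g., NDCG@10) achieved per PetaFLOP of compute."""
    return relevance_metric / (flops / 1e15)

def qpp(num_queries: int, flops: float) -> float:
    """Queries served per PetaFLOP: a hardware-agnostic throughput measure."""
    return num_queries / (flops / 1e15)

# Example: a hypothetical 7B-parameter reranker scoring 100 queries,
# consuming ~2,000 tokens per query on average.
flops = estimate_flops(num_params=7e9, total_tokens=100 * 2_000)
print(f"RPP: {rpp(0.72, flops):.3f} NDCG@10 per PetaFLOP")
print(f"QPP: {qpp(100, flops):.1f} queries per PetaFLOP")
```

Because both metrics are normalized by FLOPs rather than wall-clock time, they stay comparable across model sizes and hardware configurations.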
Comprehensive experiments with the proposed metrics evaluate the efficiency-effectiveness trade-off across a wide range of LLM-based rerankers, bringing this trade-off to the attention of the research community.