Recent studies have shown that large language models (LLMs) can assess relevance and support information retrieval (IR) tasks such as document ranking and relevance judgment generation.
In this paper, the researchers investigate how different LLM modules contribute to relevance judgment through the lens of mechanistic interpretability. They analyze the roles of individual model components and identify a multi-stage, progressive process by which relevance judgments are generated.
The findings shed light on the mechanisms underlying relevance assessment in LLMs and carry implications for how LLMs can be applied to IR tasks.
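To make the methodology concrete, below is a minimal, illustrative sketch of the kind of component-level ablation such mechanistic interpretability studies build on: zeroing out one attention module at a time and measuring how the model's "Yes"/"No" relevance logit margin changes. The model choice (gpt2), the prompt template, and the margin metric are assumptions for illustration, not the paper's actual setup.

```python
# Illustrative component-ablation sketch (not the paper's exact method).
# Assumption: a HuggingFace decoder-only LM (gpt2) stands in for the LLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # hypothetical stand-in model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Hypothetical relevance-judgment prompt in a yes/no format.
prompt = (
    "Query: symptoms of influenza\n"
    "Document: Influenza commonly causes fever, cough, and fatigue.\n"
    "Is the document relevant to the query? Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt")
# Assumes " Yes" / " No" each tokenize to a single token in this vocab.
yes_id = tokenizer(" Yes")["input_ids"][0]
no_id = tokenizer(" No")["input_ids"][0]

def relevance_margin():
    """Logit margin for 'Yes' over 'No' at the final position."""
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]
    return (logits[yes_id] - logits[no_id]).item()

def zero_attn_output(module, args, output):
    # The attention module may return a tensor or a tuple whose first
    # element is the attention output; zero that tensor either way so
    # this layer's attention contributes nothing downstream.
    if isinstance(output, tuple):
        return (torch.zeros_like(output[0]),) + output[1:]
    return torch.zeros_like(output)

baseline = relevance_margin()
# Ablate each layer's attention in turn; a large margin drop suggests
# that component matters for the relevance judgment.
for layer, block in enumerate(model.transformer.h):
    handle = block.attn.register_forward_hook(zero_attn_output)
    margin = relevance_margin()
    handle.remove()
    print(f"layer {layer:2d}: margin drop = {baseline - margin:+.3f}")
```

Layer-by-layer drops in the margin give a coarse picture of where relevance-relevant computation happens, which is one simple way a multi-stage, progressive process could be surfaced; the paper's analysis may use different models, components, and measurements.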