Recent research on hallucination detection in large language models (LLMs) has shown that LLMs' internal representations contain truthfulness signals that can be used to train detectors.
However, these detectors rely heavily on predetermined tokens, and their performance fluctuates on free-form generations with varying lengths and sparse distributions of hallucinated entities.
To address this, we propose HaMI, a novel approach that enables robust hallucination detection through adaptive selection and learning of the critical tokens that are most indicative of hallucination.
Experimental results on four hallucination benchmarks demonstrate that HaMI outperforms existing state-of-the-art approaches.
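To make the adaptive token selection idea concrete, the following is a minimal, hedged sketch, not the authors' implementation: every name here (AdaptiveTokenDetector, score_head, top_k_ratio) is an assumption for illustration. It scores each token's hidden state, keeps the top-k most indicative tokens, and pools them for a response-level hallucination classifier.

```python
# Illustrative sketch only (assumed design, not the paper's code):
# score per-token hidden states, adaptively select the top-k tokens,
# and pool them for a binary hallucination detector.
import torch
import torch.nn as nn


class AdaptiveTokenDetector(nn.Module):
    def __init__(self, hidden_dim: int, top_k_ratio: float = 0.1):
        super().__init__()
        self.score_head = nn.Linear(hidden_dim, 1)   # per-token hallucination score
        self.classifier = nn.Linear(hidden_dim, 1)   # response-level detector
        self.top_k_ratio = top_k_ratio

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim) taken from a frozen LLM layer
        scores = self.score_head(hidden_states).squeeze(-1)       # (batch, seq_len)
        k = max(1, int(self.top_k_ratio * hidden_states.size(1)))
        top_scores, top_idx = scores.topk(k, dim=1)               # most indicative tokens
        weights = torch.softmax(top_scores, dim=1).unsqueeze(-1)  # (batch, k, 1)
        selected = hidden_states.gather(
            1, top_idx.unsqueeze(-1).expand(-1, -1, hidden_states.size(-1))
        )                                                         # (batch, k, hidden_dim)
        pooled = (weights * selected).sum(dim=1)                  # weighted pooling
        return self.classifier(pooled).squeeze(-1)                # hallucination logit


# Usage: train with BCEWithLogitsLoss on response-level hallucination labels.
detector = AdaptiveTokenDetector(hidden_dim=4096)
logits = detector(torch.randn(2, 128, 4096))  # two responses of 128 tokens each
loss = nn.BCEWithLogitsLoss()(logits, torch.tensor([1.0, 0.0]))
```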