The study analyzes automated hallucination detection in large language models (LLMs) within a formal theoretical framework.
It establishes an equivalence between hallucination detection and the classical problem of language identification, and concludes that detection is fundamentally impossible for most collections of languages when the detector is trained only on correct examples (positive examples drawn from the target language).
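For context, the identification task behind this equivalence can be sketched as follows; the notation is introduced here for illustration and is not taken from the study. A countable collection of candidate languages \(\mathcal{L} = \{L_1, L_2, \ldots\}\) is fixed, a target language \(K \in \mathcal{L}\) plays the role of the set of correct statements, and the detector sees only an enumeration of correct examples:

\[
w_1, w_2, \ldots \ \text{enumerates } K, \qquad h_t = A(w_1, \ldots, w_t), \qquad \text{success} \iff \exists T \ \forall t \ge T : \ h_t = K .
\]

In the classical Gold-style setting, identification from such positive data alone already fails for very simple collections (for example, all finite languages together with a single infinite language), which is the sense in which detection trained only on correct examples is impossible for most collections.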
By contrast, expert-labeled feedback that includes negative examples makes automated hallucination detection possible for every countable collection of languages.
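To make the role of negative examples concrete, here is a minimal, purely illustrative sketch of the classical identification-by-enumeration strategy that labeled data enables; the candidate collection, the example statements, and the name enumeration_detector are hypothetical and not taken from the study.

```python
from typing import Callable, List, Optional, Tuple

# A language is modeled as a membership predicate: is this statement correct?
Language = Callable[[str], bool]

def enumeration_detector(
    collection: List[Tuple[str, Language]],
    labeled_examples: List[Tuple[str, bool]],
) -> Optional[str]:
    """Return the first candidate language consistent with all expert labels seen so far."""
    for name, lang in collection:
        if all(lang(x) == is_correct for x, is_correct in labeled_examples):
            return name
    return None

# Toy countable collection of candidate "languages of correct statements".
collection = [
    ("even-length statements", lambda s: len(s) % 2 == 0),
    ("statements containing 'true'", lambda s: "true" in s),
    ("all statements", lambda s: True),
]

# Expert feedback: statements labeled correct (True) or hallucinated (False).
labeled = [
    ("this claim is true", True),
    ("the moon is cheese", False),
]

print(enumeration_detector(collection, labeled))
# Prints "statements containing 'true'": the first candidate is ruled out by
# the negative example, which positive-only data could never do.
```

As labeled examples accumulate, every incorrect candidate is eventually ruled out and the detector's guess stabilizes on a correct language; with only correct (positive) examples, overly broad candidates such as "all statements" can never be eliminated, which is the intuition behind the impossibility result.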
These findings underscore the importance of expert-labeled examples and feedback-based methods for the reliable deployment of LLMs.