Recent works have proposed examining the activations produced by Large Language Models (LLMs) at inference time to assess the correctness of their answers.
These works suggest that a 'geometry of truth' can be learned, where the activations associated with correct answers differ from those that produce mistakes.
However, one highlighted limitation is that these 'geometries of truth' are task-dependent and do not transfer across tasks.
Linear classifiers trained on distinct tasks show little similarity to one another, and even more sophisticated approaches fail to bridge this gap, since the activation vectors used to classify answers form separate clusters when examined across tasks.
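To make the cross-task transfer setting concrete, the following sketch trains a linear probe on activations from one task and evaluates it on another. It is an illustrative assumption, not the setup of the cited works: the activations are synthetic stand-ins, and scikit-learn's LogisticRegression is used as the linear classifier.

```python
# Illustrative sketch (assumption: a logistic-regression probe on hidden-state
# activations; synthetic data stands in for real LLM activations).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model = 768        # hidden-state dimensionality (placeholder value)
n_per_task = 500     # labeled (activation, correct/incorrect) pairs per task

def make_task_activations(center):
    """Stand-in for real activations: each task occupies its own region of
    activation space, with correct/incorrect answers separated along a
    task-specific direction."""
    labels = rng.integers(0, 2, n_per_task)
    truth_direction = rng.normal(size=d_model)      # task-specific 'truth' axis
    acts = center + rng.normal(size=(n_per_task, d_model))
    acts += np.outer(labels * 2.0 - 1.0, truth_direction)
    return acts, labels

# Two tasks whose activations form separate clusters.
acts_a, y_a = make_task_activations(center=rng.normal(size=d_model) * 5.0)
acts_b, y_b = make_task_activations(center=rng.normal(size=d_model) * 5.0)

# Train a linear probe on task A, then score it in-task (on its training data,
# for brevity) and on the held-out task B.
probe = LogisticRegression(max_iter=1000).fit(acts_a, y_a)
print("in-task accuracy (A -> A):   ", probe.score(acts_a, y_a))
print("cross-task accuracy (A -> B):", probe.score(acts_b, y_b))
```

Under these synthetic assumptions the probe is near-perfect in-task but close to chance on the other task, mirroring the lack of transfer described above; in practice the stand-in arrays would be replaced by activations extracted from a hidden layer of the model under study.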