Large language models (LLMs) are often inconsistent and unreliable: their outputs can shift due to hallucinations and small prompt perturbations. Several approaches have been proposed to address these inconsistencies, one of which is to measure the consistency of LLM responses directly. However, existing methods for measuring LLM consistency may not align well with human perceptions of consistency. A new logit-based ensemble method has been proposed to estimate LLM consistency, and its estimates show promising agreement with human evaluations.
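As a rough illustration only, and not the specific method described above, the sketch below shows one way a logit-derived signal could be aggregated across several sampled responses to the same prompt to produce a consistency estimate. The mean-token-probability confidence measure, the function names, and the averaging scheme are all illustrative assumptions.

```python
import numpy as np

def response_confidence(token_logprobs: np.ndarray) -> float:
    """Mean token probability of one sampled response (a simple logprob-derived signal)."""
    return float(np.mean(np.exp(token_logprobs)))

def ensemble_consistency(samples_token_logprobs: list[np.ndarray]) -> tuple[float, float]:
    """Hypothetical logit-based ensemble score: average per-response confidence
    across multiple samples of the same prompt; the spread indicates how much
    the model's confidence varies between samples."""
    scores = [response_confidence(lp) for lp in samples_token_logprobs]
    return float(np.mean(scores)), float(np.std(scores))

# Toy example: token log-probabilities for three sampled responses to one prompt
samples = [
    np.array([-0.1, -0.3, -0.2]),
    np.array([-0.5, -0.4, -0.6]),
    np.array([-0.2, -0.1, -0.3]),
]
mean_score, spread = ensemble_consistency(samples)
print(f"consistency estimate: {mean_score:.3f} (spread {spread:.3f})")
```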