Large language models (LLMs) can exhibit biases, but they tend to give less biased answers in a multi-turn conversation in which they can observe their own prior answers to the same question.
Researchers tested LLMs on question sets spanning several categories and found that the models can 'de-bias' themselves on questions that ask for random, unbiased answers.
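To make the setup concrete, the sketch below contrasts the two querying modes: independent single-turn queries versus repeating the same question inside one conversation so the model can see its earlier answers. The question text, the `ask` helper, and the biased dummy sampler are illustrative placeholders rather than the researchers' actual harness; `ask` would be replaced with a call to the LLM under test.

```python
# Sketch of the two probing setups: single-turn vs. multi-turn querying.
# `ask` is a hypothetical stand-in for a real LLM call.
import random
from collections import Counter

CHOICES = [str(d) for d in range(10)]
QUESTION = "Pick a random digit between 0 and 9. Answer with the digit only."

def ask(messages):
    """Placeholder LLM call: takes a chat history, returns one answer.
    This dummy ignores the history and samples a biased distribution;
    replace it with a real model to observe any de-biasing effect."""
    return random.choices(CHOICES, weights=[1, 1, 1, 1, 1, 1, 1, 8, 1, 1])[0]

def single_turn_answers(n=30):
    """Query the model n times in independent, single-turn conversations."""
    return [ask([{"role": "user", "content": QUESTION}]) for _ in range(n)]

def multi_turn_answers(n=30):
    """Ask the same question n times in ONE conversation, so the model
    sees its own prior answers before answering again."""
    history, answers = [], []
    for _ in range(n):
        history.append({"role": "user", "content": QUESTION})
        answer = ask(history)
        history.append({"role": "assistant", "content": answer})
        answers.append(answer)
    return answers

print("single-turn:", Counter(single_turn_answers()))
print("multi-turn: ", Counter(multi_turn_answers()))
```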
A new metric called B-score has been proposed to detect bias in answers to subjective, random, easy, and hard questions, improving the accuracy with which LLM answers can be verified.
The B-score metric showed significant improvements in verifying LLM answers compared to using verbalized confidence scores or single-turn answers alone.
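For illustration, here is a minimal way such a metric could be computed from the two answer histories above: score an answer by how much more often it appears across single-turn queries than within the multi-turn conversation. The difference-of-frequencies form, the helper names, and the toy numbers are assumptions for this sketch, not necessarily the paper's exact definition.

```python
# Minimal sketch of a difference-of-frequencies bias score (assumed form:
# single-turn frequency minus multi-turn frequency of the same answer).
from collections import Counter

def frequencies(answers):
    """Map each answer to its relative frequency in a list of answers."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}

def b_score(answer, single_turn, multi_turn):
    """Score an answer by how much more often it appears in independent
    single-turn queries than in one multi-turn conversation."""
    p_single = frequencies(single_turn).get(answer, 0.0)
    p_multi = frequencies(multi_turn).get(answer, 0.0)
    return p_single - p_multi

# Toy example: "7" dominates single-turn sampling but not the multi-turn run,
# so it gets a high score, flagging it as a likely biased default answer.
single = ["7"] * 8 + ["3", "1"]
multi = ["7", "3", "1", "9", "0", "4", "2", "8", "5", "6"]
print(b_score("7", single, multi))   # ~0.7 -> suspiciously biased answer
print(b_score("3", single, multi))   # 0.0  -> no bias signal
```

In a verification setting, one would presumably accept answers whose score stays near zero and flag high-scoring ones, which is where the comparison against verbalized confidence scores comes in.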