Understanding and mitigating biases in large language models (LLMs) is crucial for their use in high-stakes decision-making.
The study introduces two decision tasks, Admissions and Hiring, to assess racial bias in LLMs.
The experiments show that Gemma 2B Instruct and LLaMA 3.2 3B Instruct exhibit strong biases in favor of certain racial groups.
While prompt engineering fails to promote fairness, debiasing interventions based on identifying 'race subspaces' within the models' activations show promise in reducing bias.
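For concreteness, the sketch below illustrates one way such an intervention could be implemented: estimating a one-dimensional race direction from the difference of mean activations on contrastive prompts, then projecting that direction out of the residual stream with a forward hook. This is a minimal sketch, not the authors' code; the checkpoint name, layer index, and prompts are illustrative assumptions, and the paper's 'race subspaces' may span more than one dimension.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.2-3B-Instruct"  # assumed checkpoint id
LAYER_IDX = 12                                   # assumed intervention layer

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


def mean_last_token_activation(prompts, layer_idx):
    """Average the residual-stream activation of the final prompt token."""
    acts = []
    for p in prompts:
        inputs = tokenizer(p, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs, output_hidden_states=True)
        # hidden_states[0] is the embedding output, so layer i is index i + 1.
        acts.append(out.hidden_states[layer_idx + 1][0, -1, :])
    return torch.stack(acts).mean(dim=0)


# Hypothetical contrastive prompts; the paper's actual task prompts may differ.
group_a_prompts = ["The applicant, a white student, has a 3.7 GPA."]
group_b_prompts = ["The applicant, a Black student, has a 3.7 GPA."]

mu_a = mean_last_token_activation(group_a_prompts, LAYER_IDX)
mu_b = mean_last_token_activation(group_b_prompts, LAYER_IDX)
race_dir = mu_a - mu_b
race_dir = race_dir / race_dir.norm()  # unit vector spanning a 1-D "race subspace"


def ablate_race_direction(module, inputs, output):
    """Forward hook: remove the component along race_dir from the layer output."""
    hidden = output[0] if isinstance(output, tuple) else output
    d = race_dir.to(hidden.dtype).to(hidden.device)
    proj = (hidden @ d).unsqueeze(-1) * d  # projection of each token onto race_dir
    debiased = hidden - proj
    if isinstance(output, tuple):
        return (debiased,) + output[1:]
    return debiased


handle = model.model.layers[LAYER_IDX].register_forward_hook(ablate_race_direction)
# ... run the Admissions / Hiring evaluation with the intervention active ...
handle.remove()
```

In practice, one would compare the Admissions and Hiring decision rates per racial group with and without the hook to measure the intervention's effect on bias.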