<ul data-eligibleForWebStory="true"><li>Large language models (LLMs) are being used in high-stakes hiring applications, impacting people's careers.</li><li>Simple anti-bias prompts may not be effective when realistic contextual details are introduced.</li><li>Internal bias mitigation strategies are proposed to identify and neutralize sensitive attribute directions within model activations.</li><li>Robust bias reduction was achieved across various models by neutralizing sensitive attribute directions.</li><li>Realistic contexts such as company names and culture descriptions can induce racial and gender biases in models.</li><li>Models show biases in favor of Black and female candidates when realistic context is introduced.</li><li>Inference biases can also occur based on subtle cues like college affiliations.</li><li>Internal bias mitigation strategies involve applying affine concept editing at inference time to reduce biases.</li><li>The intervention consistently reduces bias levels to very low percentages while maintaining model performance.</li><li>Practitioners using LLMs for hiring should consider more realistic evaluation methods and internal bias mitigation for fair outcomes.</li></ul>

Robustly Improving LLM Fairness in Realistic Settings via Interpretability

Discover more