menu
techminis

A naukri.com initiative

google-web-stories
Home

>

ML News

>

Robustly I...
source image

Arxiv

2d

read

23

img
dot

Image Credit: Arxiv

Robustly Improving LLM Fairness in Realistic Settings via Interpretability

  • Large language models (LLMs) are being used in high-stakes hiring applications, impacting people's careers.
  • Simple anti-bias prompts may not be effective when realistic contextual details are introduced.
  • Internal bias mitigation strategies are proposed to identify and neutralize sensitive attribute directions within model activations.
  • Robust bias reduction was achieved across various models by neutralizing sensitive attribute directions.
  • Realistic contexts such as company names and culture descriptions can induce racial and gender biases in models.
  • Models show biases in favor of Black and female candidates when realistic context is introduced.
  • Inference biases can also occur based on subtle cues like college affiliations.
  • Internal bias mitigation strategies involve applying affine concept editing at inference time to reduce biases.
  • The intervention consistently reduces bias levels to very low percentages while maintaining model performance.
  • Practitioners using LLMs for hiring should consider more realistic evaluation methods and internal bias mitigation for fair outcomes.

Read Full Article

like

1 Like

For uninterrupted reading, download the app