techminis

A naukri.com initiative


Image Credit: MIT

Training LLMs to self-detoxify their language

  • A new method called self-disciplined autoregressive sampling (SASA) enables large language models (LLMs) to moderate their own language, avoiding toxic language without affecting fluency.
  • SASA is a decoding algorithm that can identify toxic/nontoxic subspaces within the LLM's internal representation, guiding language generation to be less toxic.
  • The system re-weights sampling probabilities for tokens based on toxicity values and proximity to a classifier boundary, promoting less toxic language output.
  • By using a linear classifier on the learned subspace of the LLM's embedding, SASA steers language generation away from harmful or biased content one token at a time.
  • The research achieved reduced toxic language generation without sacrificing fluency, showcasing SASA's effectiveness in aligning language output with human values.
  • SASA was tested on LLMs of varying sizes and datasets, significantly reducing toxic language while maintaining integrity and fairness in language generation.
  • Alternatives such as retraining the LLM or attaching an external reward model are costly and time-consuming; by contrast, SASA promotes healthier language efficiently at decoding time.
  • The study emphasized the importance of mitigating harmful language generation and providing guidelines for value-aligned language outputs in AI systems.
  • By analyzing each candidate token's proximity to the toxicity boundary during generation, SASA offers a practical and accessible way to improve language quality in LLMs.
  • The use of SASA in detoxifying language outputs showed promise in reducing toxicity and bias, contributing to fairer and more principled language generation.
  • The research team demonstrated that balancing language fluency and toxicity reduction is achievable with techniques like SASA, paving the way for more responsible language models.
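The core mechanism described above — scoring candidate tokens with a linear classifier on the model's embedding space and re-weighting sampling probabilities away from the toxic side of the boundary — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the names `sasa_like_step` and `toxicity_margin`, the random embeddings, the averaging heuristic for candidate contexts, and the penalty strength `beta` are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a tiny vocabulary and embedding dimension.
VOCAB, DIM = 8, 16
token_embeddings = rng.normal(size=(VOCAB, DIM))

# A linear classifier (weight vector + bias) assumed to have been
# learned on the LLM's embedding space, separating the toxic
# subspace from the non-toxic one.
w = rng.normal(size=DIM)
b = 0.0

def toxicity_margin(h):
    """Signed distance of embedding h to the classifier boundary.
    Positive values fall on the toxic side."""
    return (h @ w + b) / np.linalg.norm(w)

def sasa_like_step(logits, context_embedding, beta=5.0):
    """One decoding step: re-weight next-token probabilities by each
    candidate's proximity to the toxic side of the boundary."""
    # Approximate the context embedding if each candidate token were
    # appended (a crude stand-in for re-encoding the sequence).
    margins = np.array([
        toxicity_margin((context_embedding + token_embeddings[t]) / 2)
        for t in range(VOCAB)
    ])
    # Softmax over the model's original logits.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Down-weight tokens that push generation into the toxic subspace;
    # tokens on the non-toxic side are left untouched.
    reweighted = probs * np.exp(-beta * np.maximum(margins, 0.0))
    return reweighted / reweighted.sum()

logits = rng.normal(size=VOCAB)
context = rng.normal(size=DIM)
adjusted = sasa_like_step(logits, context)
print(adjusted)
```

Because the adjustment is applied one token at a time at sampling, the base model's weights are never modified — which is the property the summary contrasts with costly retraining or external reward models.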
