<ul><li>Debiasing approaches often lead to a decrease in model capabilities like accuracy and knowledge retention.</li><li>Existing debiasing methods face trade-offs resulting in reduced truthfulness, knowledge loss, or unintelligible outputs, especially in smaller models.</li><li>A contrastive learning framework is proposed to address these limitations by using positive and negative examples for learning, introducing contrast computation and dynamic loss scaling.</li><li>Experimental results show that this approach improves toxicity reduction and faithfulness preservation simultaneously, without the capability degradation seen in current methods.</li></ul>

Paying Alignment Tax with Contrastive Learning

Discover more