Source: Arxiv
How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence

  • Post-training is essential to the success of large language models (LLMs): it turns pre-trained base models into more useful, better-aligned post-trained models.
  • The paper compares base and post-trained LLMs along four axes (knowledge, truthfulness, refusal, and confidence) to understand how post-training reshapes models internally.
  • The findings show that post-training does not relocate factual knowledge: storage locations stay the same, while the base model's knowledge representations are adapted and new ones are developed.
  • Truthfulness interventions transfer effectively between the base and post-trained models, whereas refusal shows limited forward transferability from the base to the post-trained model (see the probe sketch below).

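The probe-transfer result in the last bullet can be pictured with a small sketch: fit a linear truthfulness probe on one model's hidden activations, then evaluate it, unchanged, on the other model's. Everything below (the synthetic activations, the shared direction, the steering coefficient) is a hypothetical stand-in for illustration, not the paper's dataset or method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical setup: each row stands in for a residual-stream activation
# collected while a model reads a true or false statement. Shapes and data
# are synthetic; real probes would use actual model activations.
rng = np.random.default_rng(0)
d_model, n = 256, 400
truth_dir = rng.normal(size=d_model)          # assumed shared "truthfulness" direction

labels = rng.integers(0, 2, size=n)           # 1 = true statement, 0 = false
base_acts = rng.normal(size=(n, d_model)) + np.outer(labels, truth_dir)
post_acts = rng.normal(size=(n, d_model)) + np.outer(labels, truth_dir)

# Fit a linear truthfulness probe on the BASE model's activations...
probe = LogisticRegression(max_iter=1000).fit(base_acts, labels)

# ...and evaluate it, unchanged, on the POST-TRAINED model's activations.
# High accuracy here is what "truthfulness transfers" would look like.
print("base -> post-trained probe accuracy:", probe.score(post_acts, labels))

# An intervention then shifts activations along the probe's direction:
steer = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
steered = post_acts + 4.0 * steer             # push activations toward "true"
print("mean 'true' probability after steering:",
      probe.predict_proba(steered)[:, 1].mean())
```

The refusal finding would correspond to the opposite outcome in this picture: a refusal direction fit on the base model scoring near chance when applied forward to the post-trained model's activations.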