Source: Medium

Elegant reason for scaling Dot Product between Query and Key Matrices in Transformers

  • The dot product between the query and key matrices in a transformer is scaled to keep the softmax function from becoming peaky, i.e., from concentrating nearly all of its mass on a single position.
  • Softmax is sensitive to the magnitude of its inputs: when large values are supplied, the output saturates toward a one-hot distribution and the gradients for the near-zero entries vanish.
  • If the query and key components have zero mean and unit variance, their dot product has variance d_k (the key dimension), so dividing by √d_k restores unit variance and stabilizes training.
  • Scaled dot-product attention therefore keeps the attention weights well normalized, as the sketch below illustrates.
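
To make the variance argument concrete, here is a minimal NumPy sketch (the function and variable names are illustrative, not taken from the article): with zero-mean, unit-variance query and key components, the raw logits have a standard deviation of roughly √d_k, and dividing by √d_k brings it back to roughly 1, which keeps the softmax out of its saturated, winner-take-all regime.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # scaling keeps logit variance near 1
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
d_k, n = 512, 8                      # key dimension, sequence length
Q = rng.standard_normal((n, d_k))    # zero-mean, unit-variance components
K = rng.standard_normal((n, d_k))
V = rng.standard_normal((n, d_k))

raw = Q @ K.T                        # unscaled logits: variance ~ d_k
print("unscaled logit std:", raw.std())                   # ~sqrt(512) ≈ 22.6
print("scaled logit std:  ", (raw / np.sqrt(d_k)).std())  # ~1.0

# Peakiness: average mass the softmax puts on its single largest entry.
print("max weight, unscaled:", softmax(raw).max(axis=-1).mean())
print("max weight, scaled:  ", softmax(raw / np.sqrt(d_k)).max(axis=-1).mean())

out, weights = scaled_dot_product_attention(Q, K, V)
print("output shape:", out.shape)    # (8, 512)
```

Running this shows the unscaled max weights sitting near 1.0 (almost one-hot rows) while the scaled max weights stay far more spread out, which is exactly the peaky-softmax behavior the scaling prevents.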
