techminis (a naukri.com initiative)


Medium · 5d read

Image Credit: Medium

I trained a small transformer model from scratch!

  • The author trained a transformer language model from scratch on a small dataset of Telugu lyrics.
  • The model was scaled down to 6.58M parameters and trained for next-token prediction.
  • Training took approximately 53 minutes per epoch, about 9 hours 30 minutes in total.
  • Despite overfitting to the training data, the model generated plausible Telugu words during inference.
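The setup above (a small transformer trained from scratch for next-token prediction) can be sketched as a minimal PyTorch loop. Everything here is an illustrative assumption: the toy corpus stands in for the Telugu lyrics dataset, and the model is far smaller than the author's 6.58M-parameter configuration.

```python
# Minimal sketch of next-token-prediction training for a tiny
# character-level transformer. Sizes, corpus, and hyperparameters
# are illustrative, not the author's actual setup.
import torch
import torch.nn as nn

torch.manual_seed(0)

corpus = "la la laa li la laa "  # stand-in for the lyrics dataset
chars = sorted(set(corpus))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in corpus])

vocab, d_model, block = len(chars), 32, 8

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(block, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, nhead=4, dim_feedforward=64, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, idx):
        t = idx.shape[1]
        x = self.emb(idx) + self.pos(torch.arange(t))
        # Causal mask so each position attends only to earlier tokens
        mask = nn.Transformer.generate_square_subsequent_mask(t)
        return self.head(self.enc(x, mask=mask))

model = TinyLM()
opt = torch.optim.AdamW(model.parameters(), lr=3e-3)
loss_fn = nn.CrossEntropyLoss()

# Next-token prediction: the target is the input shifted by one token.
for step in range(200):
    i = torch.randint(0, len(data) - block - 1, (1,)).item()
    x = data[i:i + block].unsqueeze(0)
    y = data[i + 1:i + block + 1].unsqueeze(0)
    logits = model(x)
    loss = loss_fn(logits.view(-1, vocab), y.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.3f}")
```

On a corpus this repetitive the model memorizes quickly, which mirrors the overfitting the author observed: training loss drops, but the model mostly reproduces patterns it has seen.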
