techminis (a naukri.com initiative)


Medium · 5d read

Image Credit: Medium

I trained a small transformer model from scratch!

  • The author trained a transformer language model from scratch on a small dataset of Telugu lyrics.
  • The model was scaled down to 6.58M parameters and trained for next-token prediction.
  • Training took approximately 53 minutes per epoch, about 9 hours 30 minutes in total.
  • Despite overfitting to the training data, the model generated plausible Telugu words during inference.
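The setup above (a small transformer trained from scratch for next-token prediction) can be sketched as a minimal PyTorch loop. Everything here is an illustrative assumption: the toy corpus stands in for the Telugu lyrics dataset, and the model is far smaller than the author's 6.58M-parameter configuration.

```python
# Minimal sketch of next-token-prediction training for a tiny
# character-level transformer. Sizes, corpus, and hyperparameters
# are illustrative, not the author's actual setup.
import torch
import torch.nn as nn

torch.manual_seed(0)

corpus = "la la laa li la laa "  # stand-in for the lyrics dataset
chars = sorted(set(corpus))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in corpus])

vocab, d_model, block = len(chars), 32, 8

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(block, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, nhead=4, dim_feedforward=64, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, idx):
        t = idx.shape[1]
        x = self.emb(idx) + self.pos(torch.arange(t))
        # Causal mask so each position attends only to earlier tokens
        mask = nn.Transformer.generate_square_subsequent_mask(t)
        return self.head(self.enc(x, mask=mask))

model = TinyLM()
opt = torch.optim.AdamW(model.parameters(), lr=3e-3)
loss_fn = nn.CrossEntropyLoss()

# Next-token prediction: the target is the input shifted by one token.
for step in range(200):
    i = torch.randint(0, len(data) - block - 1, (1,)).item()
    x = data[i:i + block].unsqueeze(0)
    y = data[i + 1:i + block + 1].unsqueeze(0)
    logits = model(x)
    loss = loss_fn(logits.view(-1, vocab), y.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.3f}")
```

On a corpus this repetitive the model memorizes quickly, which mirrors the overfitting the author observed: training loss drops, but the model mostly reproduces patterns it has seen.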
