The Large Language Models League (LLML) was a fine-tuning competition organised by the Amazon Web Services (AWS) team to promote the skill of fine-tuning large language models, using Llama-3-8B-Instruct as the base model.
The competition had two rounds: a preliminary round and a Grand Finale. In the preliminary round, participants fine-tuned the Llama-3-8B-Instruct model and were pitted against a Llama-3-70B-Instruct model. The top five finalists advanced to the Grand Finale on October 3rd for the showdown.
The Grand Finale had seven questions, judged by an LM judge (40%), a panel of five experts (40%), and the audience (20%); participants had to generate their model's responses within 60 seconds.
The author shares his experience from trial-and-error fine-tuning attempts, balancing experiments between hyperparameter tuning and dataset selection.
For the Singapore-context instruction-response task, the author took a cautious approach to dataset size, selecting the 1,000 data points with the longest response token lengths, a subset he called SeaEval-1k.
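A minimal sketch of that selection step, assuming a Hugging Face dataset with "instruction" and "response" columns; the file path, column names, and tokenizer choice are illustrative, not the author's actual setup.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
dataset = load_dataset("json", data_files="seaeval_full.json", split="train")  # hypothetical source file

# Count response tokens for each example.
def add_token_length(example):
    example["response_tokens"] = len(tokenizer(example["response"])["input_ids"])
    return example

dataset = dataset.map(add_token_length)

# Keep the 1,000 examples with the longest responses ("SeaEval-1k").
seaeval_1k = dataset.sort("response_tokens", reverse=True).select(range(1000))
seaeval_1k.save_to_disk("seaeval-1k")
```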
He also experimented with Low-Rank Adaptation (LoRA) and with varying the target_modules used for fine-tuning, focusing mainly on the epochs, learning_rate, lora_r, and lora_alpha hyperparameters.
Initial results suggested a correlation between increasing the number of epochs and performance. Setting lora_alpha to twice lora_r seemed to be the most commonly suggested ratio.
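A minimal LoRA setup sketch with peft and transformers, reflecting the hyperparameters discussed above (epochs, learning_rate, lora_r, and lora_alpha set to 2x lora_r). The specific values and target_modules shown are assumptions for illustration, not the author's winning configuration.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

lora_r = 16
lora_config = LoraConfig(
    r=lora_r,
    lora_alpha=2 * lora_r,  # the commonly suggested 2x ratio
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # one of several variants to try
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="llama3-8b-lora",
    num_train_epochs=3,       # more epochs appeared to correlate with better scores
    learning_rate=2e-4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
)
```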
Prompt engineering played a crucial role in the Grand Finale. The author focused on eliciting long responses to maximise the LM judge's score, while prioritising less structure and more creativity for the final question.
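A hedged sketch of how such prompts might be switched between the two goals; the wording of the system prompts and the helper function are illustrative, not the author's actual prompts.

```python
# System prompt aimed at long, detailed answers for the LM-judged questions.
LONG_ANSWER_SYSTEM_PROMPT = (
    "You are a knowledgeable assistant. Answer in well-developed paragraphs, "
    "covering background, key details, and examples, so the response is thorough."
)

# System prompt aimed at the final question, trading structure for creativity.
CREATIVE_SYSTEM_PROMPT = (
    "You are a creative writer. Respond freely and imaginatively, without rigid structure."
)

def build_messages(question: str, creative: bool = False) -> list[dict]:
    """Assemble chat messages, swapping the system prompt for the final (creative) question."""
    system = CREATIVE_SYSTEM_PROMPT if creative else LONG_ANSWER_SYSTEM_PROMPT
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```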
The author emphasizes that luck played a significant role and that his insights come from trial-and-error fine-tuning attempts, which may not reflect universally optimal approaches.
The author thanks the Gen-C Generative AI Learning Community for hosting the workshop and the AWS team for organising and facilitating the competition.