LoRA (Low-Rank Adaptation) is a method for adapting large language models (LLMs) efficiently by training only a small number of added parameters instead of fine-tuning all of the model's weights.
LoRA freezes the original model weights and injects pairs of trainable low-rank matrices alongside them, so far fewer parameters receive gradients and optimiser state, which reduces compute and memory usage during training.
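As a rough illustration of the idea (a minimal sketch, not the paper's implementation), the snippet below shows a PyTorch-style linear layer where the pretrained weight stays frozen and a scaled low-rank update BA is trained; the class name LoRALinear and the rank and alpha values are assumptions made for the example.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA-style linear layer: frozen base weight plus trainable low-rank update."""
    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        # The pretrained weight is frozen; only the low-rank factors A and B are trained.
        self.weight = nn.Parameter(torch.empty(out_features, in_features), requires_grad=False)
        nn.init.normal_(self.weight, std=0.02)
        # B starts at zero, so at step 0 the adapted layer behaves exactly like the original.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # Output = frozen projection + scaled low-rank correction (x -> A -> B).
        return x @ self.weight.T + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(768, 768, rank=8)
x = torch.randn(2, 768)
print(layer(x).shape)  # torch.Size([2, 768])
```

Because only lora_A and lora_B require gradients, the optimiser only needs to store state for the low-rank factors rather than the full weight matrix.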
The researchers applied LoRA to the attention weights of Transformer layers and reported substantial savings: the paper cites roughly 10,000 times fewer trainable parameters and about a 3x reduction in GPU memory for GPT-3 175B compared with full fine-tuning. The arithmetic is straightforward: a d-by-d weight matrix adapted at rank r needs only 2dr trainable parameters instead of d squared.
LoRA matches or outperforms full fine-tuning and other adaptation methods on downstream tasks while scaling to very large models and adding no extra inference latency, underlining the importance of optimised, parameter-efficient fine-tuning in AI.