The researchers compared MoRA and LoRA models across a range of tasks. On memorization tasks, MoRA significantly outperformed LoRA and came close to the performance of a fully fine-tuned model while using fewer parameters and training steps. On instruction tuning and mathematical reasoning, MoRA performed almost on par with LoRA, and on continual pretraining in the biomedical and financial domains it again outperformed LoRA. The researchers have released an open-source implementation of MoRA, which could be a valuable tool for enterprise applications.
Key takeaways:
- Researchers from Microsoft and Beihang University have introduced MoRA, a new technique for fine-tuning large language models (LLMs) that is more cost-effective than full fine-tuning and addresses limitations of other parameter-efficient methods such as LoRA.
- MoRA uses a single square matrix instead of LoRA's pair of low-rank matrices, which lets it learn new knowledge more effectively than a LoRA adapter with the same number of trainable parameters (see the sketch after this list).
- In tests, MoRA significantly outperformed LoRA on memorization tasks and performed almost on par with LoRA on instruction tuning and mathematical reasoning tasks.
- The researchers have released an open-source implementation of MoRA, which could be a valuable tool for enterprise applications that want to add new knowledge to base models.
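To make the square-matrix versus low-rank distinction concrete, below is a minimal PyTorch sketch, not the authors' released implementation, contrasting a LoRA-style adapter that factors the weight update into two low-rank matrices with a MoRA-style adapter that packs the same trainable-parameter budget into one square matrix. The chunk-sum compression and tiling decompression used here are illustrative stand-ins for the non-parameterized operators described in the paper, and the class names and exact sizing are assumptions.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRALayer(nn.Module):
    """LoRA-style update: two low-rank matrices A (r x d_in) and B (d_out x r)."""

    def __init__(self, d_in: int, d_out: int, r: int):
        super().__init__()
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))  # zero init: no update at the start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # B @ A has rank at most r, which limits how much new knowledge it can store.
        return x @ self.A.T @ self.B.T


class MoRALayer(nn.Module):
    """MoRA-style update: one square matrix sized to roughly the same parameter
    budget as LoRA, giving a much higher attainable rank."""

    def __init__(self, d_in: int, d_out: int, r: int):
        super().__init__()
        # LoRA trains (d_in + d_out) * r weights; a square matrix with the same
        # budget has side length r_hat = floor(sqrt((d_in + d_out) * r)).
        self.r_hat = int(math.sqrt((d_in + d_out) * r))
        self.M = nn.Parameter(torch.zeros(self.r_hat, self.r_hat))
        self.d_in, self.d_out = d_in, d_out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Compress d_in -> r_hat by splitting the input into r_hat-sized chunks
        # and summing them (an illustrative non-parameterized compression).
        n_chunks = math.ceil(self.d_in / self.r_hat)
        x = F.pad(x, (0, n_chunks * self.r_hat - self.d_in))
        x = x.reshape(*x.shape[:-1], n_chunks, self.r_hat).sum(dim=-2)
        # Apply the square matrix: the update can have rank up to r_hat.
        h = x @ self.M.T
        # Decompress r_hat -> d_out by tiling and truncating.
        reps = math.ceil(self.d_out / self.r_hat)
        return h.repeat(*([1] * (h.dim() - 1)), reps)[..., : self.d_out]


if __name__ == "__main__":
    d, r = 4096, 8
    lora, mora = LoRALayer(d, d, r), MoRALayer(d, d, r)
    x = torch.randn(2, 16, d)
    print(sum(p.numel() for p in lora.parameters()))  # 65536
    print(sum(p.numel() for p in mora.parameters()))  # 65536 (r_hat = 256)
    print(lora(x).shape, mora(x).shape)               # both torch.Size([2, 16, 4096])
```

In this sketch both adapters train roughly 65,000 weights for a 4,096-dimensional layer, but the LoRA update is capped at rank 8 while the MoRA square matrix can reach rank 256, which is the intuition behind its stronger performance on memorization-heavy tasks.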