lxline/LoRA-GA

This repository separates the LoRA-GA implementation from the PEFT library so that it can be used with ms-swift.

Overview

We introduce a novel initialization method, LoRA-GA (Low-Rank Adaptation with Gradient Approximation), which aligns the gradient of the low-rank matrix product with that of full fine-tuning at the first step. Our extensive experiments demonstrate that LoRA-GA achieves a convergence rate comparable to that of full fine-tuning (hence being significantly faster than vanilla LoRA as well as various recent improvements) while simultaneously attaining comparable or even better performance.

Figure: (Left) Training loss curves of Llama 2-7B on MetaMathQA versus training steps. LoRA-GA converges as quickly as full fine-tuning and outperforms LoRA. (Right) Initialization procedures used in LoRA and LoRA-GA. The key difference is that LoRA-GA initializes adapters using the eigenvectors of the gradient matrix, as opposed to random initialization with a scaling factor.
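
The sketch below illustrates this initialization idea in plain PyTorch. It is a minimal, illustrative sketch rather than the implementation in this repository: the function name loraga_init_sketch, the alpha/r scaling, and the particular split of singular vectors assigned to A and B are assumptions for illustration; the exact choices follow the paper and the reference code. It estimates the full fine-tuning gradient of a frozen weight on a small batch, takes its SVD (the "eigenvectors of the gradient matrix" referred to above), and builds the adapters from the resulting singular vectors.

# Simplified, illustrative sketch of a LoRA-GA-style initialization in PyTorch.
# Not the reference implementation; names and scaling are placeholders.
import torch


def loraga_init_sketch(weight: torch.Tensor, grad: torch.Tensor, r: int, alpha: float):
    """Build LoRA adapters A (r x n) and B (m x r) from the SVD of `grad`,
    the full fine-tuning gradient of the frozen weight `weight` (m x n),
    estimated on a small batch before training starts."""
    # Decompose the gradient; its singular vectors play the role of the
    # "eigenvectors of the gradient matrix" mentioned in the overview.
    U, _, Vh = torch.linalg.svd(grad.float(), full_matrices=False)

    # Assign disjoint sets of leading singular vectors to B and A
    # (illustrative split; the paper prescribes the exact assignment).
    B = U[:, :r].contiguous()          # m x r
    A = Vh[r:2 * r, :].contiguous()    # r x n
    scale = alpha / r                  # placeholder for the paper's scaling factor

    # Unlike vanilla LoRA, B @ A is nonzero at initialization, so the frozen
    # weight is shifted to keep the model's initial output unchanged:
    # (weight - scale * B @ A) + scale * B @ A == weight.
    frozen_weight = weight - scale * (B @ A).to(weight.dtype)
    return frozen_weight, A, B

In a LoRA layer, tensors produced this way would take the place of the random/zero initialization used by vanilla LoRA, with the shifted frozen weight stored in place of the original pretrained weight.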

Citation

@misc{wang2024loragalowrankadaptationgradient,
    title={LoRA-GA: Low-Rank Adaptation with Gradient Approximation},
    author={Shaowen Wang and Linxi Yu and Jian Li},
    year={2024},
    eprint={2407.05000},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/2407.05000},
}
