Performance on vision models like ViT or Stable Diffusion #1
Thank you for your interest in our work! We have conducted preliminary experiments with Stable Diffusion 1.5 on COCO and style transfer datasets. While we haven't explored ViT yet, our findings with Stable Diffusion indicate that LoRA-GA converges significantly faster. However, the FID metric shows only a marginal improvement over standard LoRA. This could be attributed to the fact that both methods converge well after training for 50 epochs, with LoRA-GA demonstrating substantial improvement in the initial few epochs. We will be updating our results in the next version on arXiv, so stay tuned for more detailed insights.
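For anyone wanting to reproduce the FID comparison mentioned above: FID is the Fréchet distance between two Gaussians fitted to Inception feature statistics of real and generated images. Here is a minimal sketch of that formula in plain numpy (the feature statistics here are random stand-ins, not the authors' evaluation data):

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(mu1, cov1, mu2, cov2):
    """Frechet distance between two Gaussians:
    ||mu1 - mu2||^2 + Tr(C1 + C2 - 2*sqrt(C1 @ C2))."""
    covmean = sqrtm(cov1 @ cov2)
    if np.iscomplexobj(covmean):  # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean))

# Toy check with hypothetical feature statistics:
# identical distributions should give FID close to 0.
rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 8))
mu, cov = feats.mean(axis=0), np.cov(feats, rowvar=False)
print(round(fid(mu, cov, mu, cov), 6))
```

In practice the means and covariances come from Inception-v3 activations over thousands of images, which is why small quality differences between two already-converged models can show up as only marginal FID gaps.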
I don't know if you have ever experimented with this, but when fine-tuning an SD LoRA on only a few images, would LoRA-GA perform better in that setting?
The style transfer dataset we tried has 32-64 images per class. LoRA-GA performs better in the first 10-20 epochs (w.r.t. training loss and generated image quality), but after that, both methods seem similar, especially after 100 epochs.
Here's my thought: the SD style transfer tasks are much easier than the LLM ones. Tuning on a few images over many epochs gives LoRA many chances to optimize gradually (in the first few epochs, standard LoRA may effectively be "trying to find a good initialization"), while the LLM tasks give only one chance. As a result, even with a suboptimal initialization, standard LoRA can still converge to a good optimum after training for dozens of epochs.
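The initialization point above can be made concrete. Standard LoRA starts with B = 0 (so the update B @ A is exactly zero), whereas a gradient-aligned scheme in the spirit of LoRA-GA seeds A and B from the dominant singular directions of an early full-weight gradient. A rough numpy sketch, using a random matrix as a stand-in for the gradient (this is an illustration, not the paper's exact procedure):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4

# Standard LoRA init: A ~ Gaussian, B = 0, so the initial update B @ A is zero.
A = rng.normal(scale=1.0 / np.sqrt(d_in), size=(r, d_in))
B = np.zeros((d_out, r))
print(np.linalg.norm(B @ A))  # 0.0

# Gradient-aligned init (sketch): take the top-r SVD directions of a
# stand-in full-weight gradient G and use them to seed B and A.
G = rng.normal(size=(d_out, d_in))
U, s, Vt = np.linalg.svd(G, full_matrices=False)
B_ga = U[:, :r]   # left singular vectors
A_ga = Vt[:r, :]  # right singular vectors

# Fraction of the gradient's energy captured by the rank-r subspace;
# the low-rank update starts aligned with the strongest descent directions.
captured = np.sum(s[:r] ** 2) / np.sum(s ** 2)
print(round(captured, 3))
```

With the zero init, the first epochs are spent discovering a useful subspace; with the aligned init, the adapter already spans the top-r gradient directions at step one, which matches the faster early convergence reported above.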
Since your algorithm converges faster, could using it alleviate the catastrophic forgetting problem during SFT, i.e., retain more knowledge from the pre-trained model (for example, via early stopping)?
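For the early-stopping idea: if LoRA-GA reaches a good loss within the first few epochs, a simple patience-based stopper would cut the remaining epochs and limit drift from the pre-trained weights. A minimal self-contained sketch (the loss values are made up for illustration):

```python
class EarlyStopper:
    """Stop training once validation loss hasn't improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True -> stop training

# Hypothetical fast-converging run: the loss flattens early,
# so training halts well before a fixed 50-epoch budget.
losses = [2.0, 1.2, 0.8, 0.75, 0.74, 0.74, 0.74, 0.74]
stopper = EarlyStopper(patience=3, min_delta=0.01)
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        print(f"stopped at epoch {epoch}")
        break
```

Whether fewer update steps actually translate into less forgetting would still need to be measured, e.g. by comparing performance on held-out pre-training-style tasks after each schedule.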
Maybe you can check out this paper. |
Thanks for your awesome work!
I was wondering whether you have any results on vision models like ViT or Stable Diffusion?