performance on vision models like vit or stable diffusion #1

Open · zhch-sun opened this issue Jul 14, 2024 · 5 comments
@zhch-sun

Thanks for your awesome work!
I was wondering if you have any results on vision models like ViT or Stable Diffusion?

@Outsider565 (Owner)

Thank you for your interest in our work!

We have conducted preliminary experiments with Stable Diffusion 1.5 on COCO and style transfer datasets. While we haven't explored ViT yet, our findings with Stable Diffusion indicate that LoRA-GA converges significantly faster. However, the FID metric shows only a marginal improvement over standard LoRA. This could be attributed to the fact that both methods converge well after training for 50 epochs, with LoRA-GA demonstrating substantial improvement in the initial few epochs.

We will be updating our results in the next version on arXiv, so stay tuned for more detailed insights.
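For context, the faster early convergence comes from the gradient-aligned initialization: LoRA-GA initializes the adapter from the SVD of the weight gradient on a sampled batch, so the first low-rank update approximates a full fine-tuning step. Here is a minimal sketch of that idea; the exact singular-vector split and scaling are simplified and may differ from the released code:

```python
# Rough sketch of gradient-aligned LoRA initialization.
# The singular-vector split and the `scale` factor are simplified
# illustrations -- check the released implementation for the real scheme.
import torch

def gradient_aligned_init(weight: torch.Tensor, grad: torch.Tensor, r: int, scale: float = 1.0):
    """weight: (out, in) frozen base matrix; grad: dL/dW estimated on a
    sampled batch. Returns (B, A) so that the first LoRA update roughly
    follows the full fine-tuning gradient direction."""
    U, S, Vh = torch.linalg.svd(grad, full_matrices=False)
    B = U[:, :r] * scale          # (out, r): top-r left singular vectors
    A = Vh[r:2 * r, :] * scale    # (r, in): next-r right singular vectors
    # Subtract B @ A from the frozen weight so the model's initial
    # output is unchanged despite the nonzero adapter product.
    with torch.no_grad():
        weight -= B @ A
    return B, A
```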

@zhch-sun (Author)

I don't know if you have experimented with this, but when fine-tuning an SD LoRA using only a few images, would LoRA-GA perform better in this setting?

@Outsider565 (Owner)

> We have conducted preliminary experiments with Stable Diffusion 1.5 on COCO and style transfer datasets

The style transfer dataset we tried has 32-64 images per class. LoRA-GA performs better in the first 10-20 epochs (w.r.t. training loss and generated image quality), but after that both methods look similar, especially after 100 epochs.

|                 | LLM                          | SD style transfer |
| --------------- | ---------------------------- | ----------------- |
| Training epochs | mostly 1                     | 50 or more        |
| Training data   | 100k samples or 100M+ tokens | 32-64 images      |

Here's my thought: the SD style transfer tasks are a lot easier than the LLM ones. Tuning on a few images for many epochs gives LoRA many chances to gradually optimize (in the first few epochs, standard LoRA may effectively be "trying to find a good initialization"), whereas the LLM tasks give only one pass over the data. As a result, even with a suboptimal initialization, standard LoRA can still converge to a good region of the loss landscape after training for dozens of epochs.

@zhch-sun (Author)

Since your algorithm converges faster, could using it alleviate the catastrophic forgetting problem during SFT, i.e., retain more knowledge from the pre-trained model (for example, by early stopping)?
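For concreteness, by early stopping I mean something like this minimal PyTorch sketch (all names are placeholders, not from this repo):

```python
# Minimal early-stopping sketch: stop fine-tuning once validation loss
# stops improving, so the adapter drifts less from the base model.
# `train_step` and `evaluate` are hypothetical callables, not repo APIs.
import copy

def train_with_early_stopping(model, train_step, evaluate, max_epochs=100, patience=5):
    """train_step(model) runs one epoch; evaluate(model) returns a scalar
    validation loss. Restores the best checkpoint before returning."""
    best_loss = float("inf")
    best_state = copy.deepcopy(model.state_dict())
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_step(model)
        val_loss = evaluate(model)
        if val_loss < best_loss:
            best_loss = val_loss
            best_state = copy.deepcopy(model.state_dict())
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop early; keep the least-drifted good checkpoint
    model.load_state_dict(best_state)
    return model
```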

@Outsider565 (Owner) commented Jul 15, 2024

Maybe you can check out this paper.
I assume LoRA-GA should forget less than full fine-tuning, but I have no idea whether it will forget less or more than standard LoRA. Feel free to try it in your setting!
Also, if you get some good results, I'm happy to discuss them. Feel free to connect with me.
