performance on vision models like vit or stable diffusion #1

Open · zhch-sun opened this issue Jul 14, 2024 · 5 comments
@zhch-sun

Thanks for your awesome work!
I was wondering if you have any results on vision models like ViT or Stable Diffusion?

@Outsider565 (Owner)

Thank you for your interest in our work!

We have conducted preliminary experiments with Stable Diffusion 1.5 on COCO and style transfer datasets. While we haven't explored ViT yet, our findings with Stable Diffusion indicate that LoRA-GA converges significantly faster. However, the FID metric shows only a marginal improvement over standard LoRA. This could be attributed to the fact that both methods converge well after training for 50 epochs, with LoRA-GA demonstrating substantial improvement in the initial few epochs.

We will be updating our results in the next version on arXiv, so stay tuned for more detailed insights.
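For context, the faster early convergence comes from the gradient-aligned initialization: LoRA-GA initializes the adapter from the SVD of the weight gradient on a sampled batch, so the first low-rank update approximates a full fine-tuning step. Here is a minimal sketch of that idea; the exact singular-vector split and scaling are simplified and may differ from the released code:

```python
# Rough sketch of gradient-aligned LoRA initialization.
# The singular-vector split and the `scale` factor are simplified
# illustrations -- check the released implementation for the real scheme.
import torch

def gradient_aligned_init(weight: torch.Tensor, grad: torch.Tensor, r: int, scale: float = 1.0):
    """weight: (out, in) frozen base matrix; grad: dL/dW estimated on a
    sampled batch. Returns (B, A) so that the first LoRA update roughly
    follows the full fine-tuning gradient direction."""
    U, S, Vh = torch.linalg.svd(grad, full_matrices=False)
    B = U[:, :r] * scale          # (out, r): top-r left singular vectors
    A = Vh[r:2 * r, :] * scale    # (r, in): next-r right singular vectors
    # Subtract B @ A from the frozen weight so the model's initial
    # output is unchanged despite the nonzero adapter product.
    with torch.no_grad():
        weight -= B @ A
    return B, A
```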

@zhch-sun (Author)

I don't know if you have experimented with this, but when fine-tuning an SD LoRA using only a few images, would LoRA-GA perform better in this setting?

@Outsider565 (Owner)

> We have conducted preliminary experiments with Stable Diffusion 1.5 on COCO and style transfer datasets

The style transfer dataset we tried has 32-64 images per class. LoRA-GA performs better in the first 10-20 epochs (w.r.t. training loss and generated image quality), but after that both methods look similar, especially after 100 epochs.

|                 | LLM                          | SD style transfer |
| --------------- | ---------------------------- | ----------------- |
| Training epochs | mostly 1                     | 50 or more        |
| Training data   | 100k samples or 100M+ tokens | 32-64 images      |

Here's my thought: the SD style transfer tasks are a lot easier than the LLM ones. Tuning on a few images for many epochs gives LoRA many chances to gradually optimize (in the first few epochs, standard LoRA may effectively be "trying to find a good initialization"), whereas the LLM tasks give only one pass over the data. As a result, even with a suboptimal initialization, standard LoRA can still converge to a good region of the loss landscape after training for dozens of epochs.

@zhch-sun (Author)

Since your algorithm converges faster, could using it alleviate the catastrophic forgetting problem during SFT, i.e., retain more knowledge from the pre-trained model (for example, by early stopping)?
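For concreteness, by early stopping I mean something like this minimal PyTorch sketch (all names are placeholders, not from this repo):

```python
# Minimal early-stopping sketch: stop fine-tuning once validation loss
# stops improving, so the adapter drifts less from the base model.
# `train_step` and `evaluate` are hypothetical callables, not repo APIs.
import copy

def train_with_early_stopping(model, train_step, evaluate, max_epochs=100, patience=5):
    """train_step(model) runs one epoch; evaluate(model) returns a scalar
    validation loss. Restores the best checkpoint before returning."""
    best_loss = float("inf")
    best_state = copy.deepcopy(model.state_dict())
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_step(model)
        val_loss = evaluate(model)
        if val_loss < best_loss:
            best_loss = val_loss
            best_state = copy.deepcopy(model.state_dict())
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop early; keep the least-drifted good checkpoint
    model.load_state_dict(best_state)
    return model
```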

@Outsider565 (Owner) commented Jul 15, 2024

Maybe you can check out this paper.
I assume LoRA-GA should forget less than full fine-tuning, but I have no idea whether it will forget less or more than standard LoRA. Feel free to try it in your setting!
Also, if you get some good results, I'm happy to discuss them. Feel free to connect with me.
