loss can't decay during training hunyuan lora #89

Open
qingyuan18 opened this issue Feb 18, 2025 · 5 comments

@qingyuan18

FYI, I have finished the Hunyuan LoRA training and everything ran fine, but the loss is quite high:

Msteps: 100%|██████████| 3720/3720 [5:40:44<00:00, 5.50s/it, avr_loss=2.77]saving checkpoint: /home/ubuntu/ComfyUI/output/lora_hunyuan/hunyuan-aorun-000001.safetensors

My hyperparameters are as follows:

cd /home/ubuntu/musubi-tuner
accelerate launch --num_cpu_threads_per_process 1 --mixed_precision bf16 hv_train_network.py \
    --dit /home/ubuntu/hunyuan_ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8.pt \
    --dataset_config /home/ubuntu/musubi-tuner/hunyuan_lora.toml --xformers --split_attn --mixed_precision bf16 --fp8_base \
    --optimizer_type adamw8bit --learning_rate 1e-5 --gradient_checkpointing  \
    --max_data_loader_n_workers 4 --persistent_data_loader_workers  \
    --network_module networks.lora --network_dim 32  \
    --timestep_sampling shift --discrete_flow_shift 7.0 \
    --max_train_epochs 6 --save_every_n_epochs 1 --seed 10086 \
    --output_dir /home/ubuntu/ComfyUI/output/lora_hunyuan --output_name hunyuan-aorun
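
(For context, the hunyuan_lora.toml referenced above is a musubi-tuner dataset config. The sketch below is only an illustration of the shape of such a file, not my exact config — the paths are placeholders and the key names should be double-checked against the repo's dataset documentation.)

# Rough sketch of an image-dataset config for musubi-tuner (placeholder paths;
# verify the key names against the repo's dataset docs before using).
cat > /path/to/hunyuan_lora.toml <<'EOF'
[general]
resolution = [960, 544]
caption_extension = ".txt"
batch_size = 1
enable_bucket = true

[[datasets]]
image_directory = "/path/to/train/images"
cache_directory = "/path/to/train/cache"
num_repeats = 1
EOF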




Are there any tricks I can use to dig deeper into the loss issue?

Thanks a lot
@Sarania

Sarania commented Feb 18, 2025

Your LR is really low for that number of steps. Usually my loss starts around .2 and by the end reaches around .1, at least with timestep sampling shift and flow shift 7.0. I usually run for about 2000 steps at 2e-4, so for 1e-5 you would need significantly more, I think. I haven't seen anyone have much success below 5e-5, FWIW! That said, how is your output? Loss is not the be-all end-all metric; for instance, if I train with timestep sampling sigmoid, flow shift 1 (good for characters), loss doesn't really decrease but the target is still learned.
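
Concretely, here's a sketch of the kind of change I mean — your exact command with just the learning rate bumped; treat the values as starting points rather than a recipe:

# Same command as above, only --learning_rate raised from 1e-5 to 2e-4.
# For character training on images you could also try swapping
# "--timestep_sampling shift --discrete_flow_shift 7.0" for
# "--timestep_sampling sigmoid --discrete_flow_shift 1.0".
accelerate launch --num_cpu_threads_per_process 1 --mixed_precision bf16 hv_train_network.py \
    --dit /home/ubuntu/hunyuan_ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8.pt \
    --dataset_config /home/ubuntu/musubi-tuner/hunyuan_lora.toml --xformers --split_attn --mixed_precision bf16 --fp8_base \
    --optimizer_type adamw8bit --learning_rate 2e-4 --gradient_checkpointing \
    --max_data_loader_n_workers 4 --persistent_data_loader_workers \
    --network_module networks.lora --network_dim 32 \
    --timestep_sampling shift --discrete_flow_shift 7.0 \
    --max_train_epochs 6 --save_every_n_epochs 1 --seed 10086 \
    --output_dir /home/ubuntu/ComfyUI/output/lora_hunyuan --output_name hunyuan-aorun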

@qingyuan18 (Author)

Hi Sarania,
Thanks for your quick reply! This information is really useful.
Actually, I have already tried setting the LR higher and training for more epochs, but the loss is still over 2. I haven't tried the LoRA model in my workflow yet because the loss is so high (compared to my previous Flux LoRA, which was around 0.15). I used 30+ images with captions generated by JoyCaption; I will try an even higher LR to see the difference.
What is your average loss with timestep sampling sigmoid and flow shift 1? Is it also as high as 2+, or does it stay around 0.2 without decaying?

Thanks again

@Sarania

Sarania commented Feb 23, 2025

Sorry it took me a minute to reply, I've had a rough week. This is my loss graph for a recent run with sigmoid/1:

[Image: loss curve for the sigmoid/1 run]

versus the same dataset and everything but with shift/7:

[Image: loss curve for the shift/7 run]

Both of these had LR 1e-4 for 3600 steps and the sigmoid run turned out better, which seems to be the case any time I train on images only.

@qingyuan18 (Author)

Thanks for your continued support and suggestions.

My loss shrank from an initial 2.2 to a final 1.06, but it never reached the 0.2-0.1 range. I have tried various sampling settings (sigmoid/shift, flow shift 1-7), which is quite frustrating...

This is an example of my dataset's images and captions. I tried both long captions (512 tokens) and captions as short as possible (70 tokens, as in this example), but the loss did not converge.

[Image: example training image from the dataset]

caption txt: "aorun,pale-skinned woman wearing a form-fitting red dress adorned with floral patterns, set against a plain gray background."

Could you give me some samples of your tagged images and .txt captions? I suspect my issue ultimately comes down to the dataset.

@Sarania

Sarania commented Feb 24, 2025

My captions aren't special really. For image datasets I tend to use fairly simple natural language captions that first describe the subject I'm training, then the background. For instance:

"photo of ***** sitting in the grass with her legs crossed, wearing a white dress and smiling. Her hair is curled and she's wearing brown boots. In the background are bushes, trees, and a building, all slightly blurred."

"photo of ***** in a pink bra and jean shorts posing with her arms up. Behind her is a table and a mirror and on the table are various objects including a hairbrush."

(The ****s are just censored personal names.) I usually use Florence2 or JoyCaption to autocaption them, then refine them manually. For videos I do everything manually and tend to be more verbose, making sure to mention the action or movement in the first sentence, because that's what gets weighted most heavily. If you're training a character, it definitely helps to use their name in context, like "photo of Name standing in the park", etc.

If you're having trouble getting any results at all no matter how you switch the settings around, it's possible something in your dataset is wrecking your gradients. I've had that happen before, and it can be hard to weed out, so if you aren't getting any results at all I'd recommend trying another dataset and seeing how that goes.
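
As a quick sanity check on the dataset side, here's a rough shell sketch (assuming your images and their .txt captions sit side by side in one folder — adjust the path and extensions to your setup) that flags images with a missing or empty caption:

# Flag images whose sidecar .txt caption is missing or empty.
DATASET_DIR=/path/to/your/dataset   # placeholder -- point this at your image folder
for img in "$DATASET_DIR"/*.jpg "$DATASET_DIR"/*.jpeg "$DATASET_DIR"/*.png; do
    [ -e "$img" ] || continue       # skip glob patterns that matched nothing
    cap="${img%.*}.txt"
    if [ ! -s "$cap" ]; then
        echo "missing or empty caption: $img"
    fi
done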
