Loss doesn't decrease during Hunyuan LoRA training #89
Comments
Your LR is really low for that number of steps. Usually my loss starts around 0.2 and by the end reaches around 0.1, at least with timestep sampling "shift" and flow shift 7.0. I usually run for about 2000 steps at 2e-4, so at 1e-5 you would need significantly more, I think. I haven't seen anyone have much success below 5e-5, FWIW. That said, how is your output? Loss is not the be-all-end-all metric; for instance, if I train with timestep sampling "sigmoid" and flow shift 1 (good for characters), the loss doesn't really decrease but the target is still learned.
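For readers unfamiliar with these settings, here is a minimal Python sketch of how the timestep sampling mode and flow shift change which noise levels the model trains on. The function name and the exact shift formula are assumptions for illustration (the shift mapping shown is the one commonly used for rectified-flow models); check your trainer's source for its precise implementation.

```python
import torch

def sample_timesteps(batch_size, mode="sigmoid", shift=1.0):
    """Illustrative sketch (not the trainer's exact code) of how the
    sampling mode and flow shift pick noise levels t in (0, 1)."""
    if mode == "sigmoid":
        # Sigmoid of a normal biases t toward the middle of the schedule.
        t = torch.sigmoid(torch.randn(batch_size))
    else:
        # A uniform base draw for the "shift"-style mode.
        t = torch.rand(batch_size)
    # A flow shift > 1 pushes samples toward higher noise levels.
    t = shift * t / (1.0 + (shift - 1.0) * t)
    return t

print(sample_timesteps(4, mode="sigmoid", shift=1.0))
print(sample_timesteps(4, mode="shift", shift=7.0))
```

This is also why the absolute loss numbers are not directly comparable between a sigmoid/1 run and a shift/7 run: the loss is averaged over different distributions of noise levels.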
Hi Sarania, thanks again.
Sorry it took me a minute to reply, I've had a rough week. This is my loss graph for a recent run with sigmoid/1, versus the same dataset and everything else identical but with shift/7 (loss graphs attached in the original comment). Both of these had LR 1e-4 for 3600 steps, and the sigmoid run turned out better, which seems to be the case any time I train on images only.
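Raw per-step losses are noisy, so comparing two runs is easier on a smoothed curve. A small sketch, assuming you have already parsed the per-step losses out of your training log or a TensorBoard export into plain Python lists; the example values below are hypothetical.

```python
def ema_smooth(losses, alpha=0.98):
    """Exponential moving average, similar to TensorBoard's smoothing slider."""
    smoothed, prev = [], losses[0]
    for x in losses:
        prev = alpha * prev + (1 - alpha) * x
        smoothed.append(prev)
    return smoothed

# Hypothetical per-step losses for two runs, parsed from their logs.
run_sigmoid = [0.21, 0.19, 0.24, 0.18]
run_shift7 = [0.20, 0.17, 0.15, 0.16]
print(ema_smooth(run_sigmoid))
print(ema_smooth(run_shift7))
```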
My captions aren't special really. For image datasets I tend to use fairly simple natural-language captions that first describe the subject I'm training, then the background. For instance: "photo of ***** sitting in the grass with her legs crossed, wearing a white dress and smiling. Her hair is curled and she's wearing brown boots. In the background are bushes, trees, and a building, all slightly blurred." Or: "photo of ***** in a pink bra and jean shorts posing with her arms up. Behind her is a table and a mirror and on the table are various objects including a hairbrush." (The ****s are just censored personal names.)

I usually use Florence2 or Joycaption to autocaption them, then refine them manually. For videos I do everything manually and tend to be more verbose, making sure to mention the action or movement in the first sentence because that's what gets weighted most heavily. If you're training a character, it definitely helps to use their name in context, like "photo of Name standing in the park".

If you're having trouble getting any results at all no matter how you switch the settings around, it's possible something in your dataset is wrecking your gradients. I've had that happen before and sometimes it can be hard to weed out, so if you aren't getting any results at all I'd recommend trying with another dataset and seeing how that goes.
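If you suspect a broken sample is wrecking the gradients, a quick sanity pass over the dataset can catch the usual culprits (unreadable images, missing or empty captions) before retraining. A rough sketch; the folder layout (each image next to a same-named `.txt` caption) and the `dataset/aorun` path are assumptions, so adjust them to how your dataset is actually organized.

```python
from pathlib import Path
from PIL import Image

def check_dataset(root):
    """Flag unreadable images and missing or empty caption files."""
    problems = []
    for img_path in sorted(Path(root).glob("*")):
        if img_path.suffix.lower() not in {".jpg", ".jpeg", ".png", ".webp"}:
            continue
        try:
            with Image.open(img_path) as im:
                im.verify()  # cheap integrity check, does not fully decode
        except Exception as e:
            problems.append(f"{img_path.name}: unreadable image ({e})")
            continue
        cap_path = img_path.with_suffix(".txt")
        if not cap_path.exists():
            problems.append(f"{img_path.name}: no caption file")
        elif not cap_path.read_text(encoding="utf-8").strip():
            problems.append(f"{img_path.name}: empty caption")
    return problems

for p in check_dataset("dataset/aorun"):  # hypothetical dataset folder
    print(p)
```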
FYI, I have finished the Hunyuan LoRA training and everything ran fine, but the loss is quite high:
steps: 100%|██████████| 3720/3720 [5:40:44<00:00, 5.50s/it, avr_loss=2.77]saving checkpoint: /home/ubuntu/ComfyUI/output/lora_hunyuan/hunyuan-aorun-000001.safetensors
My hyperparameters are as follows: