Training produced lora with zero effect #66

Open
vladislav-kostin opened this issue Jan 28, 2025 · 4 comments


vladislav-kostin commented Jan 28, 2025

Currently I'm training successfully on a fork of musubi tuner that adds a basic GUI, but I might switch to this repo. However, when I tried training with this repo, it produced a LoRA that had no effect. In both cases I used the same images, the same settings, and even the same cache generated with this repo.

Why did training with this command produce an ineffective LoRA, while training in the GUI version with the same data and settings produced a working LoRA? I couldn't find a difference between this command and the one issued by the GUI, but I'm not sure. Is something wrong with it?

accelerate launch ^
    --num_cpu_threads_per_process 1 ^
    --mixed_precision bf16 ^
    hv_train_network.py ^
    --dit ./models/ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt ^
    --dataset_config ./training/test/config/config.toml ^
    --sdpa ^
    --mixed_precision bf16 ^
    --fp8_base ^
    --optimizer_type adamw8bit ^
    --learning_rate 2e-4 ^
    --gradient_checkpointing ^
    --max_data_loader_n_workers 2 ^
    --persistent_data_loader_workers ^
    --network_module networks.lora ^
    --network_dim 32 ^
    --gradient_accumulation_steps 4 ^
    --timestep_sampling shift ^
    --discrete_flow_shift 7.0 ^
    --max_train_epochs 100 ^
    --save_every_n_epochs 1 ^
    --seed 69 ^
    --output_dir ./training/test/output ^
    --output_name lora ^
    --blocks_to_swap 36

kohya-ss (Owner) commented Jan 30, 2025

To use the trained LoRA in ComfyUI etc., you need to convert the LoRA after training.
https://github.com/kohya-ss/musubi-tuner?tab=readme-ov-file#convert-lora-to-another-format

This may have been done automatically by the GUI.

If we converted the trained LoRA by default, it would not be readable by the other scripts in this repository, so we do not plan to convert by default. Please ask the inference tool's developers to support LoRA from Musubi Tuner.
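
For reference, the conversion in that README section is a single script call. A sketch using this issue's output path (flag names follow the README at the time of writing, so double-check against the current docs):

python convert_lora.py \
    --input ./training/test/output/lora.safetensors \
    --output ./training/test/output/lora_comfy.safetensors \
    --target other

Per the README, --target other emits the Diffusers-style key layout that ComfyUI and similar tools expect, and --target default converts back to this repository's own format.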

@vladislav-kostin (Author)

> This may have been done automatically by the GUI.

Oh, I hadn't thought of that! Might be. If this is how an unconverted LoRA is expected to behave, then that's definitely it! Thanks.

By the way, what should I expect if I continue training by loading weights from a converted LoRA, or from diffusion-pipe LoRAs? I have done both and it seems fine, but I'm not sure.

@kohya-ss (Owner)

In principle, it should not be a problem. However, since alpha will be the same as dim(rank) in the converted LoRA, the learning rate should be lower than when training from scratch with Musubi Tuner.
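
To make the scaling concrete: a LoRA update is applied as W + (alpha / rank) * B @ A, so the alpha-to-rank ratio directly scales the LoRA's effect. A quick illustration with the values from this thread (the alpha=1 default appears in the training log below; this is a sketch, not Musubi Tuner's code):

rank = 32

alpha_musubi = 1.0       # Musubi Tuner default ("base dim (rank): 32, alpha: 1" in the log below)
alpha_converted = 32.0   # after conversion, alpha is set equal to rank

print(alpha_musubi / rank)     # 0.03125 -> update damped by 32x
print(alpha_converted / rank)  # 1.0 -> update applied at full strength

Resuming from a converted LoRA therefore amplifies the same weights by a factor of rank, which is why a lower learning rate is advisable.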


vim-brigant commented Feb 4, 2025

I had the same issue, unfortunately. In my last attempt I trained for about 4000 steps. After the .safetensors file in outputs turned out to have no effect, I ran the conversion script and used its output in ComfyUI, but that did not change anything. I also generated videos using the script from the inference section of the readme, with and without the LoRA I trained, and saw no difference, even with the multiplier at 2 or 4. It's possible, or even likely, that I need to train more, but a complete lack of effect makes it seem like something is wrong. I previously did another run using video that ran for about 48 hours on a 3090, about 128 epochs, and had the same issue there. I ran a similar dataset in diffusion-pipe previously, for only 2 epochs, and was able to see some change after that.
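
(For anyone reproducing this check: the readme's inference script is invoked roughly as below. This is a sketch with this thread's paths substituted in; flag names come from the readme and may have changed, and --lora_multiplier is the "multiplying by 2 or 4" knob.)

python hv_generate_video.py \
    --dit /home/vb/Documents/HunyuanVideo/ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt \
    --vae /home/vb/Documents/HunyuanVideo/ckpts/hunyuan-video-t2v-720p/vae/pytorch_model.pt \
    --text_encoder1 /home/vb/Documents/HunyuanVideo/ckpts/text_encoder \
    --text_encoder2 /home/vb/Documents/HunyuanVideo/ckpts/text_encoder_2 \
    --prompt "your test prompt" --video_size 384 512 --video_length 25 --infer_steps 30 \
    --save_path ./outputs \
    --lora_weight ./outputs/dataset_1.safetensors --lora_multiplier 2.0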

If I paste the info specific to my training attempt, is it possible to see if I used the wrong settings at some point?

Thank you for taking a look if you're able.

# config.toml

[general]
resolution = [384, 512]
caption_extension = ".txt"
batch_size = 1
enable_bucket = true
bucket_no_upscale = false

[[datasets]]
image_directory = "/home/vb/Documents/musubi-tuner/dataset/dataset_1/images"
cache_directory = "/home/vb/Documents/musubi-tuner/dataset/dataset_1/cache"
num_repeats = 1

# caching commands

python cache_latents.py \
    --dataset_config /home/vb/Documents/musubi-tuner/dataset/dataset_1/config.toml \
    --vae /home/vb/Documents/HunyuanVideo/ckpts/hunyuan-video-t2v-720p/vae/pytorch_model.pt \
    --vae_chunk_size 32 \
    --vae_tiling

python cache_text_encoder_outputs.py \
    --dataset_config /home/vb/Documents/musubi-tuner/dataset/dataset_1/config.toml \
    --text_encoder1 /home/vb/Documents/HunyuanVideo/ckpts/text_encoder \
    --text_encoder2 /home/vb/Documents/HunyuanVideo/ckpts/text_encoder_2 \
    --batch_size 64

^ Unfortunately I lost the output from these commands, but they seemed to run as expected.

musubi-tuner git:(main) ✗ LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:/usr/local/cuda-12.8/compat /home/vb/Documents/musubi-tuner/venv/bin/accelerate launch \
    --num_cpu_threads_per_process 1 \
    --mixed_precision bf16 hv_train_network.py \
    --dit /home/vb/Documents/HunyuanVideo/ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt \
    --dataset_config /home/vb/Documents/musubi-tuner/dataset/dataset_1/config.toml \
    --sdpa --mixed_precision bf16 --fp8_base \
    --optimizer_type adamw8bit --learning_rate 2e-4 --gradient_checkpointing \
    --max_data_loader_n_workers 2 --persistent_data_loader_workers \
    --network_module networks.lora --network_dim 32 \
    --timestep_sampling shift --discrete_flow_shift 7.0 \
    --max_train_epochs 2048 --save_every_n_epochs 16 --seed 42 \
    --output_dir /home/vb/Documents/musubi-tuner/dataset/dataset_1/outputs \
    --output_name dataset_1 \
    --blocks_to_swap 20 --mixed_precision fp16 --split_attn --xformers \
    --save_state --save_state_on_train_end

Trying to import sageattention
Successfully imported sageattention
INFO:__main__:Load dataset config from /home/vb/Documents/musubi-tuner/dataset/dataset_1/config.toml
INFO:dataset.image_video_dataset:glob images in /home/vb/Documents/musubi-tuner/dataset/dataset_1/images
INFO:dataset.image_video_dataset:found 42 images
INFO:dataset.config_utils:[Dataset 0]
  is_image_dataset: True
  resolution: (384, 512)
  batch_size: 1
  num_repeats: 1
  caption_extension: ".txt"
  enable_bucket: True
  bucket_no_upscale: False
  cache_directory: "/home/vb/Documents/musubi-tuner/dataset/dataset_1/cache"
  debug_dataset: False
    image_directory: "/home/vb/Documents/musubi-tuner/dataset/dataset_1/images"
    image_jsonl_file: "None"


INFO:dataset.image_video_dataset:bucket: (336, 576), count: 1
INFO:dataset.image_video_dataset:bucket: (400, 480), count: 1
INFO:dataset.image_video_dataset:bucket: (416, 464), count: 2
INFO:dataset.image_video_dataset:bucket: (432, 448), count: 25
INFO:dataset.image_video_dataset:bucket: (448, 432), count: 4
INFO:dataset.image_video_dataset:bucket: (480, 400), count: 2
INFO:dataset.image_video_dataset:bucket: (512, 384), count: 3
INFO:dataset.image_video_dataset:bucket: (528, 368), count: 2
INFO:dataset.image_video_dataset:bucket: (544, 352), count: 1
INFO:dataset.image_video_dataset:bucket: (576, 336), count: 1
INFO:dataset.image_video_dataset:total batches: 42
INFO:__main__:preparing accelerator
accelerator device: cuda
INFO:__main__:DiT precision: torch.bfloat16, weight precision: torch.float8_e4m3fn
INFO:__main__:Loading DiT model from /home/vb/Documents/HunyuanVideo/ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt
Using torch attention mode, split_attn: True
INFO:__main__:enable swap 20 blocks to CPU from device: cuda
HYVideoDiffusionTransformer: Block swap enabled. Swapping 20 blocks, double blocks: 10, single blocks: 21.
import network module: networks.lora
INFO:networks.lora:create LoRA network. base dim (rank): 32, alpha: 1
INFO:networks.lora:neuron dropout: p=None, rank dropout: p=None, module dropout: p=None
INFO:networks.lora:create LoRA for U-Net/DiT: 240 modules.
INFO:networks.lora:enable LoRA for U-Net: 240 modules
HYVideoDiffusionTransformer: Gradient checkpointing enabled.
prepare optimizer, data loader etc.
INFO:__main__:use 8-bit AdamW optimizer | {}
override steps. steps for 2048 epochs is / 指定エポックまでのステップ数: 86016
INFO:__main__:casting model to torch.float8_e4m3fn
running training / 学習開始
  num train items / 学習画像、動画数: 42
  num batches per epoch / 1epochのバッチ数: 42
  num epochs / epoch数: 2048
  batch size per device / バッチサイズ: 1
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 86016
INFO:__main__:calculate hash for DiT model: /home/simonsays/Documents/HunyuanVideo/ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt
steps:   0%|                                                                     | 0/86016 [00:00<?, ?it/s]INFO:__main__:DiT dtype: torch.float8_e4m3fn, device: cuda:0


... a few hours later


INFO:dataset.image_video_dataset:epoch is incremented. current_epoch: 99, epoch: 100
steps:   5%|█▉                                     | 4196/86016 [2:35:52<50:39:25,  2.23s/it, avr_loss=nan]^CTraceback (most recent call last):
  File "/home/vb/Documents/musubi-tuner/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/vb/Documents/musubi-tuner/venv/lib/python3.12/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/home/vb/Documents/musubi-tuner/venv/lib/python3.12/site-packages/accelerate/commands/launch.py", line 1172, in launch_command
    simple_launcher(args)
  File "/home/vb/Documents/musubi-tuner/venv/lib/python3.12/site-packages/accelerate/commands/launch.py", line 759, in simple_launcher
    process.wait()
  File "/home/vb/miniconda3/lib/python3.12/subprocess.py", line 1264, in wait
    return self._wait(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vb/miniconda3/lib/python3.12/subprocess.py", line 2053, in _wait
    (pid, sts) = self._try_wait(0)
                 ^^^^^^^^^^^^^^^^^
  File "/home/simonsays/miniconda3/lib/python3.12/subprocess.py", line 2011, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt
