
--split_attn increases training speed, contrary to docs #65

Open
xmodo99 opened this issue Jan 26, 2025 · 3 comments

@xmodo99

xmodo99 commented Jan 26, 2025

I'm running python 3.10.3 on Linux Mint, with an RTX 3090.

Contrary to the docs, --split_attn increases training speed, cutting the time per iteration roughly in half.

Without split attention, I'm getting 27s/it, but with split attention it's 13s/it.

Does it make logical sense that this would happen? If it does, we should update the docs.

@FurkanGozukara

Good question

@kohya-ss
Owner

That's very interesting. --split_attn eliminates the need for the attention mask, so maybe that helps. Could you tell me which one you're using: sdpa, xformers, or flash_attn?
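For intuition, here is a minimal sketch of why dropping the attention mask can matter. This is not the repository's actual code; all names, shapes, and lengths below are illustrative assumptions. With batched variable-length samples, padding forces an explicit attention mask, and PyTorch's scaled_dot_product_attention generally cannot dispatch its fastest fused kernels when a mask is supplied. Running attention per sample on the unpadded length needs no mask at all.

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

# Illustrative shapes only: two samples of different sequence lengths,
# padded to the longer one for the batched case.
B, H, L_short, L_max, D = 2, 8, 256, 512, 64
q = torch.randn(B, H, L_max, D, device=device, dtype=dtype)
k, v = torch.randn_like(q), torch.randn_like(q)

# Batched path: padding requires an explicit attention mask. With a mask,
# SDPA typically falls back to a slower, more general kernel.
mask = torch.zeros(B, 1, L_max, L_max, dtype=torch.bool, device=device)
mask[0, :, :, :L_short] = True  # sample 0 attends only to its real tokens
mask[0, :, L_short:, :] = True  # padded query rows attend freely; their outputs are discarded
mask[1] = True                  # sample 1 has no padding
out_batched = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

# "Split" path: run attention per sample on its true length.
# No mask is needed, so SDPA is free to pick a fused kernel.
lengths = [L_short, L_max]
out_split = [
    F.scaled_dot_product_attention(q[i:i + 1, :, :n], k[i:i + 1, :, :n], v[i:i + 1, :, :n])
    for i, n in enumerate(lengths)
]
```

If the batched, masked path is falling back to a slower kernel while the split path hits a fused one, a roughly 2x difference in s/it would be plausible, but that's speculation until someone profiles it.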

@xmodo99
Author

xmodo99 commented Feb 1, 2025

I was using sdpa like the example in the docs. Here's my full command:

accelerate launch --num_cpu_threads_per_process 2 --mixed_precision bf16 hv_train_network.py \
  --dit ./ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8.safetensors \
  --dataset_config ./training_config.toml \
  --sdpa --mixed_precision bf16 --fp8_base \
  --optimizer_type adamw8bit --learning_rate 2e-4 \
  --gradient_checkpointing \
  --max_data_loader_n_workers 2 --persistent_data_loader_workers \
  --network_module networks.lora --network_dim 32 \
  --timestep_sampling shift --discrete_flow_shift 7.0 \
  --max_train_epochs 100 \
  --save_every_n_steps {sample_every_n_steps} \
  --seed 42 \
  --output_dir ./train_output/{lora_name}/ --output_name {lora_name} \
  --split_attn \
  --logging_dir ./tensorLogs --log_prefix {lora_name}_ \
  --vae ./ckpts/hunyuan-video-t2v-720p/vae/pytorch_model.pt \
  --vae_chunk_size 32 --vae_spatial_tile_sample_min_size 128 \
  --sample_prompts ./sample_prompts.txt \
  --text_encoder1 ./ckpts/text_encoder/llava_llama3_fp16.safetensors \
  --text_encoder2 ./ckpts/text_encoder_2/clip_l.safetensors \
  --sample_every_n_steps {sample_every_n_steps}

The sampling settings understandably don't affect training speed, so they can probably be ignored.
