
Qwen-2.5-VL-7B finetuning issue #40

Closed
ragesh-beo opened this issue Feb 13, 2025 · 12 comments

Comments

@ragesh-beo

Hi, I ran into the following issues while fine-tuning Qwen-2.5-VL-Instruct.

  1. The environment.yaml file pins transformers==4.48.0, and as far as I know, Qwen2_5_VLForConditionalGeneration cannot be imported from that version.
  2. When I updated transformers to git+https://github.com/huggingface/transformers, I got the following error:
[rank0]: Traceback (most recent call last):
[rank0]:   File "/root/train/Qwen2-VL-Finetune/src/training/train.py", line 224, in <module>
[rank0]:     train()
[rank0]:   File "/root/train/Qwen2-VL-Finetune/src/training/train.py", line 199, in train
[rank0]:     trainer.train()
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 2241, in train
[rank0]:     return inner_training_loop(
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 2548, in _inner_training_loop
[rank0]:     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 3698, in training_step
[rank0]:     loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 3759, in compute_loss
[rank0]:     outputs = model(**inputs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 18, in wrapped_fn
[rank0]:     ret_val = func(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1899, in forward
[rank0]:     loss = self.module(*inputs, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]:     return inner()
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/peft/peft_model.py", line 563, in forward
[rank0]:     return self.get_base_model()(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]:     return inner()
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]:   File "/root/train/Qwen2-VL-Finetune/src/training/monkey_patch_forward.py", line 222, in qwen2_5_mixed_modality_forward
[rank0]:     self.visual(torch.zeros(14903, 1176), gird_thw=torch.Tensor([[1, 98, 146]]))
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]:     return inner()
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]: TypeError: Qwen2_5_VisionTransformerPretrainedModel.forward() got an unexpected keyword argument 'gird_thw'
@2U1
Owner

2U1 commented Feb 13, 2025

I've written it in the README; you should install the correct version.
I'll check the code and the version I'm using.

@ragesh-beo
Author

Sorry, I didn't notice the README instructions earlier. However, even after installing transformers as instructed in the README, I am still encountering the error. @2U1

@2U1
Owner

2U1 commented Feb 13, 2025

git+https://github.com/huggingface/transformers/commit/9d2056f12b66e64978f78a2dcb023f65b2be2108
Could you install this version and try it again?

The version should be transformers==4.49.0.dev0
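As a quick sanity check before launching training, the installed version string can be compared against the required one. This is a minimal sketch (not part of the repo), assuming the `packaging` library is available, which it is in most pip/setuptools environments:

```python
# Sketch: check whether an installed transformers version is new enough to
# ship Qwen2_5_VLForConditionalGeneration (added in the 4.49.0.dev0 line).
from packaging import version


def transformers_is_new_enough(installed: str, required: str = "4.49.0.dev0") -> bool:
    # A .dev0 pre-release of 4.49.0 still sorts above any 4.48.x release,
    # and the final 4.49.0 release sorts above its own .dev0 pre-release.
    return version.parse(installed) >= version.parse(required)
```

In practice the installed string would come from `importlib.metadata.version("transformers")`.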

@ragesh-beo
Author

ragesh-beo commented Feb 13, 2025

Installing transformers from the commit you mentioned also didn't work for me; same error. I double-checked the version of transformers, and it is transformers==4.49.0.dev0.

@2U1
Owner

2U1 commented Feb 13, 2025

Okay, I got it; it was my typo.
I've pushed the code with the fix. Could you please try again with the latest code?

@ragesh-beo
Author

Using the updated code gives me the following error

Parameter Offload: Total persistent parameters: 2683904 in 424 params
  0%|          | 0/618 [00:00<?, ?it/s][rank0]: Traceback (most recent call last):
[rank0]:   File "/root/train/Qwen-vl-finetune/src/training/train.py", line 224, in <module>
[rank0]:     train()
[rank0]:   File "/root/train/Qwen-vl-finetune/src/training/train.py", line 199, in train
[rank0]:     trainer.train()
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 2184, in train
[rank0]:     return inner_training_loop(
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 2490, in _inner_training_loop
[rank0]:     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 3598, in training_step
[rank0]:     loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 3659, in compute_loss
[rank0]:     outputs = model(**inputs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 18, in wrapped_fn
[rank0]:     ret_val = func(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1899, in forward
[rank0]:     loss = self.module(*inputs, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]:     return inner()
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/peft/peft_model.py", line 563, in forward
[rank0]:     return self.get_base_model()(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]:     return inner()
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]:   File "/root/train/Qwen-vl-finetune/src/training/monkey_patch_forward.py", line 227, in qwen2_5_mixed_modality_forward
[rank0]:     self.visual(torch.zeros(14903, 1176), grid_thw=torch.Tensor([[1, 98, 146]]))
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]:     return inner()
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 110, in forward
[rank0]:     hidden_states = self.proj(hidden_states.to(dtype=target_dtype)).view(-1, self.embed_dim)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]:     return inner()
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 725, in forward
[rank0]:     return self._conv_forward(input, self.weight, self.bias)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 720, in _conv_forward
[rank0]:     return F.conv3d(
[rank0]: RuntimeError: Input type (CPUBFloat16Type) and weight type (CUDABFloat16Type) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
  0%|          | 0/618 [00:05<?, ?it/s]
[2025-02-17 06:36:16,666] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 22265

@2U1
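For reference, the RuntimeError above ("Input type (CPUBFloat16Type) and weight type (CUDABFloat16Type) should be the same") means the dummy tensor passed to `self.visual(...)` was created on the CPU while the vision tower's weights live on the GPU. A hedged sketch of one way to avoid this (the helper name is hypothetical, not from the repo): build the dummy inputs on the same device and dtype as the module's own parameters.

```python
import torch
import torch.nn as nn


def make_dummy_visual_inputs(module: nn.Module):
    """Build dummy pixel values and grid on the module's device/dtype.

    torch.zeros(...) defaults to CPU float32; if the vision tower's weights
    are on GPU in bfloat16, passing a CPU tensor to its forward raises a
    device-mismatch RuntimeError. Reading the device and dtype off the
    module's first parameter keeps the dummy input aligned with the weights.
    """
    p = next(module.parameters())
    pixel_values = torch.zeros(14903, 1176, device=p.device, dtype=p.dtype)
    grid_thw = torch.tensor([[1, 98, 146]], device=p.device)
    return pixel_values, grid_thw
```

Under ZeRO-3 the parameters may additionally be partitioned across ranks, which is likely why the dummy forward needs extra handling there.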

@2U1
Owner

2U1 commented Feb 17, 2025

@ragesh-beo I don't know exactly what changed, but you need to use zero2 for mixed-modality for now.
I've updated the code to run on zero2.

Sorry for the inconvenience. I'll soon make an update supporting zero3.
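For anyone following along, switching from ZeRO-3 to ZeRO-2 is a DeepSpeed config change. A minimal ZeRO-2 config sketch, with key names from the DeepSpeed documentation and illustrative values (not taken from this repo's scripts):

```json
{
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "bf16": { "enabled": "auto" },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```

Unlike stage 3, stage 2 shards only optimizer states and gradients, so model parameters stay fully materialized on each rank and the dummy visual forward sees real weights.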

@ragesh-beo
Author

Thanks @2U1

@2U1
Owner

2U1 commented Feb 17, 2025

Let me know if the code still doesn't work.

@ragesh-beo
Author

Seems like everything is working fine with the new setup @2U1

@2U1
Owner

2U1 commented Feb 18, 2025

@ragesh-beo I've updated the code to support zero3 with mixed-modality data.
You can now use zero3.

@ragesh-beo
Author

Thanks
