
Qwen-2.5-VL-7B finetuning issue #40

Closed
ragesh-beo opened this issue Feb 13, 2025 · 12 comments

Comments

@ragesh-beo

Hi, I ran into the following issues while fine-tuning Qwen-2.5-VL-Instruct.

  1. The environment.yaml file pins transformers==4.48.0, and as far as I know, Qwen2_5_VLForConditionalGeneration cannot be imported from that version.
  2. When I updated transformers to git+https://github.com/huggingface/transformers, I got the following error:
[rank0]: Traceback (most recent call last):
[rank0]:   File "/root/train/Qwen2-VL-Finetune/src/training/train.py", line 224, in <module>
[rank0]:     train()
[rank0]:   File "/root/train/Qwen2-VL-Finetune/src/training/train.py", line 199, in train
[rank0]:     trainer.train()
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 2241, in train
[rank0]:     return inner_training_loop(
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 2548, in _inner_training_loop
[rank0]:     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 3698, in training_step
[rank0]:     loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 3759, in compute_loss
[rank0]:     outputs = model(**inputs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 18, in wrapped_fn
[rank0]:     ret_val = func(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1899, in forward
[rank0]:     loss = self.module(*inputs, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]:     return inner()
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/peft/peft_model.py", line 563, in forward
[rank0]:     return self.get_base_model()(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]:     return inner()
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]:   File "/root/train/Qwen2-VL-Finetune/src/training/monkey_patch_forward.py", line 222, in qwen2_5_mixed_modality_forward
[rank0]:     self.visual(torch.zeros(14903, 1176), gird_thw=torch.Tensor([[1, 98, 146]]))
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]:     return inner()
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]: TypeError: Qwen2_5_VisionTransformerPretrainedModel.forward() got an unexpected keyword argument 'gird_thw'
@2U1
Owner

2U1 commented Feb 13, 2025

I've written it in the README; you should install the correct version.
I'll check the code and the version I'm using.

@ragesh-beo
Author

Sorry, I didn't notice the README instructions earlier. However, even after installing transformers as instructed in the README, I am still encountering the error. @2U1

@2U1
Owner

2U1 commented Feb 13, 2025

git+https://github.com/huggingface/transformers/commit/9d2056f12b66e64978f78a2dcb023f65b2be2108
Could you install this version and try it again?

The version should be transformers==4.49.0.dev0
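As a quick sanity check before launching training, the installed version string can be compared against the required one. This is a minimal sketch (not part of the repo), assuming the `packaging` library is available, which it is in most pip/setuptools environments:

```python
# Sketch: check whether an installed transformers version is new enough to
# ship Qwen2_5_VLForConditionalGeneration (added in the 4.49.0.dev0 line).
from packaging import version


def transformers_is_new_enough(installed: str, required: str = "4.49.0.dev0") -> bool:
    # A .dev0 pre-release of 4.49.0 still sorts above any 4.48.x release,
    # and the final 4.49.0 release sorts above its own .dev0 pre-release.
    return version.parse(installed) >= version.parse(required)
```

In practice the installed string would come from `importlib.metadata.version("transformers")`.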

@ragesh-beo
Author

ragesh-beo commented Feb 13, 2025

Installing transformers from the commit you mentioned also didn't work for me; same error. I double-checked the version of transformers, and it is transformers==4.49.0.dev0.

@2U1
Owner

2U1 commented Feb 13, 2025

Okay, I got it; it was my typo.
I've pushed the code with the fix. Could you please try again with the latest code?

@ragesh-beo
Author

Using the updated code gives me the following error

Parameter Offload: Total persistent parameters: 2683904 in 424 params
  0%|          | 0/618 [00:00<?, ?it/s][rank0]: Traceback (most recent call last):
[rank0]:   File "/root/train/Qwen-vl-finetune/src/training/train.py", line 224, in <module>
[rank0]:     train()
[rank0]:   File "/root/train/Qwen-vl-finetune/src/training/train.py", line 199, in train
[rank0]:     trainer.train()
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 2184, in train
[rank0]:     return inner_training_loop(
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 2490, in _inner_training_loop
[rank0]:     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 3598, in training_step
[rank0]:     loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 3659, in compute_loss
[rank0]:     outputs = model(**inputs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 18, in wrapped_fn
[rank0]:     ret_val = func(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1899, in forward
[rank0]:     loss = self.module(*inputs, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]:     return inner()
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/peft/peft_model.py", line 563, in forward
[rank0]:     return self.get_base_model()(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]:     return inner()
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]:   File "/root/train/Qwen-vl-finetune/src/training/monkey_patch_forward.py", line 227, in qwen2_5_mixed_modality_forward
[rank0]:     self.visual(torch.zeros(14903, 1176), grid_thw=torch.Tensor([[1, 98, 146]]))
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]:     return inner()
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 110, in forward
[rank0]:     hidden_states = self.proj(hidden_states.to(dtype=target_dtype)).view(-1, self.embed_dim)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]:     return inner()
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 725, in forward
[rank0]:     return self._conv_forward(input, self.weight, self.bias)
[rank0]:   File "/opt/conda/envs/qwen2/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 720, in _conv_forward
[rank0]:     return F.conv3d(
[rank0]: RuntimeError: Input type (CPUBFloat16Type) and weight type (CUDABFloat16Type) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
  0%|          | 0/618 [00:05<?, ?it/s]
[2025-02-17 06:36:16,666] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 22265

@2U1
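For reference, the RuntimeError above ("Input type (CPUBFloat16Type) and weight type (CUDABFloat16Type) should be the same") means the dummy tensor passed to `self.visual(...)` was created on the CPU while the vision tower's weights live on the GPU. A hedged sketch of one way to avoid this (the helper name is hypothetical, not from the repo): build the dummy inputs on the same device and dtype as the module's own parameters.

```python
import torch
import torch.nn as nn


def make_dummy_visual_inputs(module: nn.Module):
    """Build dummy pixel values and grid on the module's device/dtype.

    torch.zeros(...) defaults to CPU float32; if the vision tower's weights
    are on GPU in bfloat16, passing a CPU tensor to its forward raises a
    device-mismatch RuntimeError. Reading the device and dtype off the
    module's first parameter keeps the dummy input aligned with the weights.
    """
    p = next(module.parameters())
    pixel_values = torch.zeros(14903, 1176, device=p.device, dtype=p.dtype)
    grid_thw = torch.tensor([[1, 98, 146]], device=p.device)
    return pixel_values, grid_thw
```

Under ZeRO-3 the parameters may additionally be partitioned across ranks, which is likely why the dummy forward needs extra handling there.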

@2U1
Owner

2U1 commented Feb 17, 2025

@ragesh-beo I don't know exactly what changed, but you need to use zero2 for mixed-modality for now.
I've updated the code to run on zero2.

Sorry for the inconvenience. I'll soon make an update supporting zero3.
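For anyone following along, switching from ZeRO-3 to ZeRO-2 is a DeepSpeed config change. A minimal ZeRO-2 config sketch, with key names from the DeepSpeed documentation and illustrative values (not taken from this repo's scripts):

```json
{
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "bf16": { "enabled": "auto" },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```

Unlike stage 3, stage 2 shards only optimizer states and gradients, so model parameters stay fully materialized on each rank and the dummy visual forward sees real weights.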

@ragesh-beo
Author

Thanks @2U1

@2U1
Owner

2U1 commented Feb 17, 2025

Let me know if the code still doesn't work.

@ragesh-beo
Author

Seems like everything is working fine with the new setup @2U1

@2U1
Owner

2U1 commented Feb 18, 2025

@ragesh-beo I've updated the code to support zero3 with mixed-modality data.
You can now use zero3.

@ragesh-beo
Author

Thanks
