
[Feature] Tensor parallelism fine-tuning #931

Open
MrPanch opened this issue Feb 26, 2025 · 0 comments

MrPanch commented Feb 26, 2025

Motivation

DeepSpeed provides tensor parallelism out of the box. However, when I modify the config (for example, internvl_chat/zero_stage3_config.json), adding a "model_parallel" block to fine-tune the 26B model:

"model_parallel": {
"enabled": true,
"dp_world_size": 6,
"tensor_parallel_size": 6,
"pipeline_parallel_size": 1,
"cpu_offload": true
},
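For context, here is how such a block would sit inside a typical ZeRO stage-3 config (a sketch only: the `zero_optimization`, `bf16`, and batch-size keys are standard DeepSpeed config options, but I am not sure stock DeepSpeed recognizes the `model_parallel` block at all, which may itself be the problem):

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 8,
  "zero_optimization": {
    "stage": 3
  },
  "bf16": { "enabled": true },
  "model_parallel": {
    "enabled": true,
    "dp_world_size": 6,
    "tensor_parallel_size": 6,
    "pipeline_parallel_size": 1,
    "cpu_offload": true
  }
}
```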

the GPU utilization looks like this:

(screenshot: GPU utilization)

Based on your documentation, serving a single 26B model on 4 GPUs requires 30 GB of memory per GPU, and 25,806 MB per GPU on 8 GPUs, so interpolating, I expected to see about 28 GB of memory utilization, but instead I get an out-of-memory error:

(screenshot: out-of-memory error)
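The interpolation can be checked with a few lines of Python (a sketch; the 30 GB / 25,806 MB figures are the documentation numbers quoted above, and 6 GPUs is the setup from the config):

```python
def interp(x, x0, y0, x1, y1):
    """Linearly interpolate y at x between the points (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

# Per-GPU memory: 30 GB on 4 GPUs, 25,806 MB (~25.806 GB) on 8 GPUs.
mem_6gpu = interp(6, 4, 30.0, 8, 25.806)
print(f"Expected per-GPU memory on 6 GPUs: {mem_6gpu:.1f} GB")  # ~27.9 GB
```

This is only a rough linear estimate; actual per-GPU memory does not scale linearly with GPU count, but it motivates the ~28 GB expectation above.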

So, even though the documentation suggests I can fit the 26B model and fine-tune it with batch size 1 and gradient accumulation of, for example, 8, I run into this problem.

I am not sure that I did everything correctly. If my addition to the config is wrong, please let me know. If not: what do you think about adding tensor-level parallelism?
Thank you in advance.

Related resources

No response

Additional context

No response
