paddle.distributed.all_to_all does not support unequal split_size semantics #71429

Open
dynamicheart opened this issue Mar 5, 2025 · 1 comment
dynamicheart commented Mar 5, 2025

  1. Example of unequal split sizes with PyTorch's torch.distributed.all_to_all (https://pytorch.org/docs/stable/distributed.html#torch.distributed.all_to_all):
>>> input
tensor([0, 1, 2, 3, 4, 5])                                       # Rank 0
tensor([10, 11, 12, 13, 14, 15, 16, 17, 18])                     # Rank 1
tensor([20, 21, 22, 23, 24])                                     # Rank 2
tensor([30, 31, 32, 33, 34, 35, 36])                             # Rank 3
>>> input_splits
[2, 2, 1, 1]                                                     # Rank 0
[3, 2, 2, 2]                                                     # Rank 1
[2, 1, 1, 1]                                                     # Rank 2
[2, 2, 2, 1]                                                     # Rank 3
>>> output_splits
[2, 3, 2, 2]                                                     # Rank 0
[2, 2, 1, 2]                                                     # Rank 1
[1, 2, 1, 2]                                                     # Rank 2
[1, 2, 1, 1]                                                     # Rank 3
>>> input = list(input.split(input_splits))
>>> input
[tensor([0, 1]), tensor([2, 3]), tensor([4]), tensor([5])]                   # Rank 0
[tensor([10, 11, 12]), tensor([13, 14]), tensor([15, 16]), tensor([17, 18])] # Rank 1
[tensor([20, 21]), tensor([22]), tensor([23]), tensor([24])]                 # Rank 2
[tensor([30, 31]), tensor([32, 33]), tensor([34, 35]), tensor([36])]         # Rank 3
>>> output = ...
>>> dist.all_to_all(output, input)
>>> output
[tensor([0, 1]), tensor([10, 11, 12]), tensor([20, 21]), tensor([30, 31])]   # Rank 0
[tensor([2, 3]), tensor([13, 14]), tensor([22]), tensor([32, 33])]           # Rank 1
[tensor([4]), tensor([15, 16]), tensor([23]), tensor([34, 35])]              # Rank 2
[tensor([5]), tensor([17, 18]), tensor([24]), tensor([36])]                  # Rank 3
  2. In Paddle, only the paddle.distributed.all_to_all_single API supports unequal split sizes; the tensor list passed to paddle.distributed.all_to_all is concatenated and then split evenly across ranks (see the sketch below).

    Reference: https://github.com/PaddlePaddle/Paddle/blob/974cc53f9d/paddle/fluid/pybind/distributed_py.cc#L322
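
For comparison, below is a minimal sketch of how the unequal-split example above could be written with paddle.distributed.alltoall_single. It assumes 4 ranks and the out-tensor-first argument order of alltoall_single in recent Paddle releases (check the docs for your version); the split tables are copied from the PyTorch example.

import paddle
import paddle.distributed as dist

dist.init_parallel_env()
rank = dist.get_rank()  # assumes the script is launched with 4 ranks

# Per-rank inputs and split tables, copied from the PyTorch example above.
inputs = [
    [0, 1, 2, 3, 4, 5],
    [10, 11, 12, 13, 14, 15, 16, 17, 18],
    [20, 21, 22, 23, 24],
    [30, 31, 32, 33, 34, 35, 36],
]
in_split_sizes = [[2, 2, 1, 1], [3, 2, 2, 2], [2, 1, 1, 1], [2, 2, 2, 1]][rank]
out_split_sizes = [[2, 3, 2, 2], [2, 2, 1, 2], [1, 2, 1, 2], [1, 2, 1, 1]][rank]

in_tensor = paddle.to_tensor(inputs[rank], dtype='int64')
out_tensor = paddle.empty([sum(out_split_sizes)], dtype='int64')

# alltoall_single accepts per-rank split sizes, unlike the tensor-list API.
dist.alltoall_single(out_tensor, in_tensor,
                     in_split_sizes=in_split_sizes,
                     out_split_sizes=out_split_sizes)
# e.g. on rank 0, out_tensor == [0, 1, 10, 11, 12, 20, 21, 30, 31]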

  3. For an equal split, it is recommended to simply pass in_split_sizes and out_split_sizes as empty (a sketch follows the reference below):

    Reference:
    https://github.com/pytorch/pytorch/blob/6c3492b4/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp#L4912
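
Continuing the sketch above, the equal-split case simply omits the split-size arguments (hypothetical data; with no split sizes, each rank exchanges numel / world_size elements, matching the equal-split fast path in the ProcessGroupNCCL code linked above):

world_size = dist.get_world_size()
x = paddle.arange(2 * world_size, dtype='int64') + rank * 100  # hypothetical data
y = paddle.empty_like(x)

# No in_split_sizes/out_split_sizes: an equal split of 2 elements per rank.
dist.alltoall_single(y, x)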
