I ran into the following error:
[rank0]: File "/data/anaconda/envs/ENV_NAME/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 2726, in broadcast
[rank0]: work = group.broadcast([tensor], opts)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: torch.distributed.DistBackendError: NCCL error in: /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:2588, internal error - please report this issue to the NCCL developers, NCCL version 2.21.5
[rank0]: ncclInternalError: Internal check failed.
[rank0]: Last error:
[rank0]: Bootstrap : no socket interface found
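This is caused by the hard-coded `NCCL_SOCKET_IFNAME=eth0` setting. On a machine that has no network interface named `eth0`, NCCL cannot find a socket interface for bootstrap and fails as shown above. The authors should set the socket interface according to what the user's `ifconfig` reports instead of assuming `eth0`.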
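As a possible workaround, here is a minimal sketch of choosing the interface at runtime instead of hard-coding `eth0`. The helper names `pick_socket_ifname` and `configure_nccl_ifname` are hypothetical and not part of this repository; the skip list (`lo`, `docker*`, `veth*`) is an assumption about typical Linux hosts, and the first remaining interface is not guaranteed to be the right one for multi-node runs.

```python
import os
import socket


def pick_socket_ifname(preferred="eth0", candidates=None):
    """Return `preferred` if present, else the first plausible interface, else None.

    `candidates` can be passed explicitly (e.g. for testing); by default the
    names are read from the OS via socket.if_nameindex() (Unix only).
    """
    if candidates is None:
        candidates = [name for _, name in socket.if_nameindex()]
    if preferred in candidates:
        return preferred
    for name in candidates:
        # Assumption: loopback and container bridge/veth devices are not wanted.
        if name != "lo" and not name.startswith(("docker", "veth")):
            return name
    return None


def configure_nccl_ifname(preferred="eth0", candidates=None):
    """Set NCCL_SOCKET_IFNAME to an interface that exists on this machine.

    If nothing suitable is found, the variable is removed so that NCCL
    falls back to its own interface detection.
    """
    ifname = pick_socket_ifname(preferred, candidates)
    if ifname is None:
        os.environ.pop("NCCL_SOCKET_IFNAME", None)
    else:
        os.environ["NCCL_SOCKET_IFNAME"] = ifname
    return ifname
```

This would need to run before the process group is created (i.e. before `torch.distributed.init_process_group`), since NCCL reads the variable when it initializes. Setting the variable by hand in the shell to the name shown by `ifconfig` works just as well.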