RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/generated/../THCReduceAll.cuh:317 terminate called after throwing an instance of 'at::Error' #50

Closed
YijianLiu opened this issue Sep 3, 2019 · 11 comments

@YijianLiu

I've run into an error and I really don't know how to solve it. Help! Someone said, "Maybe your labels are outside the range of n." But my labels are from 0 to n-1! I need your help. Thanks!

/opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [3,0,0], thread: [574,0,0] Assertion t >= 0 && t < n_classes failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/generated/../THCReduceAll.cuh line=317 error=59 : device-side assert triggered
Traceback (most recent call last):
File "/home/cartur/HRNet-Semantic-Segmentation/tools/train.py", line 251, in <module>
main()
File "/home/cartur/HRNet-Semantic-Segmentation/tools/train.py", line 220, in main
trainloader, optimizer, model, writer_dict)
File "/home/cartur/HRNet-Semantic-Segmentation/tools/../lib/core/function.py", line 46, in train
loss = losses.mean()

RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/generated/../THCReduceAll.cuh:317
terminate called after throwing an instance of 'at::Error'
what(): CUDA error: invalid device pointer (CudaCachingDeleter at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/THCCachingAllocator.cpp:498)
frame #0: THStorage_free + 0x44 (0x7fd7638cf314 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #1: THTensor_free + 0x2f (0x7fd76396ea1f in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #2: at::CUDAFloatTensor::~CUDAFloatTensor() + 0x9 (0x7fd7404d2a59 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #3: torch::autograd::generated::CudnnConvolutionBackward::~CudnnConvolutionBackward() + 0x5d (0x7fd7656d1e7d in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #4: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #5: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #6: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #7: + 0x7674a2 (0x7fd7654d44a2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #8: + 0x19aa5e (0x55e733ac1a5e in /home/cartur/.conda/envs/CenterNet_last/bin/python)
frame #9: std::_Sp_counted_deleter<torch::autograd::PyFunction*, Decref, std::allocator, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x2e (0x7fd7654d64fe in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #10: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #11: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #12: torch::autograd::generated::ThresholdBackward0::~ThresholdBackward0() + 0x62 (0x7fd7656d0ed2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #13: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #14: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #15: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #16: torch::autograd::generated::CudnnConvolutionBackward::~CudnnConvolutionBackward() + 0x73 (0x7fd7656d1e93 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #17: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #18: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #19: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #20: + 0x7674a2 (0x7fd7654d44a2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #21: + 0x19aa5e (0x55e733ac1a5e in /home/cartur/.conda/envs/CenterNet_last/bin/python)
frame #22: std::_Sp_counted_deleter<torch::autograd::PyFunction*, Decref, std::allocator, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x2e (0x7fd7654d64fe in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #23: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #24: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #25: torch::autograd::generated::ThresholdBackward0::~ThresholdBackward0() + 0x62 (0x7fd7656d0ed2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #26: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #27: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #28: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #29: torch::autograd::generated::CudnnConvolutionBackward::~CudnnConvolutionBackward() + 0x73 (0x7fd7656d1e93 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #30: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #31: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #32: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #33: + 0x7674a2 (0x7fd7654d44a2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #34: + 0x19aa5e (0x55e733ac1a5e in /home/cartur/.conda/envs/CenterNet_last/bin/python)
frame #35: std::_Sp_counted_deleter<torch::autograd::PyFunction*, Decref, std::allocator, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x2e (0x7fd7654d64fe in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #36: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #37: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #38: torch::autograd::generated::ThAddBackward::~ThAddBackward() + 0x3d (0x7fd7656ce8bd in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #39: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #40: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #41: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #42: torch::autograd::generated::ThresholdBackward0::~ThresholdBackward0() + 0x62 (0x7fd7656d0ed2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #43: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #44: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #45: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #46: torch::autograd::generated::ThAddBackward::~ThAddBackward() + 0x3d (0x7fd7656ce8bd in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #47: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #48: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #49: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #50: torch::autograd::generated::ThresholdBackward0::~ThresholdBackward0() + 0x62 (0x7fd7656d0ed2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #51: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #52: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #53: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #54: torch::autograd::generated::ThAddBackward::~ThAddBackward() + 0x3d (0x7fd7656ce8bd in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #55: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #56: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #57: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #58: torch::autograd::generated::ThresholdBackward0::~ThresholdBackward0() + 0x62 (0x7fd7656d0ed2 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #59: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #60: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x45 (0x7fd7650f0225 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #61: torch::autograd::Function::~Function() + 0xfe (0x7fd7651be2ce in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #62: torch::autograd::generated::ThAddBackward::~ThAddBackward() + 0x3d (0x7fd7656ce8bd in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #63: torch::autograd::deleteFunction(torch::autograd::Function*) + 0x47 (0x7fd7654c35d7 in /home/cartur/.conda/envs/CenterNet_lyj/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

@sunke123
Member

sunke123 commented Sep 3, 2019

Which dataset and model did you use?

@YijianLiu
Author

> Which dataset and model did you use?

I chose several images from Cityscapes to test whether training would start successfully.
I use the HRNetV2-W48 model; I think HRNetV2-W48 is the only model, isn't it?

@YijianLiu
Author

> Which dataset and model did you use?

I think I followed your steps, but this error still occurs. Can you help me? Thanks!
I just chose several images from Cityscapes and then ran:
python tools/train.py --cfg experiments/cityscapes/seg_hrnet_w48_train_512x1024_sgd_lr1e-2_wd5e-4_bs_12_epoch484.yaml
In my config file, I changed these parameters:
GPUS: (0,)
WORKERS: 1
BATCH_SIZE_PER_GPU: 1

@meanmee

meanmee commented Jan 2, 2020

How did you fix this? I'm facing the same problem. @YijianLiu

@tomaszkaliciak

tomaszkaliciak commented Jan 4, 2020

> How did you fix this? I'm facing the same problem. @YijianLiu

@meanmee Are you using a custom dataloader?

@meanmee

meanmee commented Jan 7, 2020

Yes. I have now fixed this by modifying some code in root/lib/data/cityscape.py.

@tomaszkaliciak

> Yes. I have now fixed this by modifying some code in root/lib/data/cityscape.py.

I had the same problem. In my case, the masks contained values greater than or equal to the number of classes, i.e. outside the valid range 0 to num_classes - 1.
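The assertion at the top of this issue, `t >= 0 && t < n_classes`, is this same range check: the `SpatialClassNLLCriterion` kernel validates every target pixel `t`, skipping pixels equal to the loss's ignore index. CUDA kernels run asynchronously, so the failure is most likely reported at a later synchronizing call, which would explain why the Python traceback blames `losses.mean()` instead of the loss itself. A minimal sketch of catching it earlier, assuming the labels can be copied to the CPU as plain ints and that the ignore label is 255 (common for Cityscapes; use whatever your loss is actually configured with): set `CUDA_LAUNCH_BLOCKING=1` before CUDA is initialized, and run the same range check yourself before the loss. `check_targets` is a hypothetical helper, not a PyTorch or HRNet function.

```python
import os

# Make CUDA report kernel errors synchronously so the traceback points at the
# failing op. This only has an effect if it is set before PyTorch initialises
# CUDA (exporting it in the shell before launching train.py also works).
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def check_targets(targets, n_classes, ignore_index=255):
    """Hypothetical helper mirroring the kernel's `t >= 0 && t < n_classes` assert.

    `targets` is any iterable of ints, e.g. `label.cpu().flatten().tolist()`.
    Values equal to `ignore_index` are skipped, as the loss skips them.
    Raises ValueError listing every distinct offending value.
    """
    bad = sorted({t for t in targets
                  if t != ignore_index and not 0 <= t < n_classes})
    if bad:
        raise ValueError(
            f"labels outside [0, {n_classes - 1}] "
            f"(ignore_index={ignore_index}): {bad}"
        )


# Usage with toy labels for a 19-class setup such as Cityscapes.
check_targets([0, 5, 18, 255], n_classes=19)  # OK: 255 is ignored
try:
    check_targets([0, 19, 33, -1], n_classes=19)
except ValueError as err:
    print(err)  # labels outside [0, 18] (ignore_index=255): [-1, 19, 33]
```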

@meanmee

meanmee commented Jan 8, 2020

@tomaszkaliciak Same here.

@shuangshuangguo

I guess some class labels are larger than (num_classes - 1). You can print them with print(label[label > (num_classes - 1)]).
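Building on the `print` suggestion above, a small audit over the whole label set shows which out-of-range values occur and in how many pixels, which can hint that some masks still hold raw IDs that were never remapped. A minimal sketch, assuming each mask is available as a flat list of ints (for example via `np.asarray(Image.open(path)).ravel().tolist()` with NumPy and Pillow) and an ignore label of 255; `audit_masks` and the toy mask names are hypothetical.

```python
from collections import Counter


def audit_masks(masks, num_classes, ignore_label=255):
    """Hypothetical helper: count out-of-range label values in each mask.

    `masks` maps a name (e.g. a file path) to a flat iterable of int labels.
    Returns {name: Counter({bad_value: pixel_count})}, only for masks with problems.
    """
    report = {}
    for name, values in masks.items():
        bad = Counter(v for v in values
                      if v != ignore_label and not 0 <= v < num_classes)
        if bad:
            report[name] = bad
    return report


# Toy data: "mask_b" still holds raw IDs (26, 33) that exceed 18.
masks = {
    "mask_a": [0, 1, 1, 255, 18],
    "mask_b": [7, 26, 26, 33, 0],
}
for name, bad in audit_masks(masks, num_classes=19).items():
    print(name, dict(bad))  # mask_b {26: 2, 33: 1}
```

Raw IDs that happen to be below `num_classes` (the 7 and 0 in `mask_b`) pass this check even though they are still wrong, so a clean report does not prove the labels were remapped.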

@sharatg9

> Yes. I have now fixed this by modifying some code in root/lib/data/cityscape.py.

Could you please share what you changed?

@2017ND

2017ND commented Mar 17, 2021

> > Yes. I have now fixed this by modifying some code in root/lib/data/cityscape.py.
>
> Could you please share what you changed?

You need to change some code in root/lib/data/cityscape.py, including self.label_mapping and self.class_weights.
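A minimal sketch of what has to stay consistent, assuming a hypothetical 3-class dataset: the label mapping (here a dict from raw IDs in the mask files to training IDs, with unused IDs sent to the ignore label) must only produce IDs in `0 .. num_classes - 1`, and the class weights must have exactly one entry per class, since a per-class loss weight such as the `weight` argument of `torch.nn.CrossEntropyLoss` is expected to have size C. Plain lists stand in for the NumPy arrays and tensors a real loader would use; `convert_label` and `validate_setup` are illustrative helpers, and all IDs and weights are made up.

```python
IGNORE_LABEL = 255  # assumption: must match the ignore label used by the loss
NUM_CLASSES = 3     # hypothetical custom dataset

# Hypothetical mapping from raw IDs stored in the mask files to training IDs.
label_mapping = {0: IGNORE_LABEL, 10: 0, 20: 1, 30: 2}

# One loss weight per training class (values made up for illustration).
class_weights = [1.0, 0.8, 1.2]


def convert_label(mask, mapping, ignore_label=IGNORE_LABEL):
    """Map each raw ID in a 2-D mask (list of rows) to its training ID.

    IDs missing from `mapping` are sent to `ignore_label` so they never reach the loss.
    """
    return [[mapping.get(v, ignore_label) for v in row] for row in mask]


def validate_setup(mapping, weights, num_classes, ignore_label=IGNORE_LABEL):
    """Check that mapping targets and the weight count agree with num_classes."""
    targets = {v for v in mapping.values() if v != ignore_label}
    if not targets <= set(range(num_classes)):
        raise ValueError(f"mapping produces IDs {sorted(targets)} for {num_classes} classes")
    # The loss's class-weight vector needs exactly one entry per class.
    if len(weights) != num_classes:
        raise ValueError(f"{len(weights)} class weights for {num_classes} classes")


validate_setup(label_mapping, class_weights, NUM_CLASSES)
raw = [[10, 20, 99],
       [30, 0, 10]]
print(convert_label(raw, label_mapping))  # [[0, 1, 255], [2, 255, 0]]
```

Sending unlisted IDs to the ignore label, rather than leaving them unchanged, is a design choice in this sketch: it guarantees no stray raw ID reaches the loss, at the cost of silently ignoring pixels you may have meant to train on.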
