HSA_STATUS_ERROR_OUT_OF_RESOURCES In rocminfo and no devices in clinfo #8

Open
tejasraman opened this issue Mar 19, 2022 · 9 comments

tejasraman commented Mar 19, 2022

I get the HSA_STATUS_ERROR_OUT_OF_RESOURCES error when I run rocminfo (ROCm 4.5.2) on my computer. A previous install on another drive worked (ROCm 4.5.0) on the same kernel (5.11). When I run clinfo, 0 devices show up under “AMD Accelerated Parallel Processing”.
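For anyone trying to reproduce this, here is a minimal sketch (not part of the original report) that runs a tool and checks its output for the error string quoted above. The helper names `has_hsa_resource_error` and `run_tool` are hypothetical, and it assumes `rocminfo` and `clinfo` are on `PATH` and can be run without arguments.

```python
import shutil
import subprocess
from typing import Optional

# Error string exactly as reported in this issue.
ERROR_TOKEN = "HSA_STATUS_ERROR_OUT_OF_RESOURCES"


def has_hsa_resource_error(output: str) -> bool:
    """Return True if the captured tool output mentions the HSA error."""
    return ERROR_TOKEN in output


def run_tool(name: str, timeout: float = 30.0) -> Optional[str]:
    """Run `name` with no arguments and return stdout + stderr combined.

    Returns None when the tool is not on PATH or does not finish in time.
    """
    path = shutil.which(name)
    if path is None:
        return None
    try:
        proc = subprocess.run(
            [path], capture_output=True, text=True, check=False, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout + proc.stderr
```

Calling `run_tool("rocminfo")` and passing the result to `has_hsa_resource_error` gives a quick yes/no that is easy to paste into a follow-up comment, and the same call works for `clinfo`.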

I already have libopenblas and libopenmpi installed, and my PCIe slot supports atomics (no kfd errors). I have the patched rocBLAS and your torch and torchvision builds. Torch says that there are no CUDA devices (torch treats HIP as a CUDA device).
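The torch check described above can be sketched roughly as follows. This is an illustration rather than the exact commands used in the report: `describe_torch_backend` is a hypothetical helper, and it relies on ROCm builds of PyTorch exposing the GPU through the `torch.cuda` namespace and setting `torch.version.hip`.

```python
def describe_torch_backend() -> dict:
    """Summarize what the installed torch build reports about GPU devices."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment.
        return {"installed": False}

    return {
        "installed": True,
        # A version string on ROCm builds, None on CUDA or CPU-only builds.
        "hip_version": getattr(torch.version, "hip", None),
        # HIP devices are reported through the CUDA API surface.
        "device_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    print(describe_torch_backend())
```

If `hip_version` is set but `device_count` is 0, the torch build itself is a ROCm build and the problem is more likely in the driver or runtime layer, which would match the `rocminfo` failure.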

OS: Ubuntu 20.04
Kernel: 5.11.0-44-generic
ROCm version: 4.5.2 (I originally said 5.2, which does not exist, sorry)

@tejasraman changed the title from "Rocminfo generates HSA_STATUS" to "HSA_STATUS_ERROR_OUT_OF_RESOURCES In rocminfo and no devices in clinfo" Mar 19, 2022
@tejasraman reopened this Mar 19, 2022
@xuhuisheng
Owner

Do you mean ROCm-5.0.2?
ROCm-5.2, and even ROCm-5.1, have not been released yet.

@tejasraman
Author

tejasraman commented Mar 19, 2022 via email

@tejasraman
Author

tejasraman commented Mar 19, 2022

I did try 5.0.2, but apparently torch has no support for it yet; I got some amdhip error. I then installed 4.5.2 and am still having issues with clinfo and rocminfo (the HSA error).

@xuhuisheng
Owner

You can try installing rocm-4.5.0's kernel together with 5.0.2's rocm-dev and rocm-libs.

@tejasraman
Author

tejasraman commented Mar 20, 2022 via email

@tejasraman
Author

tejasraman commented Jul 20, 2022

Finally got it working with the latest release (5.2), sorry.
Reminds me of my messed-up title...

@xuhuisheng
Owner

@tejasraman
It's weird: I just hit HSA_STATUS_ERROR_OUT_OF_RESOURCES on my gfx803 with ubuntu-20.04.4 and Haswell on ROCm-5.2.
I had to remove the DKMS module installed by amdgpu-dkms; with the upstream kernel's amdgpu module, gfx803 works fine.
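To tell which amdgpu module is in use, a rough sketch like the one below may help. It assumes that `modinfo -n amdgpu` prints the module file path and that DKMS-built modules live under a `dkms` directory (for example `updates/dkms` on Ubuntu) while the upstream module lives under `kernel/drivers`. The function names and the sample paths in the usage note are illustrative, not taken from this thread.

```python
import shutil
import subprocess
from typing import Optional


def amdgpu_module_origin(module_path: str) -> str:
    """Classify a kernel module path as 'dkms', 'in-tree', or 'unknown'."""
    if "/dkms/" in module_path:
        return "dkms"
    if "/kernel/drivers/" in module_path:
        return "in-tree"
    return "unknown"


def current_amdgpu_module_path() -> Optional[str]:
    """Ask modinfo for the amdgpu module file; None if it cannot be found."""
    modinfo = shutil.which("modinfo")
    if modinfo is None:
        return None
    proc = subprocess.run(
        [modinfo, "-n", "amdgpu"], capture_output=True, text=True, check=False
    )
    if proc.returncode != 0:
        return None
    return proc.stdout.strip() or None
```

For example, `amdgpu_module_origin("/lib/modules/5.11.0-44-generic/updates/dkms/amdgpu.ko")` returns `"dkms"`, which under these assumptions would indicate the amdgpu-dkms build rather than the upstream module.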

@tejasraman

This comment was marked as outdated.

@tejasraman reopened this Aug 14, 2022
@tejasraman
Author

tejasraman commented Aug 14, 2022

@xuhuisheng I’m still having this issue:
[image: 23EF12C6-27AB-42DE-B254-7E65651438A3]
Clinfo:

(I’m having the issue again, so I marked my old post as outdated.)
