Unified memory support on GH200 Grace Hopper #2306
Comments
Hi @1tnguyen, thank you for the good pointer. I have some new questions:
I'm expecting to obtain strings of only 0s and 1s from a generalized GHZ state, but it seems that the first qubits are simulated correctly while the last ones are random. Why is this the case?
Hi @bebora, Here is a quick update on this issue. In v0.9, we've identified and fixed: (1) Proper handling of This is related to point#2 in the above comment. Could you please let us know whether the issue with (2) Proper handling of If the
when Note: this was the primary case in which we observed corrupted measurement results similar to what you saw in point#3 in the above comment. Recently, I had access to a system setup where I could reproduce the corrupted output that is not related to free memory capacity. In that same system, we've also found another issue, which we haven't observed before. Depending on your specific configuration (driver version, CUDA version, etc.), that issue may or may not happen.
Hi @1tnguyen, thanks for the explanation and your debugging efforts. I can reply to your point#1:
@bebora FYI, we've pushed a fix for the above issue.
@1tnguyen I can confirm that GHZ does indeed work as intended with 34 and 35 qubits.
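A quick way to check whether sampled results are corrupted in the way described in this thread: an ideal GHZ state can only yield the all-zeros or the all-ones bitstring, so any mixed string is an invalid shot. The following is a minimal sketch in plain C++; the function names and the counts layout (bitstring mapped to number of shots) are hypothetical and not part of the CUDA-Q API.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Returns true if `bits` is a valid GHZ measurement outcome:
// a non-empty string made entirely of '0' or entirely of '1'.
bool is_ghz_outcome(const std::string &bits) {
  if (bits.empty())
    return false;
  const char first = bits.front();
  if (first != '0' && first != '1')
    return false;
  for (char c : bits)
    if (c != first)
      return false;
  return true;
}

// Sums the shots whose bitstring is not a valid GHZ outcome.
// `counts` is a hypothetical layout: bitstring -> number of shots.
std::size_t count_invalid_shots(
    const std::map<std::string, std::size_t> &counts) {
  std::size_t invalid = 0;
  for (const auto &entry : counts)
    if (!is_ghz_outcome(entry.first))
      invalid += entry.second;
  return invalid;
}
```

With results like the ones described above (correct leading qubits, random trailing qubits), count_invalid_shots would return a non-zero value; for a correct simulation it should return 0.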
Required prerequisites
Describe the bug
The NVIDIA GH200 Grace Hopper Superchip is promoted as being capable of utilizing the entire system memory for GPU tasks (NVIDIA blog). However, CUDA-Q does not use the full system memory when specifying the nvidia target.
Steps to reproduce the bug
Create the following source file ghz.cpp:
Compile it as follows:
nvq++ ghz.cpp -o ghz.out --target nvidia
And then run it:
33 qubits:
./ghz.out 33
✅ nvidia-smi reports a VRAM usage of about 66400 MiB
34 qubits:
./ghz.out 34
❌:
Expected behavior
I expect the GPU to be able to use system memory when necessary and simulate up to 35/36 qubits. Memory quickly becomes a limit in quantum simulations and a possible way to increase simulated qubits would be appreciated.
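To make the memory requirement concrete, here is a rough sketch of the state-vector arithmetic. It assumes single-precision complex amplitudes (8 bytes each), an assumption that is consistent with the roughly 66400 MiB reported by nvidia-smi for 33 qubits; the helper names are hypothetical and the figures ignore any workspace the simulator allocates on top of the state vector.

```cpp
#include <cstdint>

// Bytes needed to hold the full state vector of `num_qubits` qubits.
// `bytes_per_amplitude` is 8 for single-precision complex (2 x float)
// and 16 for double-precision complex (2 x double).
constexpr std::uint64_t state_vector_bytes(
    unsigned num_qubits, std::uint64_t bytes_per_amplitude = 8) {
  return (std::uint64_t{1} << num_qubits) * bytes_per_amplitude;
}

// Converts bytes to MiB, the unit reported by nvidia-smi.
constexpr std::uint64_t to_mib(std::uint64_t bytes) { return bytes >> 20; }

// Under the 8-bytes-per-amplitude assumption:
//   33 qubits ->  65536 MiB ( 64 GiB)
//   34 qubits -> 131072 MiB (128 GiB)
//   35 qubits -> 262144 MiB (256 GiB)
//   36 qubits -> 524288 MiB (512 GiB)
static_assert(to_mib(state_vector_bytes(33)) == 65536, "33 qubits");
static_assert(to_mib(state_vector_bytes(34)) == 131072, "34 qubits");
```

Each additional qubit doubles the requirement, which is why 33 qubits fit in GPU memory while 34 qubits already need memory beyond what the run above was able to allocate.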
Is this a regression? If it is, put the last known working version (or commit) here.
Not a regression
Environment
Suggestions
I was looking at a Grace Hopper presentation from John Linford and noticed two details:
cudaMalloc is not enough and suggests using cudaMallocManaged or malloc/mmap. I had a look at the cuQuantum repository and I saw some occurrences of cudaMalloc in the code, but none of cudaMallocManaged.
Do you think GH200 systems will ever be able to fully utilize their memory for quantum simulation using CUDA-Q/cuQuantum? Would this hypothetical approach affect simulation performance too much?