CUDA.@atomic error in GPU kernel #511
@jgreener64 can you post the whole log?

I also cannot reproduce this on my system (Julia 1.8.1, NVIDIA 3090) with the latest Enzyme.jl and Enzyme proper.
I have attached the error.txt. My setup (Julia updated since the top post) is Enzyme e452f89, Enzyme_jll 0.0.42, and an NVIDIA RTX A6000 GPU.
Can you retry latest main?
On d37ce72 with Enzyme_jll 0.0.43 it still errors, but the error changes:
The `printall` error is attached.
Oh that's much more exciting!
I wonder if the comment in https://reviews.llvm.org/D50391 is still true. We are trying to emit an atomic load acquire.
Yeah, looks similar to what I had in JuliaConcurrent/Atomix.jl#33. It's weird that LLVM is trying to select AtomicLoad, though; I only see RMW in the Julia code. Maybe Enzyme inserts some loads given some RMW in the user code? If so, I wonder if you can use … (Note: you'd need Atomix for now, since CUDA.jl uses acq_rel: https://github.com/JuliaGPU/CUDA.jl/blob/0cd30cbed3d084cede39db1a9959630ddae904a1/src/device/intrinsics/atomics.jl#L43-L46.) Somewhat relevant: JuliaGPU/CUDA.jl#1393
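As a sketch of how that suggestion could look in a kernel like the one from the top post (this assumes Atomix.jl paired with UnsafeAtomicsLLVM for device-side atomics; the kernel shape and names are illustrative, not from this thread):

```julia
# Minimal sketch, assuming Atomix.jl + UnsafeAtomicsLLVM; Atomix lets the
# RMW use a weaker ordering than CUDA.@atomic's hard-coded acq_rel.
using CUDA, Atomix, UnsafeAtomicsLLVM

function atomix_kernel!(forces, dx)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= size(forces, 2)
        Atomix.@atomic forces[1, i] -= dx  # RMW without CUDA.jl's acq_rel
    end
    return nothing
end

forces = CUDA.zeros(Float32, 3, 64)
@cuda threads=size(forces, 2) atomix_kernel!(forces, 0.1f0)
```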
Yeah, the derivative of an atomic add can create an atomic load. Presently we preserve the same ordering, hence the error above.
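Concretely, the problem arises like this (a schematic, not Enzyme's actual generated code):

```julia
# Schematic of why the reverse pass contains an atomic load; this is an
# illustration, not Enzyme's actual codegen.
#
# primal:   CUDA.@atomic forces[1, i] -= dx    # acq_rel atomicrmw
#
# reverse:  the adjoint of dx needs the shadow value,
#               ddx -= dforces[1, i]
#           and since other threads may be accumulating into dforces
#           concurrently, that read must itself be atomic. Preserving the
#           primal's ordering makes it a `load atomic ... acquire`, which
#           the NVPTX backend then fails to select (the error above).
```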
Not sure how many workarounds you'd want to add in Enzyme, but maybe you can use fetch-and-add with 0 for the load (and swap for the store) when the ordering is stronger than monotonic?
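For illustration, that trick looks something like this with CUDA.jl's low-level intrinsics (a sketch of the idea, not anything Enzyme emits; the helper names are hypothetical):

```julia
# Sketch of emulating an atomic load/store via RMW operations, assuming
# CUDA.jl's low-level atomic intrinsics; helper names are hypothetical.
using CUDA

# Atomic load as fetch-and-add of zero: the RMW returns the old value,
# and adding zero leaves memory unchanged.
@inline atomic_load_via_rmw(p, ::Type{T}) where {T} =
    CUDA.atomic_add!(p, zero(T))

# Atomic store as an exchange, discarding the returned old value.
@inline function atomic_store_via_rmw!(p, x)
    CUDA.atomic_xchg!(p, x)
    return nothing
end
```

Here `p` would be an `LLVMPtr` obtained from, e.g., `pointer(A, i)` inside a kernel.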
Yeah, I still see the comment in the … I guess LLVM needs to do whatever NVCC does with libcu++: https://godbolt.org/z/aoM6477T4
Interesting to see the differences compared to sm_60: https://godbolt.org/z/Y7Pj5G7sK
If this is the issue, is there a way to update other software to get around it? My device is sm_86 and I am on CUDA 11.7.
Could you try Takafumi's suggestion in #511 (comment)?
Replacing the … However …
Could you open a new issue with that, and a complete reproducer as minimal as you can get it? :)
Looking into it but running into some segfaults that have appeared with recent commits: #533.
I am looking into a minimal example with Atomix but running into some non-Enzyme issues on the GPU, so I reported them at JuliaConcurrent/Atomix.jl#33.
@jgreener64 Is there some equivalent C/CUDA code (in GROMACS, for example) we could look at to see if we can reproduce this issue there? We are trying to see whether this is a Julia issue or an Enzyme issue.
The kernels in the fastest software are more complicated, using warp reductions and clever ordering of pairs to get high speed. See for example the CUDA kernel in OpenMM, which uses some atomics: https://github.com/openmm/openmm/blob/master/platforms/cuda/src/kernels/nonbonded.cu. There are likely some simpler implementations around, but I don't know of any off the top of my head. I think this issue may be solved, though, based on @vchuravy's comment in #576. In particular, when I run that code (which differs from the top code here by using UnsafeAtomicsLLVM and …
Great! Thanks for the reference, and yes: @vchuravy and I were in a meeting discussing this, and we think things "work" (tm) now, but we also did not check for correctness.
Brilliant, thanks for all the help on this. If it's helpful, I can make a PR adding this as a regression test to Enzyme once JuliaGPU/CUDA.jl#1644 is in a release and #576 is fixed.
Original post (@jgreener64): I am on Julia 1.8.1 and Enzyme e452f89. The following works:
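The original snippet is not preserved in this excerpt. A minimal kernel in the same spirit (hypothetical names and simplified arithmetic, built around the lines quoted below, and written against a recent Enzyme API that may differ from the version used in this thread) might look like:

```julia
# Not the original reproducer, which is missing from this excerpt; a
# minimal sketch of the same pattern: a CUDA kernel writing into `forces`,
# differentiated on-device with Enzyme's autodiff_deferred.
using CUDA, Enzyme

function force_kernel!(forces, coords)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= size(coords, 2)
        for j in axes(coords, 2)
            if j != i
                dx = coords[1, j] - coords[1, i]
                forces[1, i] -= dx               # works
                # CUDA.@atomic forces[1, i] -= dx  # errors (this issue)
            end
        end
    end
    return nothing
end

function grad_force_kernel!(forces, dforces, coords, dcoords)
    Enzyme.autodiff_deferred(Reverse, Const(force_kernel!), Const,
                             Duplicated(forces, dforces),
                             Duplicated(coords, dcoords))
    return nothing
end

n = 64
forces, coords = CUDA.zeros(3, n), CUDA.rand(3, n)
dforces, dcoords = CUDA.ones(3, n), CUDA.zeros(3, n)  # seed / accumulator
@cuda threads=n grad_force_kernel!(forces, dforces, coords, dcoords)
```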
However it errors when I make the `forces[1, i] -= dx` lines use `CUDA.@atomic` (as commented out). The truncated error message is: