New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

[libcudacxx] Experimental try_cancel exposure #3671

Open

gonzalobg wants to merge 15 commits into NVIDIA:main from gonzalobg:try_cancel_api

Collaborator

gonzalobg commented Feb 4, 2025 •

edited

Loading

Description

This PR provides experimental low-level exposure in libcudacxx for PTX try_cancel.

While experimental, it'd be very beneficial for this exposure to be available in the CTK, hence adding it to libcu++.
We should aim to stabilize it during the CUDA 13.x cycle.

This low-level wrapper delivers reasonable functionality to end-users while hiding the gnarly synchronization and pipelining required and enabling us to deliver bug-fixes and performance improvements in the future. The functionality is exposed in a portable way, and backported to all SM's that CUDA supports, but the focus is on sm_100+ (we can deliver better performance in sm_70+ later).

This PR intentionally only covers for_each_cancelled_block.
Once that API is done, for_each_cancelled_cluster would be an analogous API to add.

Future potential extensions worth exploring:

Supporting leader thread (and block) selection to enable programs to specify the control warp / block.
Enable programs to control this API's state (i.e., its 8 bytes of shared memory).
Support user-controlled pipelining. Right now it uses a 1-stage pipeline, but we could add an int NStages = 1 later tht enables users to do, e.g., double-buffering.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.


          [libcudacxx] Experimental try_cancel exposure

803e781

gonzalobg requested review from a team as code owners

February 4, 2025 14:24

gonzalobg requested a review from griwes

February 4, 2025 14:24

copy-pr-bot bot commented Feb 4, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

gonzalobg assigned bernhardmgruber, miscco and ahendriksen


          Update documentation

3740f0f

miscco requested changes

View reviewed changes

libcudacxx/include/cuda/try_cancel Outdated Show resolved Hide resolved

libcudacxx/include/cuda/try_cancel Outdated

Comment on lines 145 to 149

+              _LIBCUDACXX_BEGIN_NAMESPACE_CUDA
+              namespace experimental {
+              namespace __detail {

Collaborator

miscco Feb 4, 2025

This should rather be put into cudax rather than libcu++

Collaborator Author

gonzalobg Feb 6, 2025 •

edited

Loading

cudax is not shipped with the CTK but experimental exposure of HW features should be available in the CTK early to allow programmers to use the HW early on without having to familiarize themselves with all the details.

libcudacxx/include/cuda/try_cancel Outdated Show resolved Hide resolved

libcudacxx/include/cuda/try_cancel Outdated Show resolved Hide resolved

libcudacxx/include/cuda/try_cancel Outdated

+              namespace __detail {
+              struct __empty_t {
+                __device__ void operator()(dim3);

Collaborator

miscco Feb 4, 2025

Important: We need to at least define it

Suggested change

      
              __device__ void operator()(dim3);
          
              _CCCL_DEVICE _CCCL_HIDE_FROM_ABI void operator()(dim3) const noexcept {};

Collaborator Author

gonzalobg Feb 4, 2025 •

edited

Loading

I wanted this to trigger an error if a CUDA C++ program uses it.
Would defining it support that?

I think otherwise I can define it, and then issue a static_assert below.

libcudacxx/include/cuda/try_cancel Outdated Show resolved Hide resolved

libcudacxx/include/cuda/try_cancel Outdated Show resolved Hide resolved

libcudacxx/include/cuda/try_cancel Outdated Show resolved Hide resolved

libcudacxx/include/cuda/try_cancel Outdated Show resolved Hide resolved

libcudacxx/include/cuda/try_cancel Outdated Show resolved Hide resolved

gonzalobg and others added 12 commits

February 4, 2025 17:01


          Update try_cancel_blocks ABI

09c48f8

Co-authored-by: Michael Schellenberger Costa <[email protected]>


          Update ABI of __cluster_get_dim

6d15626

Co-authored-by: Michael Schellenberger Costa <[email protected]>


          Use if target and provide SW fallback

4b9f6a3


          Use simple license

57dcd42


          Fix if guard

aba388b


          Guard for C++20 or newer

220cf29


          Simplify API

d2baa62


          Add tests

27a6a52


          Clarify C++20 support in docs

e1f090b


          Test main function should only run in host

7e99c76


          Rename to for_each_cancelled_block and extend docs

0b62b46


          Fix typo in docs

0202b52

Collaborator Author

gonzalobg commented Feb 7, 2025

pre-commit.ci autofix


          [pre-commit.ci] auto code formatting

a9b46bc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet