Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libcudacxx] Experimental try_cancel exposure #3671

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

gonzalobg
Copy link
Collaborator

@gonzalobg gonzalobg commented Feb 4, 2025

Description

This PR provides experimental low-level exposure in libcudacxx for PTX try_cancel.

While experimental, it'd be very beneficial for this exposure to be available in the CTK, hence adding it to libcu++.
We should aim to stabilize it during the CUDA 13.x cycle.

This low-level wrapper delivers reasonable functionality to end-users while hiding the gnarly synchronization and pipelining required and enabling us to deliver bug-fixes and performance improvements in the future. The functionality is exposed in a portable way, and backported to all SM's that CUDA supports, but the focus is on sm_100+ (we can deliver better performance in sm_70+ later).

This PR intentionally only covers for_each_cancelled_block.
Once that API is done, for_each_cancelled_cluster would be an analogous API to add.

Future potential extensions worth exploring:

  • Supporting leader thread (and block) selection to enable programs to specify the control warp / block.
  • Enable programs to control this API's state (i.e., its 8 bytes of shared memory).
  • Support user-controlled pipelining. Right now it uses a 1-stage pipeline, but we could add an int NStages = 1 later tht enables users to do, e.g., double-buffering.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@gonzalobg gonzalobg requested review from a team as code owners February 4, 2025 14:24
@gonzalobg gonzalobg requested a review from griwes February 4, 2025 14:24
Copy link

copy-pr-bot bot commented Feb 4, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

libcudacxx/include/cuda/try_cancel Outdated Show resolved Hide resolved
Comment on lines 145 to 149
_LIBCUDACXX_BEGIN_NAMESPACE_CUDA

namespace experimental {

namespace __detail {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should rather be put into cudax rather than libcu++

Copy link
Collaborator Author

@gonzalobg gonzalobg Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cudax is not shipped with the CTK but experimental exposure of HW features should be available in the CTK early to allow programmers to use the HW early on without having to familiarize themselves with all the details.

libcudacxx/include/cuda/try_cancel Outdated Show resolved Hide resolved
libcudacxx/include/cuda/try_cancel Outdated Show resolved Hide resolved
namespace __detail {

struct __empty_t {
__device__ void operator()(dim3);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important: We need to at least define it

Suggested change
__device__ void operator()(dim3);
_CCCL_DEVICE _CCCL_HIDE_FROM_ABI void operator()(dim3) const noexcept {};

Copy link
Collaborator Author

@gonzalobg gonzalobg Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted this to trigger an error if a CUDA C++ program uses it.
Would defining it support that?

I think otherwise I can define it, and then issue a static_assert below.

libcudacxx/include/cuda/try_cancel Outdated Show resolved Hide resolved
libcudacxx/include/cuda/try_cancel Outdated Show resolved Hide resolved
libcudacxx/include/cuda/try_cancel Outdated Show resolved Hide resolved
libcudacxx/include/cuda/try_cancel Outdated Show resolved Hide resolved
libcudacxx/include/cuda/try_cancel Outdated Show resolved Hide resolved
@gonzalobg
Copy link
Collaborator Author

pre-commit.ci autofix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

4 participants