[libcudacxx] Experimental try_cancel exposure #3671
base: main
libcudacxx/include/cuda/try_cancel
Outdated
_LIBCUDACXX_BEGIN_NAMESPACE_CUDA

namespace experimental {

namespace __detail {
This should be put into cudax rather than libcu++.
cudax is not shipped with the CTK, but experimental exposure of HW features should be available in the CTK early so that programmers can use the HW early on without having to familiarize themselves with all the details.
libcudacxx/include/cuda/try_cancel
Outdated
namespace __detail {

struct __empty_t {
  __device__ void operator()(dim3);
Important: We need to at least define it.
- __device__ void operator()(dim3);
+ _CCCL_DEVICE _CCCL_HIDE_FROM_ABI void operator()(dim3) const noexcept {};
I wanted this to trigger an error if a CUDA C++ program uses it.
Would defining it still support that?
Otherwise, I think I can define it and then issue a static_assert below.
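The second option in this thread (define the placeholder, then reject it with a static_assert) could look roughly like the sketch below. It is plain host C++ standing in for the device code, and the names empty_t and invoke_checked are hypothetical, not taken from the PR. The point is only that a declaration without a definition typically fails late (when the call is actually emitted or linked), while a static_assert fails at compile time with a readable message.

```cpp
#include <type_traits>

// Hypothetical stand-in for the placeholder functor; not the PR's actual type.
struct empty_t {
  // Defined (not just declared), so using it can never fail at link time.
  void operator()(int) const noexcept {}
};

// Hypothetical wrapper: rejects the placeholder with a compile-time error.
template <class F>
void invoke_checked(F f, int value) {
  static_assert(!std::is_same<F, empty_t>::value,
                "empty_t is a placeholder; pass a real callable");
  f(value);  // only reached for real callables
}
```

Calling invoke_checked with a lambda compiles and runs; calling it with empty_t{} stops compilation at the static_assert. Whether this matches the check the PR ends up using is an assumption.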
Co-authored-by: Michael Schellenberger Costa <[email protected]>
pre-commit.ci autofix
Description
This PR provides experimental low-level exposure in libcudacxx for PTX try_cancel.
While experimental, it would be very beneficial for this exposure to be available in the CTK, hence adding it to libcu++.
We should aim to stabilize it during the CUDA 13.x cycle.
This low-level wrapper delivers reasonable functionality to end users while hiding the gnarly synchronization and pipelining required, and it lets us deliver bug fixes and performance improvements in the future. The functionality is exposed in a portable way and backported to all SMs that CUDA supports, but the focus is on sm_100+ (we can deliver better performance on sm_70+ later).
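As a rough mental model of the loop such a wrapper hides, here is a single-threaded host-side C++ sketch. It assumes that a running block first handles its own index and then keeps taking over indices of blocks that have not started yet. Everything in it (Index3, PendingBlocks, for_each_cancelled_block_model) is hypothetical and is not the libcu++ API; real hardware cancellation is asynchronous, and this model ignores the synchronization and pipelining entirely.

```cpp
#include <vector>

// Hypothetical stand-in for dim3; host-only model, not device code.
struct Index3 {
  unsigned x, y, z;
};

// Models the pool of launched-but-not-yet-started blocks in a 1D grid.
class PendingBlocks {
public:
  PendingBlocks(unsigned first_pending, unsigned grid_size)
      : next_(first_pending), end_(grid_size) {}

  // Models one cancellation request: on success, writes the index of the
  // cancelled block to `out` and returns true; returns false when no
  // pending block is left.
  bool try_cancel(Index3& out) {
    if (next_ >= end_) return false;
    out = Index3{next_++, 0, 0};
    return true;
  }

private:
  unsigned next_;
  unsigned end_;
};

// A running block processes its own index, then every index it manages to
// take over, and stops once a cancellation request fails.
template <class F>
void for_each_cancelled_block_model(Index3 own, PendingBlocks& pending, F f) {
  Index3 idx = own;
  do {
    f(idx);
  } while (pending.try_cancel(idx));
}
```

In this model a single running block ends up visiting every index of the grid exactly once; with several running blocks the pending indices would be split among them.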
This PR intentionally only covers for_each_cancelled_block. Once that API is done, for_each_cancelled_cluster would be an analogous API to add.
Future potential extensions worth exploring:
int NStages = 1 later that enables users to do, e.g., double-buffering.
Checklist