Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix generic-k code in top-k with softmax #1993

Merged
merged 3 commits into from
Feb 1, 2025
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -37,8 +37,8 @@

Those assumptions are as:
1. Fusion is over the N dimension.
2. Top-K is either 2 or 4 elements, and the value is static (meaning two kernels have to be
compiled to support both.)
2. Top-K value is static (meaning multiple kernels have to be compiled to support
different values.)
t4c1 marked this conversation as resolved.
Show resolved Hide resolved
3. The GEMM tile shape along N is greater than or equal to problem size
along N.

Expand Down
6 changes: 3 additions & 3 deletions include/cutlass/epilogue/fusion/sm90_visitor_topk_softmax.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -209,13 +209,14 @@ void add_element_to_desc_sorted_array(cutlass::Array<Element, N>& a, Element b)
// slower generic path with branching, slower, and can cause register spill
CUTLASS_PRAGMA_UNROLL
for (int k = 0; k < N; ++k) {
if (a[k] <= b) {
if (a[k] < b) {
// Shift down
CUTLASS_PRAGMA_UNROLL
for (int l = N - 1; l > k; --l) {
a[l] = a[l-1];
}
a[k] = b;
break;
}
}
}
Expand All @@ -237,7 +238,7 @@ void merge_desc_sorted_arrays(cutlass::Array<Element, N>& a, const cutlass::Arra
int j = 0;
CUTLASS_PRAGMA_UNROLL
for (int k = 0; k < N; ++k) {
if (a[k] <= b[j]) {
if (a[k] < b[j]) {
// Shift down
CUTLASS_PRAGMA_UNROLL
for (int l = N - 1; l > k; --l) {
Expand Down Expand Up @@ -334,7 +335,6 @@ template <
struct Sm90TopKSoftmaxColReduction {
private:
static_assert(is_same_v<ElementCompute, float>, "Fused Top-K + Softmax reduction requires FP32 accumulation.");
t4c1 marked this conversation as resolved.
Show resolved Hide resolved
static_assert(TopK == 2 || TopK == 4, "Fused Top-K + Softmax reduction only supports K=2 and K=4.");
static_assert(Alignment * sizeof_bits_v<ElementOutput> % 128 == 0, "sub-16B alignment not supported yet");

// Reduction tensors
Expand Down