Add Sparse Matrix Support #47
Conversation
…rge into dense_sparse_staging
…ero entry test case
…to the shared memory when their values are known at compile time
…if beta is 0, but only allocate shared memory, transpose correctly etc)
Hi, first of all: thanks a lot for this PR! And sorry for the delay in reviewing. Looking at the code, I think most of it should be good; I've added some minor annotations.
                 beta: float):
        self._reset()

        #if mat_a.get_values() == None or not trans_b:
Are these comments still necessary, or can they be removed? (Same for further down the file.)
It is a comment I forgot to clean up... I will remove this unnecessary comment and check the changes again.
      }
    };
  }
}
Missing newline at the end of the file
def _estimate_num_registers_per_mult(self, accumulator_length):
    # Note: derived experimentally
    factor = self._vm.bytes_per_real() / 4
    return factor * (32 + accumulator_length)
Is the 32 here the Nvidia warp size? (If so, it would be better to use the vm HW description parameter.)
This part of the code is actually copied from the dense thread policy's _estimate_num_registers_per_mult (not very clean, I know). In general, for both the dense-sparse and dense-dense thread policies, it would be better to take the warp size from the HW description instead of the magic number 32; I will update this.
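For illustration, a minimal sketch of what that change could look like; the hw_descr and vec_unit_length names are hypothetical placeholders for however the vm object actually exposes its hardware description:

def _estimate_num_registers_per_mult(self, accumulator_length):
    # Note: derived experimentally
    factor = self._vm.bytes_per_real() / 4
    # hypothetical accessor: take the warp size from the HW description
    # instead of hard-coding the magic number 32
    warp_size = self._vm.hw_descr.vec_unit_length
    return factor * (warp_size + accumulator_length)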
def _estimate_num_registers_per_mult(self, accumulator_length):
    # Note: derived experimentally
    factor = self._vm.bytes_per_real() / 4
    return factor * (32 + accumulator_length)
Same as in the other file: 32 == warp size?
Yes, and the same for the dense thread policy too; I will update all of them.
num_elements: 100

kernel_type: "kernel_type_params"
kernel_type_params: [ "shr_mem" ]
Missing newline at the end of the file
num_elements: 100

kernel_type: "kernel_type_params"
kernel_type_params: [ "shr_mem" ]
Missing newline at the end of the file
General comment: I will add newlines at the end of the files. I just used autopep8's default settings for formatting almost everywhere.
This PR implements dense-by-sparse and sparse-by-dense matrix multiplication support for Gemmforge. I implemented them as part of my master's thesis, "Improved GPU Kernel Generation for SeisSol using Loop-over-GEMM and Sparse-Matrix Operations".
I have included tests for sparse-by-dense and dense-by-sparse matrix multiplication, integrated into the CI/CD pipeline for automated testing. I decided to drop support for transposed sparse matrices (the order of the coordinate list is the storage order of the matrix) and removed support for register-only backends, as I could not test them thoroughly for performance.
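To make the storage convention concrete, here is a minimal self-contained illustration in plain Python (not Gemmforge code, and the function name is hypothetical) of a dense-by-sparse product where the sparse operand is kept as a coordinate list; the triples are simply processed in their stored order, which is why no transposed access is supported:

# C = A * B, with A dense (m x k) and B sparse (k x n), stored as a
# coordinate list of (row, col, value) triples. The list order is the
# storage order of B.
def dense_by_sparse(a, b_coo, m, n):
    c = [[0.0] * n for _ in range(m)]
    for row, col, val in b_coo:
        # each nonzero B[row][col] contributes to column `col` of C
        for i in range(m):
            c[i][col] += a[i][row] * val
    return c

# example: B = [[2, 0], [0, 3]] as a coordinate list
a = [[1.0, 2.0], [3.0, 4.0]]
b_coo = [(0, 0, 2.0), (1, 1, 3.0)]
print(dense_by_sparse(a, b_coo, m=2, n=2))  # [[2.0, 6.0], [6.0, 12.0]]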