Add Sparse Matrix Support #47

Merged (58 commits into master, Jan 19, 2024)
Conversation

ThrudPrimrose (Collaborator):

This PR implements dense-by-sparse and sparse-by-dense matrix multiplication support for Gemmforge. I implemented it as part of my master's thesis, "Improved GPU Kernel Generation for SeisSol using Loop-over-GEMM and Sparse-Matrix Operations".

I have included tests for sparse-by-dense and dense-by-sparse matrix multiplication, integrated into the CI/CD pipeline for automated testing. I decided to remove support for transposed sparse matrices (the order of the coordinate list is the storage order of the matrix) and to drop support for register-only backends, as I could not test them thoroughly for performance.
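For readers outside the project, the coordinate-list convention mentioned above can be illustrated with a small sketch. This is not Gemmforge code or its API; it is a plain-Python illustration (all names hypothetical) of a dense-by-sparse product where the COO entries of the sparse matrix are traversed in their stored order:

# Hypothetical illustration of dense-by-sparse multiplication with a
# COO (coordinate-list) sparse matrix B; NOT Gemmforge code.
# Computes C = A * B with A dense (m x n_cols_a) and B sparse.
def dense_by_sparse(a, m, coo_entries, n):
    # coo_entries: (row, col, value) triples of B; their list order
    # is the storage order of the matrix.
    c = [[0.0] * n for _ in range(m)]
    for row, col, value in coo_entries:
        # Each nonzero B[row][col] contributes to column col of C.
        for i in range(m):
            c[i][col] += a[i][row] * value
    return c

a = [[1.0, 2.0],
     [4.0, 5.0]]
b_coo = [(0, 1, 2.0), (1, 0, 3.0)]  # B = [[0, 2], [3, 0]]
print(dense_by_sparse(a, 2, b_coo, 2))  # [[6.0, 2.0], [15.0, 8.0]]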

…to the shared memory when their values are known at compile time
ThrudPrimrose marked this pull request as ready for review on November 29, 2023, 00:53.
@davschneller (Contributor) left a comment:

Hi, first of all: thank you very much for this PR! And sorry for the delay in reviewing. Looking at the code, I think most of it is good; I've added some minor annotations.

beta: float):
self._reset()

#if mat_a.get_values() == None or not trans_b:
Contributor:

Are these comments still necessary—or can they be removed? (same for further down the file)

Collaborator Author:

It's a comment I forgot to clean up. I will remove it and go over the changes again for similar leftovers.

}
};
}
}
Contributor:

Missing newline at the end of the file

def _estimate_num_registers_per_mult(self, accumulator_length):
    # Note: derived experimentally
    factor = self._vm.bytes_per_real() / 4
    return factor * (32 + accumulator_length)
Contributor:

Is the 32 here the Nvidia warp size? (If so, it would be better to use the vm HW description parameter.)

Collaborator Author:

This part of the code is actually copied from the dense thread policy's _estimate_num_registers_per_mult. I know that's not very clean; in general, for both the dense-sparse and dense-dense thread policies, it would be better to use the accumulator length instead of the magic number 32. I will update both.
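For illustration, the suggested fix might look roughly like the sketch below. The way the warp size is read from the HW description is an assumption on my part and not verified against the Gemmforge codebase:

def _estimate_num_registers_per_mult(self, accumulator_length):
    # Note: derived experimentally
    factor = self._vm.bytes_per_real() / 4
    # Hypothetical: take the warp/wavefront size from the HW description
    # instead of hard-coding 32 (attribute access assumed, not verified).
    warp_size = self._vm.get_hw_descr().vec_unit_length
    return factor * (warp_size + accumulator_length)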

def _estimate_num_registers_per_mult(self, accumulator_length):
    # Note: derived experimentally
    factor = self._vm.bytes_per_real() / 4
    return factor * (32 + accumulator_length)
Contributor:

Same as in the other file: is 32 the warp size?

Collaborator Author:

Yes, and the same holds for the dense thread policy; I will update all of them.

num_elements: 100

kernel_type: "kernel_type_params"
kernel_type_params: [ "shr_mem" ]
Contributor:

Missing newline at the end of the file

num_elements: 100

kernel_type: "kernel_type_params"
kernel_type_params: [ "shr_mem" ]
Contributor:

Missing newline at the end of the file

@ThrudPrimrose (Collaborator Author):

General comment: I will add newlines at the end of the files. I just used autopep8's default settings for formatting most of the time.
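A minimal standalone sketch for enforcing this, independent of the formatter settings (the "gemmforge" directory name is assumed for illustration):

# Append a missing trailing newline to every .py file in the tree.
from pathlib import Path

for path in Path("gemmforge").rglob("*.py"):
    text = path.read_text()
    if text and not text.endswith("\n"):
        path.write_text(text + "\n")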

davschneller merged commit 19fcefd into master on Jan 19, 2024.