Add Sparse Matrix Support #47
Conversation
…rge into dense_sparse_staging
…ero entry test case
…to the shared memory when their values are known at compile time
…if beta is 0, but only allocate shared memory, transpose correctly etc)
Hi, first of all: thanks a lot for this PR! And sorry for the delay in reviewing. Looking at the code, I think most of it should be good; I've added some minor annotations.
                 beta: float):
        self._reset()

        #if mat_a.get_values() == None or not trans_b:
Are these comments still necessary, or can they be removed? (Same for further down the file.)
It is a comment I forgot to clean up... I will remove this unnecessary comment and check the changes again.
      }
    };
  }
}
Missing newline at the end of the file
def _estimate_num_registers_per_mult(self, accumulator_length):
    # Note: derived experimentally
    factor = self._vm.bytes_per_real() / 4
    return factor * (32 + accumulator_length)
Is the 32 here the Nvidia warp size? (If so, it would be better to use the vm HW description parameter.)
This part of the code is actually copied from the dense thread policy's _estimate_num_registers_per_mult (not very clean, I know). In general, for both the dense-sparse and dense-dense thread policies, it would be better to take the warp size from the HW description instead of the magic number 32; I will update this.
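For illustration, a minimal sketch of what that change could look like; the hw_descr and vec_unit_length names are hypothetical placeholders for however the vm object actually exposes its hardware description:

def _estimate_num_registers_per_mult(self, accumulator_length):
    # Note: derived experimentally
    factor = self._vm.bytes_per_real() / 4
    # hypothetical accessor: take the warp size from the HW description
    # instead of hard-coding the magic number 32
    warp_size = self._vm.hw_descr.vec_unit_length
    return factor * (warp_size + accumulator_length)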
def _estimate_num_registers_per_mult(self, accumulator_length):
    # Note: derived experimentally
    factor = self._vm.bytes_per_real() / 4
    return factor * (32 + accumulator_length)
Same as in the other file: 32 == warp size?
Yes, and the same for the dense thread policy too; I will update all of them.
num_elements: 100

kernel_type: "kernel_type_params"
kernel_type_params: [ "shr_mem" ]
Missing newline at the end of the file
num_elements: 100

kernel_type: "kernel_type_params"
kernel_type_params: [ "shr_mem" ]
Missing newline at the end of the file
General comment: I will add newlines at the end of the files. I just used autopep8's default settings for formatting almost everywhere.
This PR implements dense-by-sparse and sparse-by-dense matrix multiplication support for Gemmforge. I implemented them as part of my master's thesis, "Improved GPU Kernel Generation for SeisSol using Loop-over-GEMM and Sparse-Matrix Operations".
I have included tests for sparse-by-dense and dense-by-sparse matrix multiplication, integrated into the CI/CD pipeline for automated testing. I decided to drop support for transposed sparse matrices (the order of the coordinate list is the storage order of the matrix) and removed support for register-only backends, as I could not test them thoroughly for performance.
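To make the storage convention concrete, here is a minimal self-contained illustration in plain Python (not Gemmforge code, and the function name is hypothetical) of a dense-by-sparse product where the sparse operand is kept as a coordinate list; the triples are simply processed in their stored order, which is why no transposed access is supported:

# C = A * B, with A dense (m x k) and B sparse (k x n), stored as a
# coordinate list of (row, col, value) triples. The list order is the
# storage order of B.
def dense_by_sparse(a, b_coo, m, n):
    c = [[0.0] * n for _ in range(m)]
    for row, col, val in b_coo:
        # each nonzero B[row][col] contributes to column `col` of C
        for i in range(m):
            c[i][col] += a[i][row] * val
    return c

# example: B = [[2, 0], [0, 3]] as a coordinate list
a = [[1.0, 2.0], [3.0, 4.0]]
b_coo = [(0, 0, 2.0), (1, 1, 3.0)]
print(dense_by_sparse(a, b_coo, m=2, n=2))  # [[2.0, 6.0], [6.0, 12.0]]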