Add GEMM device function #880

AGonzales-amd · 2025-01-13T17:43:50Z

This adds a GEMM device function that can be called from other kernels. The function is designed to be called by an entire wavefront and computes one block of the output matrix, so a problem can be decomposed into blocks that individual wavefronts operate on. Currently, it is used to implement an alternative rocsolver_gemm kernel.

- Uses MFMA instructions
- Limited to __gfx90a__, __gfx940__, __gfx941__, and __gfx942__ architectures
- Supports both complex data types and transpose matrix operations
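
As a rough illustration of the decomposition described above, here is a host-side C++ sketch in which each output tile stands for the work of one wavefront. It is not the PR's code: `block_gemm`, `tiled_gemm`, and `HYPOTHETICAL_HAS_MFMA` are made-up names, the tile sizes are arbitrary, real `double` data replaces the templated and complex types, and plain loops replace the MFMA instructions. The preprocessor guard only shows how the four listed targets could be detected at compile time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// The PR limits the MFMA path to these targets. The compiler is expected to
// define the matching macro only during device compilation for that target.
// HYPOTHETICAL_HAS_MFMA is an illustrative name, not a rocSOLVER macro.
#if defined(__gfx90a__) || defined(__gfx940__) || defined(__gfx941__) || defined(__gfx942__)
#define HYPOTHETICAL_HAS_MFMA 1
#else
#define HYPOTHETICAL_HAS_MFMA 0
#endif

// Hypothetical stand-in for the per-wavefront function: updates the block of C
// with rows [i0, i1) and columns [j0, j1) as C = alpha * A * B + beta * C.
// All matrices are column-major: element (i, j) is stored at ptr[i + j * ld].
void block_gemm(std::size_t i0, std::size_t i1, std::size_t j0, std::size_t j1,
                std::size_t k, double alpha, const double* A, std::size_t lda,
                const double* B, std::size_t ldb, double beta, double* C,
                std::size_t ldc)
{
    for(std::size_t j = j0; j < j1; ++j)
        for(std::size_t i = i0; i < i1; ++i)
        {
            double acc = 0.0;
            for(std::size_t p = 0; p < k; ++p)
                acc += A[i + p * lda] * B[p + j * ldb];
            C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
        }
}

// Splits the m x n output into tiles of at most tile_m x tile_n and hands each
// tile to block_gemm. On a GPU, each iteration of the two outer loops would be
// the work of one wavefront. Returns the number of tiles processed.
std::size_t tiled_gemm(std::size_t m, std::size_t n, std::size_t k,
                       std::size_t tile_m, std::size_t tile_n, double alpha,
                       const double* A, std::size_t lda, const double* B,
                       std::size_t ldb, double beta, double* C, std::size_t ldc)
{
    std::size_t tiles = 0;
    for(std::size_t j0 = 0; j0 < n; j0 += tile_n)
        for(std::size_t i0 = 0; i0 < m; i0 += tile_m)
        {
            block_gemm(i0, std::min(i0 + tile_m, m), j0, std::min(j0 + tile_n, n),
                       k, alpha, A, lda, B, ldb, beta, C, ldc);
            ++tiles;
        }
    return tiles;
}
```

Because the tiles do not overlap, they can be processed in any order or concurrently, which is what makes a one-wavefront-per-block mapping possible. The `std::min` calls handle edge tiles when m or n is not a multiple of the tile size.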

EdDAzevedo · 2025-01-16T14:07:54Z

library/src/specialized/roclapack_gemm_device_functions.hpp

+                                   I p,
+                                   T alpha,
+                                   const T *A,
+                                   I inc_A,


Are inc_A, inc_B, etc. the same as a "stride", like strideA? If not, perhaps some description would be helpful. Just a suggestion.

AGonzales-amd · 2025-01-16

Added function documentation. Thanks for the suggestion.
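
The thread does not show the added documentation, so the exact meaning of inc_A is not confirmed here. One common convention, assumed in the sketch below, is that an increment is the distance between consecutive rows of a single matrix, a leading dimension is the distance between consecutive columns, and a stride is the distance between consecutive matrices in a batch. The helper names `elem_offset` and `batch_ptr` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: offset of logical element (i, j) when stepping one row
// costs `inc` elements and stepping one column costs `ld` elements.
constexpr std::size_t elem_offset(std::size_t i, std::size_t j, std::size_t inc,
                                  std::size_t ld)
{
    return i * inc + j * ld;
}

// Hypothetical helper: pointer to the first element of matrix `b` in a batch
// whose consecutive matrices are `stride` elements apart.
template <typename T>
const T* batch_ptr(const T* base, std::size_t b, std::size_t stride)
{
    return base + b * stride;
}
```

Under this assumption, a column-major matrix with leading dimension lda is read with (inc, ld) = (1, lda), and its transpose can be read without copying by swapping the two values to (lda, 1). A batch stride plays no part in addressing within one matrix.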

EdDAzevedo

It would be nice to have test scripts, or updates to rocsolver-bench and/or rocsolver-test, to check for correctness (and perhaps performance). Please also check the corner cases where m, n, and k are nb, nb+1, and nb-1.
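
A minimal sketch of how those corner cases could be enumerated, assuming nb denotes the block size used by the kernel (its actual value is not stated in this thread). `GemmSize` and `corner_case_sizes` are hypothetical names, not part of rocsolver-test.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical container for one GEMM problem size.
struct GemmSize
{
    int m, n, k;
};

// Enumerates every (m, n, k) with each dimension drawn from {nb - 1, nb, nb + 1},
// i.e. just below, exactly at, and just above one block.
std::vector<GemmSize> corner_case_sizes(int nb)
{
    const std::array<int, 3> dims = {nb - 1, nb, nb + 1};
    std::vector<GemmSize> sizes;
    for(int m : dims)
        for(int n : dims)
            for(int k : dims)
                sizes.push_back({m, n, k});
    return sizes;
}
```

This yields 27 combinations per block size, covering the cases where each dimension is an exact multiple of the block, leaves a one-element remainder, or falls one element short.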

Will other BLAS operations, such as TRMM or SYR2K, also take advantage of the new GEMM? One could consider a conceptually recursive formulation of those operations so that they can use GEMM.
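
To make the recursive idea concrete, here is one textbook way to express a triangular multiply B := L * B (L lower triangular) so that the off-diagonal work becomes a GEMM call. This is only a host-side sketch under assumptions: `gemm_accumulate` and `recursive_trmm_lower` are hypothetical names, the data is real `double` in column-major layout, and nothing here reflects how rocSOLVER or rocBLAS actually implement TRMM.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the GEMM building block: C += A * B, where A is m x k,
// B is k x n, and C is m x n, all column-major.
void gemm_accumulate(std::size_t m, std::size_t n, std::size_t k, const double* A,
                     std::size_t lda, const double* B, std::size_t ldb, double* C,
                     std::size_t ldc)
{
    for(std::size_t j = 0; j < n; ++j)
        for(std::size_t p = 0; p < k; ++p)
            for(std::size_t i = 0; i < m; ++i)
                C[i + j * ldc] += A[i + p * lda] * B[p + j * ldb];
}

// B := L * B, where L is n x n lower triangular (only its lower triangle is
// read) and B is n x nrhs. L is split as [L11 0; L21 L22] and B as [B1; B2].
void recursive_trmm_lower(std::size_t n, std::size_t nrhs, const double* L,
                          std::size_t ldl, double* B, std::size_t ldb)
{
    if(n == 0)
        return;
    if(n == 1)
    {
        // 1 x 1 triangle: scale the single row of B.
        for(std::size_t j = 0; j < nrhs; ++j)
            B[j * ldb] *= L[0];
        return;
    }
    const std::size_t n1 = n / 2;
    const std::size_t n2 = n - n1;
    const double* L21 = L + n1;
    const double* L22 = L + n1 + n1 * ldl;
    double* B1 = B;
    double* B2 = B + n1;

    // Bottom block first, because it still needs the original B1.
    recursive_trmm_lower(n2, nrhs, L22, ldl, B2, ldb); // B2 := L22 * B2
    gemm_accumulate(n2, nrhs, n1, L21, ldl, B1, ldb, B2, ldb); // B2 += L21 * B1
    recursive_trmm_lower(n1, nrhs, L, ldl, B1, ldb); // B1 := L11 * B1
}
```

With this splitting, the rectangular L21 * B1 update is a plain GEMM, and it accounts for a growing share of the arithmetic as n increases. The same pattern could in principle be applied to other triangular or symmetric operations.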

jmachado-amd · 2025-01-16T21:26:30Z

Hi @AGonzales-amd, I second Ed's suggestion: you should update rocsolver-test and -bench clients to support the internal gemm. I can help you with the tests if you want (this would also be a good opportunity to provide a concrete answer to the question you asked in #879).

AGonzales-amd · 2025-01-16T23:08:05Z

> Hi @AGonzales-amd, I second Ed's suggestion: you should update rocsolver-test and -bench clients to support the internal gemm. I can help you with the tests if you want (this would also be a good opportunity to provide a concrete answer to the question you asked in #879).

Thanks @jmachado-amd, I could use your help. I did consider updating the client programs but I had trouble exporting the function or making it visible in the clients.

AGonzales-amd · 2025-01-22T22:38:48Z

Hi @jmachado-amd and @EdDAzevedo, the client programs have been updated to support internal gemm. One thing I'm not sure about is the tolerance for error checking.

jmachado-amd · 2025-01-24T17:10:45Z

Hi @AGonzales-amd, as long as the input matrices are "small" to "medium" sized and have positive, relatively small integer entries, the current test tolerance will work just fine! Let me know if you want to generalize the tests or just understand how those bounds work, and I can explain the important bits of the theory to you.
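
The thread does not spell out the theory behind the tolerance. One property that is likely relevant, offered here as an assumption rather than as the reviewer's argument, is that sums of products of small non-negative integers are computed exactly in floating point as long as every partial sum fits in the significand. The sketch below checks that for single precision; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Dot product accumulated front to back in single precision.
float dot_forward(const std::vector<float>& x, const std::vector<float>& y)
{
    float acc = 0.0f;
    for(std::size_t i = 0; i < x.size(); ++i)
        acc += x[i] * y[i];
    return acc;
}

// Same dot product accumulated back to front.
float dot_backward(const std::vector<float>& x, const std::vector<float>& y)
{
    float acc = 0.0f;
    for(std::size_t i = x.size(); i-- > 0;)
        acc += x[i] * y[i];
    return acc;
}

// True when any sum of k products of integers in [0, max_entry] is at most
// 2^24, the largest range in which float represents every integer exactly.
bool sums_are_exact(std::uint64_t k, std::uint64_t max_entry)
{
    return k * max_entry * max_entry <= (std::uint64_t{1} << 24);
}
```

When `sums_are_exact` holds, the accumulation order cannot change the result, so a blocked kernel and a reference implementation should agree exactly. For larger matrices, larger entries, or non-integer data, rounding errors appear and the tolerance has to account for them.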

On another topic, I see that many gemm tests are failing on Windows. You probably want to have a look at that sooner rather than later.

AGonzales-amd added 3 commits January 8, 2025 23:39

add mfma internal gemm kernel and device functions

a4baff0

temp change

64d9825

revert temp change

262fb78

AGonzales-amd marked this pull request as ready for review January 15, 2025 22:20

AGonzales-amd requested review from jzuniga-amd, tfalders, cgmb, qjojo, EdDAzevedo and jmachado-amd as code owners January 15, 2025 22:20

EdDAzevedo reviewed Jan 16, 2025

EdDAzevedo approved these changes Jan 16, 2025

AGonzales-amd added the noOptimizations label (Disable optimized kernels for small sizes for some routines) Jan 16, 2025

documentation and formatting

24e7364

AGonzales-amd added 3 commits January 17, 2025 16:54

fixes

eefef23

rocsolver_gemm test

e937d3f

Merge branch 'develop' into mfma-gemm-internal

de84520

jmachado-amd added the ci:no-ccache label (Disable ccache) Jan 23, 2025

windows ci

288ac4d
