[QST] Fused mma and outer product #2062

capybara-club · 2025-01-25T19:18:54Z

I have matrices A (m, n), B (n, c), and C (n, d). I need to take the outer product of B and C to get R (n, c * d), then compute A @ R to get (m, c * d). I'd like to do this in one fused kernel and avoid writing R to global memory and reading it back. What would be the best way to implement this? I think I looked through every example and didn't see anything quite like this problem.
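To pin down the computation, here is a minimal unfused reference in plain Python. It assumes the "outer product" is taken row by row, i.e. R[k][i * d + j] = B[k][i] * C[k][j], which is one reading that matches the stated shapes; the helper names `rowwise_outer` and `matmul` are made up for illustration and are not CUTLASS or CuTe APIs.

```python
def rowwise_outer(B, C):
    """Assumed definition: R[k][i*d + j] = B[k][i] * C[k][j] for every row k."""
    return [[b * c for b in b_row for c in c_row] for b_row, c_row in zip(B, C)]


def matmul(A, R):
    """Plain (m, n) @ (n, q) matrix product on nested lists."""
    n, q = len(R), len(R[0])
    return [[sum(a_row[k] * R[k][col] for k in range(n)) for col in range(q)]
            for a_row in A]


# Tiny example with m=2, n=3, c=2, d=2.
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[1, 0],
     [0, 1],
     [1, 1]]
C = [[1, 2],
     [3, 4],
     [5, 6]]

R = rowwise_outer(B, C)   # shape (n, c*d) = (3, 4)
out = matmul(A, R)        # shape (m, c*d) = (2, 4)
print(out)                # [[16, 20, 21, 26], [34, 44, 45, 56]]
```

This version materializes all of R, which is exactly the round trip through memory that a fused kernel would try to avoid; it is only meant as a correctness baseline.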

thakkarV · 2025-01-26T02:59:44Z

I am not sure I follow what "outer product of B and C and get R (n, c * d)" means. Outer product of two matrices as in this?

Can you write out the dimensions of your input and output tensors in [inner, outer, batch] format for each, along with the series of computations?

Overall, what you seem to want is quite similar to a flash attention implementation (linear attention in this case, to be exact).

capybara-club · 2025-01-26T03:12:53Z

Thank you! Yes!

(Edit: I realized an outer product is just an MMA with an inner dimension of 1.)

So full result is A @ (B @ C).T

A is [n, m, 1]
R is [c, d, n] but is flattened/transposed to [n, c * d, 1]
B is [1, c, n]
C is [1, d, n]
The final result is A @ R which would be [c * d, m, 1]

So you could read B and C into shared memory as separate lines and fill in that giant square (R) in shared memory on demand, then do the matrix multiply against A, which would be loaded normally.
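The "fill in on demand" idea can be modeled on the CPU as a loop over tiles of the n dimension. The sketch below is plain Python, not CUDA or CuTe: `fused_tiled`, `tile_k`, and `r_tile` are hypothetical names, `r_tile` merely stands in for a shared-memory staging buffer, and the row-by-row definition R[k][i * d + j] = B[k][i] * C[k][j] is an assumption inferred from the shapes in the question.

```python
def fused_tiled(A, B, C, tile_k=2):
    """Accumulate A @ R one k-tile at a time without ever storing all of R."""
    m, n = len(A), len(A[0])
    c, d = len(B[0]), len(C[0])
    acc = [[0] * (c * d) for _ in range(m)]      # accumulator, shape (m, c*d)
    for k0 in range(0, n, tile_k):
        k1 = min(k0 + tile_k, n)
        # Build only the rows of R that belong to this tile (the staging step).
        r_tile = [[B[k][i] * C[k][j] for i in range(c) for j in range(d)]
                  for k in range(k0, k1)]
        # Multiply the matching columns of A against the freshly built tile.
        for p in range(m):
            for q in range(c * d):
                acc[p][q] += sum(A[p][k0 + t] * r_tile[t][q]
                                 for t in range(k1 - k0))
    return acc


# Same tiny example as a sanity check: m=2, n=3, c=2, d=2.
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[1, 0],
     [0, 1],
     [1, 1]]
C = [[1, 2],
     [3, 4],
     [5, 6]]

result = fused_tiled(A, B, C, tile_k=2)
print(result)  # [[16, 20, 21, 26], [34, 44, 45, 56]]
```

Only `tile_k` rows of R exist at any time, so the temporary storage is tile_k * c * d elements instead of n * c * d. Whether this maps well onto an sm_80 kernel depends on tile shapes and layouts that this sketch does not address.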

I did get a bit into the weeds with the CUTLASS examples, and I'm thinking that maybe, before a gemm call in CuTe, I could write that outer product to shared memory and have the gemm call pull it back down to the registers in the proper order? (I'm looking at sm_80.)

Or maybe it should be two consecutive mmas, forming R first?

thakkarV · 2025-01-30T06:45:08Z

"So full result is A @ (B @ C).T"

There is no R in this? I am still not sure I understand the full computation, because you also say "The final result is A @ R which would be [c * d, m, 1]".

capybara-club added the "? - Needs Triage" and "question" labels · Jan 25, 2025
