I have matrices A (m, n), B (n, c) and C (n, d). I need to take the outer product of B and C to get R (n, c * d), and then compute A @ R to get (m, c * d). I'd like to do this in one fused kernel and avoid writing R to global memory and reading it back. What would be the best way to implement this? I think I looked through every example and didn't see anything quite like this problem.
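To pin down the computation being asked about, here is a minimal plain-Python reference. It assumes that "outer product of B and C" means a row-wise outer product, i.e. R[p][j * d + k] = B[p][j] * C[p][k], which is the reading consistent with R having shape (n, c * d). The function names are hypothetical and exist only for illustration. This is a correctness reference, not a kernel.

```python
def unfused_reference(A, B, C):
    """Materialize R (n, c*d) first, then compute A @ R."""
    m, n = len(A), len(B)
    c, d = len(B[0]), len(C[0])
    # Assumed definition: R[p][j*d + k] = B[p][j] * C[p][k]
    R = [[B[p][j] * C[p][k] for j in range(c) for k in range(d)]
         for p in range(n)]
    return [[sum(A[i][p] * R[p][q] for p in range(n)) for q in range(c * d)]
            for i in range(m)]


def fused_reference(A, B, C):
    """Same result, but R is never stored: each product is formed on the fly."""
    m, n = len(A), len(B)
    c, d = len(B[0]), len(C[0])
    out = [[0] * (c * d) for _ in range(m)]
    for i in range(m):
        for p in range(n):          # reduction (inner) dimension
            a = A[i][p]
            for j in range(c):
                ab = a * B[p][j]
                for k in range(d):
                    out[i][j * d + k] += ab * C[p][k]
    return out
```

Under that assumption, each output element is out[i][j * d + k] = sum over p of A[i][p] * B[p][j] * C[p][k], so a fused kernel only needs the rows of B and C for the current slice of n.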
(Edit: I realized an outer product is just an MMA with an inner dimension of 1.)
So the full result is A @ (B @ C).T
A is [n, m, 1],
R is [c, d, n] but is flattened/transposed to [n, c * d, 1]
B is [1, c, n]
C is [1, d, n]
The final result is A @ R which would be [c * d, m, 1]
So you can read B and C into shared memory as separate rows and fill in the large R tile in shared memory on demand, then do the matrix multiply against A, which would be loaded normally.
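The idea above can be sketched on the CPU as a loop over slices of n. This is only a simulation in plain Python, assuming R[p][j * d + k] = B[p][j] * C[p][k]. The names `build_r_tile`, `tiled_fused`, `direct` and the `tile_k` parameter are hypothetical, and the list `r_tile` merely stands in for a shared-memory buffer.

```python
def build_r_tile(B, C, k0, k1):
    """Rows k0..k1-1 of R, built from the matching rows of B and C."""
    return [[b * ck for b in B[p] for ck in C[p]] for p in range(k0, k1)]


def tiled_fused(A, B, C, tile_k):
    """Accumulate A @ R one slice of n at a time; the full R is never stored."""
    m, n = len(A), len(B)
    cd = len(B[0]) * len(C[0])
    acc = [[0] * cd for _ in range(m)]       # stands in for accumulators
    for k0 in range(0, n, tile_k):
        k1 = min(k0 + tile_k, n)
        r_tile = build_r_tile(B, C, k0, k1)  # stands in for shared memory
        for i in range(m):
            for t, p in enumerate(range(k0, k1)):
                a = A[i][p]
                for q in range(cd):
                    acc[i][q] += a * r_tile[t][q]
    return acc


def direct(A, B, C):
    """Unfused check: build all of R, then multiply."""
    R = build_r_tile(B, C, 0, len(B))
    return [[sum(A[i][p] * R[p][q] for p in range(len(B)))
             for q in range(len(R[0]))] for i in range(len(A))]
```

For example, `tiled_fused(A, B, C, tile_k=2)` should match `direct(A, B, C)` for any tile size. A real kernel would presumably also tile over the m and c * d dimensions so that the R tile fits in shared memory; that is left out here to keep the sketch short.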
I did get a bit into the weeds with the CUTLASS examples, and I'm thinking that maybe, before a gemm call in CuTe, I could write that outer product to shared memory and have the gemm call pull it back down to the registers in the proper order? (I'm looking at sm_80.)
Or maybe it should be two consecutive mmas, forming R first?
There is no R in this? I am not sure I understand the full computation still because you also say "The final result is A @ R which would be [c * d, m, 1]"