The A and B layouts involve projections over the threads, which are difficult to depict in these diagrams. T64 is "missing" from the A layout, but it reads the same values of A that T0 reads. Likewise, T32 is "missing" from the B layout, but it reads the same values of B that T0 reads. Your understanding is correct: all threads hold parts of the data of matrices A, B, and C, but that data may be replicated across multiple threads.
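
To make the projection concrete, here is a minimal sketch in plain Python (it does not use CuTe) of the thread layout from the question, `(_32,_2,_2,_1):(_1,_32,_64,_0)`. It assumes that the A partition depends only on the (V, M, K) modes and the B partition only on the (V, N, K) modes, so the unused mode is projected to 0. The helper names `thread_index`, `thread_coord`, `a_representative`, and `b_representative` are hypothetical and exist only for this illustration.

```python
# Thread layout from the question: shape (V, M, N, K) with strides (1, 32, 64, 0).
SHAPE = (32, 2, 2, 1)
STRIDE = (1, 32, 64, 0)


def thread_index(v, m, n, k):
    """Map a (v, m, n, k) coordinate to a thread index via the strides."""
    return sum(c * s for c, s in zip((v, m, n, k), STRIDE))


def thread_coord(tid):
    """Recover (v, m, n, k) for a thread index in [0, 128)."""
    return (tid % 32, (tid // 32) % 2, (tid // 64) % 2, 0)


def a_representative(tid):
    """Assumption: A ignores the N mode, so project n to 0.

    Returns the thread drawn in the A diagram that holds the same A values.
    """
    v, m, _, k = thread_coord(tid)
    return thread_index(v, m, 0, k)


def b_representative(tid):
    """Assumption: B ignores the M mode, so project m to 0.

    Returns the thread drawn in the B diagram that holds the same B values.
    """
    v, _, n, k = thread_coord(tid)
    return thread_index(v, 0, n, k)


if __name__ == "__main__":
    # One thread from each of the four warps.
    for tid in (0, 32, 64, 96):
        print(tid, a_representative(tid), b_representative(tid))
```

Under these assumptions, thread 64 maps to thread 0 for A and thread 32 maps to thread 0 for B, which matches the description above. The A representatives cover threads 0-63 and the B representatives cover threads 0-31 and 64-95, the same ranges that appear in the printed diagrams.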
-
I have read all of the CuTe documentation, and I have always been puzzled by the TiledMMA thread layout setting ThrLayoutVMNK `(_32,_2,_2,_1):(_1,_32,_64,_0)`. When I print it with `print_latex`, I see that the data of matrix A is distributed among threads 0-31 and 32-63. Does this mean that the two warps with thread indices 64-127 do not hold any data of matrix A? Similarly, matrix B is distributed among the threads of two warps (0-31 and 64-95), but the data of matrix C is distributed across all four warps (0-127). My current understanding is that all threads hold parts of the data of matrices A, B, and C, and that `print_latex` simply cannot show them all. I would be very grateful if someone could answer this!
The LaTeX output is as follows:
![image](https://private-user-images.githubusercontent.com/66902343/308808492-7c3c16ee-af3a-420b-822c-0b238c4b754d.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg5MDUyMTAsIm5iZiI6MTczODkwNDkxMCwicGF0aCI6Ii82NjkwMjM0My8zMDg4MDg0OTItN2MzYzE2ZWUtYWYzYS00MjBiLTgyMmMtMGIyMzhjNGI3NTRkLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA3VDA1MDgzMFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTE4ZGZmN2E4NTE5ZjhhNjE3YzlhZWIxODU0ZmI1ZjhjZjJjNjU5OGUyMTU4MWM4MmUzYzRiNzFjZWQ0YTFlZGEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.Ch0A8BskDi6IV4RXovFmXEU-dRPYAXHxwQSdTugvrbA)