The Gamma matrix multiplication is implemented by exploiting the fact that the matrix has only one non-zero element per row and column. This means that small blocks (or even just vectors) have to be multiplied with each other, which incurs high overhead in Eigen. Perhaps it would make sense to use Eigen's sparse capabilities instead.
Eventually we will have a micro-benchmark tool, so we might be able to test exactly that!
I think the block-wise gamma is okay. Are the eigenvectors belonging to a particular dilution block reshaped into a consecutive block, or are the blocks in the interlace scheme really interleaved components of a single eigenvector? If they are not reshaped, then this is the major performance improvement that can be extracted at this stage. Dilution blocks should definitely be grouped into consecutive elements. This is automatic with block dilution, but of course has to be done explicitly with interlace dilution.
Looking at the profile (text and graph as PDF) generated with gprof, we find that most of the time is spent in matrix multiplication with Eigen. Since we do contractions, this is pretty much what we want. However, we suspect that it might not be running efficiently.
The analytic computation of the needed FLOPs has not been done yet, but it should give some insight.
We can also try to add a manual transposition into the matrix product, as this will allow the use of other BLAS routines per their documentation.
We can also try to run this in VTune on JURECA to get some more profiling data, especially for the OpenMP parts.
Eigen might also have some performance-reporting capabilities; this can be looked into as well.