Manual block-wise Gamma multiplication might be a problem #63

Open
martin-ueding opened this issue Apr 24, 2018 · 2 comments
Comments

@martin-ueding
Contributor

The Gamma matrix multiplication is implemented exploiting that it has only one non-zero element per row and column. This means that small blocks (or even just vectors) have to be multiplied with each other, yielding high overheads for Eigen. Perhaps it would make sense to use Eigen's sparse capabilities instead.
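To make the "one non-zero element per row and column" structure concrete, here is a minimal sketch using only the standard library. It is not the code from this repository: `SparseGamma`, `apply`, `to_dense` and `apply_dense` are hypothetical names, and the entries used with them are arbitrary illustrations rather than any particular gamma-matrix basis.

```cpp
#include <array>
#include <complex>
#include <cstddef>

using Complex = std::complex<double>;
using Spinor = std::array<Complex, 4>;
using Dense = std::array<std::array<Complex, 4>, 4>;

// Hypothetical storage: row r has its single non-zero entry `value[r]`
// in column `col[r]`.
struct SparseGamma {
  std::array<std::size_t, 4> col;
  std::array<Complex, 4> value;
};

// y = Gamma * x using four complex multiplications and no additions.
Spinor apply(const SparseGamma &gamma, const Spinor &x) {
  Spinor y{};
  for (std::size_t row = 0; row < 4; ++row) {
    y[row] = gamma.value[row] * x[gamma.col[row]];
  }
  return y;
}

// Expands the sparse form into a full 4x4 matrix, for cross-checking only.
Dense to_dense(const SparseGamma &gamma) {
  Dense d{};
  for (std::size_t row = 0; row < 4; ++row) {
    d[row][gamma.col[row]] = gamma.value[row];
  }
  return d;
}

// Reference dense product with sixteen multiplications.
Spinor apply_dense(const Dense &d, const Spinor &x) {
  Spinor y{};
  for (std::size_t row = 0; row < 4; ++row) {
    for (std::size_t c = 0; c < 4; ++c) {
      y[row] += d[row][c] * x[c];
    }
  }
  return y;
}
```

In the real code each `x[c]` would be a block or vector rather than a single number, which is where the per-block overhead mentioned above comes from.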

Eventually we will have a micro-benchmark tool, so we might be able to test exactly that!

@kostrzewa
Member

I think the block-wise gamma is okay. Are the eigenvectors belonging to a particular dilution block reshaped into a consecutive block, or are the blocks in the interlace scheme really interleaved "blocks" of single eigenvectors? If they are not reshaped, then this is the major performance improvement that can be extracted at this stage. Dilution blocks should definitely be grouped into consecutive elements. This is automatic with block dilution, but of course has to be done explicitly with interlace dilution.
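A rough sketch of the explicit regrouping for interlace dilution, assuming that block `b` out of `num_blocks` holds eigenvectors `b`, `b + num_blocks`, `b + 2 * num_blocks`, and so on. The function names `interlace_to_contiguous` and `gather` are made up for illustration and are not taken from the code base.

```cpp
#include <cstddef>
#include <vector>

// Returns the source indices in an order that makes every interlace
// dilution block contiguous: first all members of block 0, then block 1, ...
// Assumes num_blocks > 0.
std::vector<std::size_t> interlace_to_contiguous(std::size_t num_vectors,
                                                 std::size_t num_blocks) {
  std::vector<std::size_t> order;
  order.reserve(num_vectors);
  for (std::size_t block = 0; block < num_blocks; ++block) {
    for (std::size_t i = block; i < num_vectors; i += num_blocks) {
      order.push_back(i);
    }
  }
  return order;
}

// Copies the per-eigenvector entries of `data` into the given order, so
// that later products can work on consecutive ranges.
template <typename T>
std::vector<T> gather(const std::vector<T> &data,
                      const std::vector<std::size_t> &order) {
  std::vector<T> result;
  result.reserve(order.size());
  for (std::size_t index : order) {
    result.push_back(data[index]);
  }
  return result;
}
```

With six eigenvectors and two blocks the order becomes 0, 2, 4, 1, 3, 5, so each block can then be addressed as one contiguous range.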

@martin-ueding
Contributor Author

Looking at the profile (text and graph as PDF) generated with gprof, we find that most of the time is spent in matrix multiplication with Eigen. Since we do contractions, this is pretty much what we want. However, we feel that it might not be efficient.

The analytic computation of the needed FLOPs is not done yet, but should give some insight.
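As a starting point for such a count, here is a small sketch under the usual convention that a complex multiplication costs 6 real FLOPs and a complex addition costs 2. The function names are hypothetical, and the numbers say nothing about what Eigen actually executes.

```cpp
#include <cstdint>

// Real FLOPs for a dense complex product of an (m x k) with a (k x n)
// matrix: m*n*k complex multiplications and m*n*(k-1) complex additions.
// Assumes k >= 1.
std::uint64_t dense_complex_matmul_flops(std::uint64_t m, std::uint64_t k,
                                         std::uint64_t n) {
  const std::uint64_t multiplications = m * n * k;
  const std::uint64_t additions = m * n * (k - 1);
  return 6 * multiplications + 2 * additions;
}

// Real FLOPs when the left (m x m) matrix has exactly one non-zero entry
// per row: one complex multiplication per output element, no additions.
std::uint64_t one_nonzero_per_row_flops(std::uint64_t m, std::uint64_t n) {
  return 6 * m * n;
}
```

For a 4x4 matrix acting on a single column this gives 120 FLOPs for the dense product versus 24 when the single non-zero entry per row is exploited.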

We can also try to add a manual transposition into the matrix product as this will allow the use of other BLAS routines per their documentation.

We can also try to run this in VTune on JURECA to get more profiling data, especially with OpenMP.

Eigen might also have some performance output capabilities; this can be looked into as well.
