38: Improve and re-enable func_lpf_put_parallel_bad_pattern test #39

Open: wants to merge 98 commits into master


anyzelman
Member

Closes #38

Kiril Dichev added 30 commits August 20, 2024 11:39
…quests in a queue pair). Instead of relying only on the device's reported max_qp_wr, we now try to create QPs via ibv_create_qp with different max_send_wr values, using binary search to find the largest value that still works. This becomes the updated m_maxSrs. Separately, the 100K-element manyPuts test has to be reduced to 5K for our cluster: our count is just over 10K, yet 10K does not work either (cause unknown).
…e in manyPuts: the device does not support that many messages in the send WR queue of the QP
…it is very complicated to fix these tests; they are inconsistent and not working, but committing anyway
… (only the minimum), but many tests still fail without explanation, and this is not yet properly tested
…in the execute command. Also, reduce the message count in some examples, as very large tests do not work with IB Verbs on the ARM machine
…ent logic depending on whether GTest only lists the tests or runs them
…. Also, now using SetUp and TearDown for the entire test suite, which avoids duplicating code in every test
…d remove the C99 requirement and turn them into C++ tests
… I need to make the MPI engines in the debug layer call MPI_Abort, and the pthread engine in the debug layer call std::abort
…y, while this works, it did not solve the problem: mpirun is still used with pthreads, so it converts the std::abort signal to exit code 134. This is why I have now changed the launcher. Some hybrid tests still have issues, though.
…ap script from pre-existing googletest messages
… which internally is non-portably converted to 134. This also simplifies the launcher script. Also fix some incorrect delete calls on arrays in the collectives
…y, we currently get all tests via gtest_add_tests. It would be good to replace gtest_add_tests with gtest_discover_tests in the future, because the current approach takes 60 to 90 seconds to configure. Also, there is currently a serious bug: if I specify a high CMake version (e.g., the required 3.29.0), the GoogleTest tests do not compile at all
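One commit above describes probing the largest usable max_send_wr by binary search over ibv_create_qp attempts, rather than trusting max_qp_wr alone. The following is a minimal C++ sketch of that search, not LPF's actual code. It assumes the probe is monotone (if n send WRs work, every smaller count works too) and that the lower bound is known to work. The helper name largest_working and its callback are hypothetical; the callback stands in for the real ibv_create_qp attempt so the search can run without an IB device.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical helper: returns the largest n in [lo, hi] for which works(n)
// is true. Assumes works() is monotone (true up to some threshold, false
// above it) and that works(lo) is true.
std::uint32_t largest_working(std::uint32_t lo, std::uint32_t hi,
                              const std::function<bool(std::uint32_t)>& works) {
    assert(lo <= hi);
    // Invariant: works(lo) holds, and the answer is never above hi.
    while (lo < hi) {
        // Upper midpoint, computed without overflow, so mid > lo and the
        // loop always makes progress when works(mid) succeeds.
        std::uint32_t d = hi - lo;
        std::uint32_t mid = lo + d / 2 + d % 2;
        if (works(mid)) {
            lo = mid;      // mid is feasible: the answer is mid or larger
        } else {
            hi = mid - 1;  // mid is infeasible: the answer is below mid
        }
    }
    return lo;
}
```

In an ibverbs engine, the callback could fill a struct ibv_qp_init_attr with cap.max_send_wr set to n, call ibv_create_qp(pd, &attr), and on success call ibv_destroy_qp and return true. The device's max_qp_wr (from ibv_query_device) would be a natural upper bound. Each probe creates and destroys a real QP, so binary search keeps the number of attempts logarithmic in that bound.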
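The exit code 134 mentioned in these commits is the usual shell encoding of a process killed by SIGABRT: 128 plus the signal number, and SIGABRT is 6 on Linux. The following is a minimal POSIX sketch in C++, assuming Linux or another POSIX system, that makes this visible by aborting a forked child and decoding its wait status. The helper shell_status_of_abort is hypothetical and is not part of LPF or its launcher.

```cpp
#include <csignal>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs std::abort() in a forked child and returns the
// status a POSIX shell would report for it (128 + signal number if the child
// was killed by a signal, otherwise its exit code). Returns -1 on error.
int shell_status_of_abort() {
    pid_t pid = fork();
    if (pid < 0) return -1;  // fork failed
    if (pid == 0) {
        // Child: make sure SIGABRT has its default action, then abort.
        std::signal(SIGABRT, SIG_DFL);
        std::abort();
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    if (WIFSIGNALED(status)) return 128 + WTERMSIG(status);  // e.g. 128 + 6
    if (WIFEXITED(status)) return WEXITSTATUS(status);
    return -1;
}
```

A launcher written against the raw wait status can test WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT directly. That avoids depending on how an intermediate tool such as mpirun re-encodes the signal as a numeric exit code.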
anyzelman and others added 29 commits November 2, 2024 14:27
…n test. Document how the test suite handles the number of LPF processes each test runs with
…he processor count and overwriting min/max if needed. Also, delete run.sh, which is no longer used, and remove a debug message from the MPI abort
…ing for abort.h in the hybrid engine, and finally fix the visibility of LPF_HAS_ABORT for the hybrid engine
…s input arguments that are not documented anywhere. Also, variants of this test are on the exception list, which may mean the test is not stable.
…chook tests to ensure they do not block each other by using the same port
…cs expose the issue, and that number is capped with current CMake.
…use mpirma and mpimsg as fallbacks in such cases
…ible (even if no IB card is present). With the GTest integration, this is now only possible when tests are disabled
Successfully merging this pull request may close these issues.

Improve and re-enable func_lpf_put_parallel_bad_pattern test
2 participants