non-hipGraph MSCCL++ tests for allReduce and allGather #1503
…neUtils so that it can be included in several tests
Please give the PR a more descriptive title, as this will become the commit message.
…teNCCLid only from parent process
…tandaloneUtils is for the Standalone tests. Renamed the functions to be slightly more accurate and follow existing naming conventions.
@isaki001 I'd suggest renaming call_rccl, but other than that this is good to go. The failed CI is not related to your changes.
Co-authored-by: corey-derochie-amd <[email protected]>
Details
Do not mention proprietary info or link to internal work items in this PR.
Work item: "Internal", or link to GitHub issue (if applicable).
What were the changes?
Added functional tests for allGather and allReduce using MSCCL++ kernels in non-hipGraph mode, with and without managed memory.
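One way a functional test like this might validate its output is to compare each rank's receive buffer against a host-side reference. The sketch below is hypothetical and not the check used in this PR: the input pattern in `inputValue` and the helper names are assumptions. It computes the expected allReduce (sum) result and the expected allGather layout, where rank r's `count` elements land at offset `r * count`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical input pattern (assumption): rank r fills element i with
// (r + 1) * 1000 + i. Small integers keep float sums exact.
float inputValue(int rank, std::size_t i) {
  return static_cast<float>((rank + 1) * 1000 + static_cast<int>(i));
}

// Expected allReduce(sum) result: element i is the sum of element i over all ranks.
std::vector<float> expectedAllReduceSum(int nRanks, std::size_t count) {
  assert(nRanks >= 0);
  std::vector<float> out(count, 0.0f);
  for (int r = 0; r < nRanks; ++r)
    for (std::size_t i = 0; i < count; ++i) out[i] += inputValue(r, i);
  return out;
}

// Expected allGather result: rank r's `count` elements are placed at offset r * count.
std::vector<float> expectedAllGather(int nRanks, std::size_t count) {
  assert(nRanks >= 0);
  std::vector<float> out(static_cast<std::size_t>(nRanks) * count);
  for (int r = 0; r < nRanks; ++r)
    for (std::size_t i = 0; i < count; ++i)
      out[static_cast<std::size_t>(r) * count + i] = inputValue(r, i);
  return out;
}
```

With 8 ranks, element 0 of the allReduce reference is 1000 + 2000 + ... + 8000 = 36000, and each rank would compare its copied-back buffer element by element against these vectors.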
Why were the changes made?
There was no test covering user-buffer registration in non-hipGraph mode.
How was the outcome achieved?
The TestBed infrastructure was hanging, so I added a simple routine that creates 8 processes via fork() and calls allReduce/allGather.
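A fork()-based launcher of this kind can be sketched as follows. This is a minimal, hypothetical sketch, not the routine added in this PR: `runForkedRanks` and the `rankBody` callback are illustrative names, and the RCCL work (e.g. `ncclCommInitRank` followed by `ncclAllReduce` or `ncclAllGather`) that would run inside `rankBody` is omitted. The commit above mentions generating the NCCL id only in the parent process; presumably this is because every rank needs the same unique id, and creating it before fork() lets each child inherit identical bytes.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical helper: fork nRanks child processes, run rankBody(rank) in each,
// and return how many children exited normally with status 0.
// In an RCCL test, the parent would create the shared unique id before this
// call so every child inherits it (not shown here).
int runForkedRanks(int nRanks, const std::function<int(int)>& rankBody) {
  assert(nRanks >= 0);
  std::vector<pid_t> children;
  children.reserve(nRanks);
  for (int rank = 0; rank < nRanks; ++rank) {
    pid_t pid = fork();
    if (pid < 0) {
      break;  // fork failed: stop launching, but still reap started children
    }
    if (pid == 0) {
      // Child: run the per-rank work, then _exit() so control never returns
      // into the parent's loop and parent atexit handlers are not run.
      _exit(rankBody(rank));
    }
    children.push_back(pid);
  }
  // Parent: wait for every child and count clean exits.
  int succeeded = 0;
  for (pid_t pid : children) {
    int status = 0;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) &&
        WEXITSTATUS(status) == 0) {
      ++succeeded;
    }
  }
  return succeeded;
}
```

A test would then assert `runForkedRanks(8, body) == 8`, where `body` returns non-zero from any rank whose collective result does not match the expected values. Reaping every child in the parent also means a crashed rank shows up as a failed count rather than a silent pass.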
Additional Documentation:
What else should the reviewer know?
Approval Checklist
Do not approve until these items are satisfied.