Perf: allow tf32 datatype for matmul #4499
base: devel
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

```diff
@@           Coverage Diff            @@
##           devel    #4499     +/-   ##
=========================================
  Coverage   84.41%   84.42%
=========================================
  Files         670      670
  Lines       62147    62141       -6
  Branches     3487     3488       +1
=========================================
+ Hits        52464    52465       +1
+ Misses       8556     8549       -7
  Partials     1127     1127
```

☔ View full report in Codecov by Sentry.
It would be better to provide an option in the input script to control whether tensor cores are used.
@njzjz Do you have any suggestions on adding this option? Maybe as an option under "training", or somehow integrated with the precision-control flags? (See the sketch below for one way such a flag could be wired up.)
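No such flag exists yet; as a hedged illustration only, here is how a hypothetical `allow_tf32` key under the "training" section could be mapped onto PyTorch's documented matmul-precision switch. The key name and helper function are assumptions, not part of this PR:

```python
import torch

def apply_matmul_precision(training_config: dict) -> None:
    # Hypothetical "allow_tf32" key under the "training" section of the
    # input script; the name is illustrative, not defined by deepmd-kit.
    allow_tf32 = training_config.get("allow_tf32", False)
    # "high" lets float32 matmuls use TF32 tensor-core kernels;
    # "highest" keeps full FP32 precision (the PyTorch default).
    torch.set_float32_matmul_precision("high" if allow_tf32 else "highest")

# Example: enable TF32 during training setup.
apply_matmul_precision({"allow_tf32": True})
```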
It seems that the precision used during training and inference should be consistent.
I think we can make the decision after we see the benchmark results.
I tested training the DPA-3 alpha model from scratch on the AlMgCu dataset with different matmul precisions on an A800 GPU, and here is the result:
Single-GPU training speed: using TF32 improves training speed by ~15%.
I used FP32 and TF32 as the matmul precision settings.
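For context, a standalone micro-benchmark along these lines (not the DPA-3 run reported above; the matrix size and iteration count are arbitrary choices) can reproduce the kind of FP32-vs-TF32 matmul gap being measured:

```python
import time
import torch

def time_matmul(precision: str, n: int = 8192, iters: int = 50) -> float:
    """Average seconds per n x n float32 matmul under a given precision mode."""
    torch.set_float32_matmul_precision(precision)
    a = torch.randn(n, n, device="cuda")
    b = torch.randn(n, n, device="cuda")
    a @ b  # warm-up so kernel selection is excluded from the timing
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

for precision in ("highest", "high"):  # full FP32 vs TF32 kernels
    print(f"{precision}: {time_matmul(precision) * 1e3:.2f} ms/iter")
```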
I'll further test the impact of TF32 using the OMat dataset.
PyTorch disables the TF32 data type for float32 matmuls by default. TF32 is available starting with Ampere GPUs (e.g. A100); enabling it lets matmuls run on tensor cores, which should improve performance.
I will attach test results on the speed-up and accuracy impact of this PR later. A minimal sketch of enabling TF32 follows the references below.
Ref:
https://pytorch.org/docs/stable/notes/cuda.html#tf32-on-ampere
https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html
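The PR's diff is not shown in this thread; going by the linked PyTorch docs, the minimal way to enable TF32 matmuls is:

```python
import torch

# Allow float32 matmuls to use TF32 tensor cores on Ampere (e.g. A100/A800)
# and newer GPUs; this has been off by default since PyTorch 1.12.
torch.backends.cuda.matmul.allow_tf32 = True
# Equivalent higher-level switch from the linked docs:
torch.set_float32_matmul_precision("high")
```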