Perf: use F.linear for MLP #4513
base: devel
Conversation
📝 Walkthrough
The changes refactor the MLP layer's forward pass to use `F.linear` for the linear transformation, simplifying how the activation, optional identity (idt) scaling, and concatenation steps are applied.
Sequence Diagram
sequenceDiagram
    participant MLPLayer
    participant F.linear
    participant Activation
    MLPLayer->>F.linear: Perform linear transformation
    F.linear-->>MLPLayer: Transformed tensor
    MLPLayer->>Activation: Apply activation function
    Activation-->>MLPLayer: Activated output
    alt Identity Transformation
        MLPLayer->>MLPLayer: Add input tensor
    end
    alt Concatenation
        MLPLayer->>MLPLayer: Concatenate tensors
    end
The sequence diagram illustrates the simplified flow of the refactored MLP layer's forward pass.
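To make that flow concrete, here is a minimal sketch of such a forward pass. The attribute names (`matrix`, `bias`, `idt`, `resnet`) and the shape conventions are assumptions for illustration, not the actual deepmd implementation.

```python
# Hedged sketch of the diagrammed flow: fused linear transform, activation,
# optional learnable gate, and a skip/concat path. Names are illustrative.
from types import SimpleNamespace

import torch
import torch.nn.functional as F

def mlp_layer_forward(layer, xx: torch.Tensor) -> torch.Tensor:
    # F.linear expects an (out, in) weight, hence the transpose of an
    # (in, out) matrix; the bias add is fused into the same kernel.
    yy = layer.activation(F.linear(xx, layer.matrix.t(), layer.bias)).clone()
    if layer.idt is not None:
        yy = yy * layer.idt                        # identity/gating parameter
    if layer.resnet:
        if yy.shape[-1] == xx.shape[-1]:
            yy = yy + xx                           # plain residual add
        elif yy.shape[-1] == 2 * xx.shape[-1]:
            yy = yy + torch.cat([xx, xx], dim=-1)  # concat input to match width
    return yy

# Tiny usage example with a stand-in "layer" object.
layer = SimpleNamespace(activation=torch.tanh,
                        matrix=torch.randn(16, 32), bias=torch.randn(32),
                        idt=None, resnet=False)
print(mlp_layer_forward(layer, torch.randn(4, 16)).shape)  # torch.Size([4, 32])
```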
Actionable comments posted: 0
🧹 Nitpick comments (1)
deepmd/pt/utils/utils.py (1)
48-48: One-line activation application
Directly returning `self.activation(x)` simplifies the code for applying the activation. Consider adding docstrings or references in the code to remind contributors of the available activation keys in `get_activation_fn`.
📜 Review details
📒 Files selected for processing (2)
- deepmd/pt/model/network/mlp.py (2 hunks)
- deepmd/pt/utils/utils.py (1 hunks)
🔇 Additional comments (5)
deepmd/pt/model/network/mlp.py (3)
11-11: Use of `torch.nn.functional` for linear transformations
Importing `torch.nn.functional` helps consolidate linear operations and other function calls (e.g., non-linear activation) in a functional style. This aligns well with the change to `F.linear`, improving code clarity and performance.
206-207: Efficient linear transformation & activation chaining
Replacing manual matrix multiplication with `F.linear(xx, self.matrix.t(), self.bias)` is more concise and may marginally improve performance. Cloning the result after applying the activation is a good safeguard against in-place modifications.
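For context, a minimal sketch of the kind of replacement being described, using placeholder tensors rather than the actual layer attributes; the (num_in, num_out) weight layout is an assumption for illustration.

```python
# Hedged sketch: explicit matmul + bias add vs. the fused F.linear call.
import torch
import torch.nn.functional as F

xx = torch.randn(8, 16)        # batch of inputs
matrix = torch.randn(16, 32)   # assumed (num_in, num_out) weight layout
bias = torch.randn(32)

# Before: explicit matmul followed by a separate bias add.
yy_manual = torch.matmul(xx, matrix) + bias

# After: a single call that can dispatch to a fused addmm kernel.
# F.linear expects an (out, in) weight, hence the transpose.
yy_fused = F.linear(xx, matrix.t(), bias)

assert torch.allclose(yy_manual, yy_fused, atol=1e-5)
```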
209-210: Configurable skip connection with idt
Multiplying by `self.idt` here offers a flexible skip or gating mechanism. However, if `self.idt` is large or uninitialized, it could introduce training instability. Consider adding constraints or initialization strategies to keep it within a valid scale.
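One way to address the initialization concern is sketched below; the near-constant, small-variance initialization is an illustrative choice, not necessarily what the layer actually uses.

```python
# Hedged sketch: a learnable per-channel gate ("idt"-style) initialized close
# to a small constant so the gated branch starts as a mild perturbation.
import torch

class GatedOutput(torch.nn.Module):
    def __init__(self, num_out: int, init_mean: float = 0.1, init_std: float = 0.001):
        super().__init__()
        # Tightly concentrated initial values keep early training stable.
        self.idt = torch.nn.Parameter(torch.empty(num_out))
        torch.nn.init.normal_(self.idt, mean=init_mean, std=init_std)

    def forward(self, yy: torch.Tensor) -> torch.Tensor:
        return yy * self.idt

gate = GatedOutput(32)
print(gate(torch.randn(4, 32)).shape)  # torch.Size([4, 32])
```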
deepmd/pt/utils/utils.py (2)
24-25: Dynamic activation assignment
The constructor's assignment to `self.activation` via `get_activation_fn(activation)` centralizes activation function logic, reducing conditional branches in the forward pass. This design is both clean and extensible.
26-43: Centralized activation function selection
Using a standardized method for retrieving activation functions streamlines updates and new additions. The `RuntimeError` for unsupported activations provides clear feedback to developers.
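A minimal sketch of the pattern described in the two comments above: a single lookup helper plus a thin module wrapper. The key set shown is illustrative only; the real `get_activation_fn` in deepmd/pt/utils/utils.py defines the supported names.

```python
# Hedged sketch, not the actual deepmd implementation.
from typing import Callable, Optional

import torch
import torch.nn.functional as F

def get_activation_fn(activation: Optional[str]) -> Callable[[torch.Tensor], torch.Tensor]:
    key = (activation or "none").lower()
    table = {
        "relu": F.relu,
        "gelu": F.gelu,
        "tanh": torch.tanh,
        "sigmoid": torch.sigmoid,
        "none": lambda x: x,
    }
    if key not in table:
        # Clear feedback when an unsupported key is requested.
        raise RuntimeError(f"activation function {activation} not supported")
    return table[key]

class ActivationFn(torch.nn.Module):
    def __init__(self, activation: Optional[str]):
        super().__init__()
        # Resolve the callable once; forward() then has no branching.
        self.activation = get_activation_fn(activation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.activation(x)

print(ActivationFn("gelu")(torch.randn(2, 3)).shape)  # torch.Size([2, 3])
```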
Why introduce this change?
Do you observe an explicit improvement in the efficiency of the linear transformations?
I found this possibility when investigating why the matmuls were not using Tensor Cores. This op fuses the matmul and the bias add when a bias is present.
No, since currently no models use the MLP layer with a bias. Maybe we could just keep this change for future use?
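As a rough way to probe the fusion claim, here is a hedged micro-benchmark sketch; the shapes, dtype, and CUDA requirement are arbitrary assumptions and the results will vary by hardware and PyTorch version.

```python
# Hedged micro-benchmark: explicit matmul + bias add vs. the fused F.linear path.
import torch
import torch.nn.functional as F

assert torch.cuda.is_available(), "this sketch assumes a CUDA device"

x = torch.randn(4096, 1024, device="cuda", dtype=torch.float16)
w = torch.randn(512, 1024, device="cuda", dtype=torch.float16)  # (out, in) for F.linear
b = torch.randn(512, device="cuda", dtype=torch.float16)

def bench(fn, iters: int = 200) -> float:
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per call

print("matmul + add:", bench(lambda: x @ w.t() + b))
print("F.linear    :", bench(lambda: F.linear(x, w, b)))
```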
I remember that TorchScript could do this optimization automatically.
Yes, but we are not using TorchScript in the training loop for now. I'll test TorchScript later.
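For later reference, one hedged way to inspect what TorchScript does with the unfused pattern; whether the matmul and add actually get fused depends on the backend and PyTorch version.

```python
# Sketch: script a module that uses an explicit matmul + bias add, run it a
# couple of times so the profiling executor can optimize, then inspect the IR.
import torch

class TinyLinear(torch.nn.Module):
    def __init__(self, num_in: int = 32, num_out: int = 64):
        super().__init__()
        self.w = torch.nn.Parameter(torch.randn(num_in, num_out))
        self.b = torch.nn.Parameter(torch.randn(num_out))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Unfused form on purpose: explicit matmul followed by a bias add.
        return torch.matmul(x, self.w) + self.b

scripted = torch.jit.script(TinyLinear())
scripted(torch.randn(8, 32))   # warm-up runs let the executor apply optimizations
scripted(torch.randn(8, 32))
print(scripted.graph)          # look for a fused op (e.g. addmm) in the IR
```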
@njzjz Setting
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## devel #4513 +/- ##
==========================================
- Coverage 84.59% 84.55% -0.04%
==========================================
Files 675 677 +2
Lines 63575 63903 +328
Branches 3486 3486
==========================================
+ Hits 53779 54035 +256
- Misses 8671 8742 +71
- Partials 1125 1126 +1
☔ View full report in Codecov by Sentry.
It brings <1% speedup.