Comment on that all nonlinear operators can be shifted to augmented primal #587
base: main
Conversation
@@ -469,6 +469,15 @@ We don't have this in ChainRules.jl yet, because Julia is missing some definitions.
We have been promised them for Julia v1.7 though.
You can see what the code would look like in [PR #302](https://github.com/JuliaDiff/ChainRules.jl/pull/302).

## What things can be pulled out of the pullback?
Suggested change:
- ## What things can be pulled out of the pullback?
+ ## What things can be taken out of the pullback?

Seems clearer?
My understanding is that the motivation is that we can reuse work done in the primal to reduce the work that needs to be done in the pullback.
But this paragraph insinuates (at least to me) that if there is an operation you can do in the augmented primal, you should do it there rather than in the pullback.
Is this true? I can imagine it is true if the pullback gets called more than once, but that does not happen, right?
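For concreteness, here is a minimal sketch of the kind of rule under discussion (the function `nonlin` and its `rrule` are made up for illustration, assuming the standard ChainRulesCore API; this is not an example from the docs):

```julia
using ChainRulesCore

# A hypothetical nonlinear function: the logistic, written so that the
# expensive part, exp(x), is shared between the primal and the derivative.
nonlin(x) = exp(x) / (1 + exp(x))

function ChainRulesCore.rrule(::typeof(nonlin), x::Real)
    e = exp(x)                  # computed once, in the augmented primal
    y = e / (1 + e)
    function nonlin_pullback(ȳ)
        # reuses the primal result `y`; no exp() call happens here
        x̄ = ȳ * y * (1 - y)     # d/dx logistic(x) = y * (1 - y)
        return (NoTangent(), x̄)
    end
    return y, nonlin_pullback
end
```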
The pullback gets called several times by `jacobian`.
It also sometimes does not get called at all, which should happen when the gradient is `Zero` (but doesn't always?), and it may also happen because AD has called `rrule` on code it ought to know cannot have a derivative (as in FluxML/NNlib.jl#434).
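To make that concrete, here is a toy sketch (assumptions: `f` has an `rrule`, the cotangent comes back as a plain vector, and `naive_jacobian` is invented for illustration, not how any real AD package implements it) of why the pullback can run many times per primal evaluation:

```julia
using ChainRulesCore

# Reverse-mode Jacobian of a vector-valued f: the augmented primal runs once,
# but the pullback runs once per output component, seeded with each basis covector.
function naive_jacobian(f, x::AbstractVector)
    y, pullback = ChainRulesCore.rrule(f, x)   # assumes an rrule exists for f
    J = zeros(length(y), length(x))
    for i in eachindex(y)
        seed = zeros(length(y))
        seed[i] = 1.0                           # i-th basis covector
        _, x̄ = pullback(seed)                   # pullback called length(y) times
        J[i, :] .= x̄                            # assumes x̄ is a plain vector
    end
    return J
end
```

So any work hoisted into the augmented primal is amortised over all of those pullback calls.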
This is (not surprisingly) reminiscent of the discussion on linearity in https://arxiv.org/abs/2204.10923. I wonder if any of the visual aids in that paper would be helpful here?
Co-authored-by: Miha Zgubic <[email protected]>
Indeed not surprising, given it emerged in part from discussion with several of the authors.
Co-authored-by: Mathieu Besançon <[email protected]>
Codecov Report
Base: 93.11% // Head: 93.17% // Increases project coverage by +0.05%.

@@ Coverage Diff @@
##             main     #587      +/-   ##
==========================================
+ Coverage   93.11%   93.17%   +0.05%
==========================================
  Files          15       15
  Lines         901      908       +7
==========================================
+ Hits          839      846       +7
  Misses         62       62

☔ View full report at Codecov.
This is a bit rambly but I felt it was worth writing down.