Comment on that all nonlinear operators can be shifted to augmented primal #587

Open. Wants to merge 6 commits into base: main
9 changes: 9 additions & 0 deletions docs/src/design/changing_the_primal.md
@@ -469,6 +469,15 @@ We don't have this in ChainRules.jl yet, because Julia is missing some definitio
We have been promised them for Julia v1.7 though.
You can see what the code would look like in [PR #302](https://github.com/JuliaDiff/ChainRules.jl/pull/302).

## What things can be pulled out of the pullback?
Member

Suggested change
- ## What things can be pulled out of the pullback?
+ ## What things can be taken out of the pullback?

Seems clearer?

Member

My understanding is that the motivation is that we can reuse the work that was done in the primal to reduce the work that needs to be done in the pullback.

But this current paragraph insinuates (at least to me) that: if there is an operation you can do in the augmented primal, do it there, rather than in the pullback. Is this true? I can imagine this is true if pullback gets called more than once, but that does not happen, right?

Member

The pullback gets called several times by jacobian.

It also sometimes does not get called at all, which should happen when the gradient is Zero (but doesn't always?) and may also happen because AD has called rrule on code it ought to know cannot have a derivative (as in FluxML/NNlib.jl#434).

At this point you might wonder, is there a rule for what can be taken out of the pullback and computed in the augmented primal?
We can deduce one, or in fact two.
The first and most practical one is that any computation that depends only on the primal input (or consequently its output) can be shifted to the augmented primal.
The second, and perhaps more insightful, is that all nonlinear parts can be moved out (this is a weaker statement, but a more interesting one).
We know this because pullbacks are linear operators -- they are linear in relation to the tangent they are pulling back.
This means they are in turn composed only of functions that are linear operators (in relation to the tangent).
The fully minimized pullback function only calls linear operators -- the apparently nonlinear parts can all be shifted to the augmented primal.
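
As a minimal sketch of both rules, consider `sin`, whose pullback multiplies the incoming tangent by `cos(x)`. The function names below are hypothetical and deliberately do not use the `rrule` API; they only illustrate where the work can live.

```julia
# Hypothetical helper names for illustration; this is not the ChainRulesCore API.

# Naive version: the pullback recomputes cos(x) on every call.
# cos(x) is nonlinear, but it depends only on the primal input x.
function sin_rule_naive(x)
    y = sin(x)
    pullback(ȳ) = ȳ * cos(x)
    return y, pullback
end

# Shifted version: cos(x) is computed once in the augmented primal
# (sincos returns both sin(x) and cos(x)). What remains in the pullback
# is a single multiplication, which is linear in ȳ.
function sin_rule_shifted(x)
    y, c = sincos(x)
    pullback(ȳ) = ȳ * c
    return y, pullback
end

y, pb = sin_rule_shifted(0.3)

# Both versions give the same result.
@assert pb(1.0) ≈ sin_rule_naive(0.3)[2](1.0)

# The pullback is linear in the tangent being pulled back.
@assert pb(2.0 * 1.5 + 3.0 * 0.5) ≈ 2.0 * pb(1.5) + 3.0 * pb(0.5)
```

Shifting the work pays off when the pullback is called several times for one primal evaluation, for example when computing a Jacobian. If the pullback is never called, whatever was precomputed in the augmented primal goes unused.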

## Conclusion
This document has explained why [`rrule`](@ref) is the way it is.
In particular, it has highlighted why the primal computation can be changed from simply calling the function.