Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Docs on ProjectTo and embedded subspaces #412

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/make.jl
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@ makedocs(
"Introduction" => "index.md",
"FAQ" => "FAQ.md",
"Rule configurations and calling back into AD" => "config.md",
"Types that represent embedded subspaces" => "embedded_subspaces.md",
"Opting out of rules" => "opting_out_of_rules.md",
"Writing Good Rules" => "writing_good_rules.md",
"Complex Numbers" => "complex.md",
Expand Down
116 changes: 116 additions & 0 deletions docs/src/embedded_subpaces.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,116 @@
# Types that represent embedded subspaces
_Taking Types Representing Embedded Subspaces Seriously™_

To paraphrase Stefan Karpinski: _"This does mean treating sparse matrixes not just as a representation of dense matrixes, but the alternative is just too horrible."_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is phrased strangely and hard to follow.

Suggested change
To paraphrase Stefan Karpinski: _"This does mean treating sparse matrixes not just as a representation of dense matrixes, but the alternative is just too horrible."_
To paraphrase Stefan Karpinski: _"This does mean treating sparse matrixes not just as a representation of dense matrixes, but the alternative is just too horrible."_


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change


Consider the following possible `rrule` for `*`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Consider the following possible `rrule` for `*`
Consider the following possible `rrule` for `*`:

```julia
function rrule(::typeof(*), a, b)
mul_pullback(ȳ) = (NoTangent(), ȳ*b', a'*ȳ)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you use the Δ convention here?

return (a * b), mul_pullback
end
```

This seems perfectly reasonable for floats:
```julia
julia> _, pb = rrule(*, 2.0, 3.0)
(6.0, var"#mul_pullback#10"{Float64, Float64}(2.0, 3.0))

julia> pb(1.0)
(NoTangent(), 3.0, 2.0)
```
and for matrixes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
and for matrixes
and for matrices:

```julia
julia> _, pb = rrule(*, [1.0 2.0; 3.0 4.0], [10.0 20.0; 30.0 40.0])
([70.0 100.0; 150.0 220.0], var"#mul_pullback#10"{Matrix{Float64}, Matrix{Float64}}([1.0 2.0; 3.0 4.0], [10.0 20.0; 30.0 40.0]))

julia> pb([1.0 0.0; 0.0 1.0])
(NoTangent(), [10.0 30.0; 20.0 40.0], [1.0 3.0; 2.0 4.0])
```

and even for complex numbers (assuming you like conjugation [which we do](@ref complexfunctions))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
and even for complex numbers (assuming you like conjugation [which we do](@ref complexfunctions))
and even for complex numbers (assuming you like conjugation [which we do](@ref complexfunctions)):

```julia
julia> _, pb = rrule(*, 0.0 + im, 1.0 + im)
(-1.0 + 1.0im, var"#mul_pullback#10"{ComplexF64, ComplexF64}(0.0 + 1.0im, 1.0 + 1.0im))

julia> pb(1.0)
(NoTangent(), 1.0 - 1.0im, 0.0 - 1.0im)

julia> pb(1.0im)
(NoTangent(), 1.0 + 1.0im, 1.0 + 0.0im)
```

So far everything is wonderful.
Isn't linear algebra great? We get this nice code that generalizes to all kinds of vector spaces.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One sentence per line.

Suggested change
Isn't linear algebra great? We get this nice code that generalizes to all kinds of vector spaces.
Isn't linear algebra great?
We get this nice code that generalizes to all kinds of vector spaces.


What if we start mixing it up a bit.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
What if we start mixing it up a bit.
What if we start mixing it up a bit?

Let's try a real amd a complex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Let's try a real amd a complex
Let's try a real and a complex:

```julia
julia> _, pb = rrule(*, 2.0, 3.0im)
(0.0 + 6.0im, var"#mul_pullback#10"{Float64, ComplexF64}(2.0, 0.0 + 3.0im))

julia> pb(1.0)
(NoTangent(), 0.0 - 3.0im, 2.0)

julia> pb(1.0im)
(NoTangent(), 3.0 + 0.0im, 0.0 + 2.0im)
```

That's _an_ answer.
It's consistent with treating the reals as being an embedded subspace of the complex numbers.
i.e. treating `2.0` as actually being `2.0 + 0im`.
It doesn't feel great that we have escaped from the real type in this way.
But it's not wrong as such.

Over to matrixes, lets try as `Diagonal` with a `Matrix`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Over to matrixes, lets try as `Diagonal` with a `Matrix`
Over to matrices, let's try a `Diagonal` with a `Matrix`:

```julia
julia> _, pb = rrule(*, Diagonal([1.0, 2.0]), [1.0 2.0; 3.0 4.0])
([1.0 2.0; 6.0 8.0], var"#mul_pullback#10"{Diagonal{Float64, Vector{Float64}}, Matrix{Float64}}([1.0 0.0; 0.0 2.0], [1.0 2.0; 3.0 4.0]))

julia> pb([1.0 0.0; 0.0 1.0])
(NoTangent(), [1.0 3.0; 2.0 4.0], [1.0 0.0; 0.0 2.0])
```

This is also _an_ answer.
This seems even worse though: no only has it escaped the `Diagonal` type, it has even escaped the subspace it represents.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This seems even worse though: no only has it escaped the `Diagonal` type, it has even escaped the subspace it represents.
This seems even worse though: not only has it escaped the `Diagonal` type, it has even escaped the subspace it represents (diagonal matrices).


Further, it is inconsistent with what we would get had we AD'd the function for `*(::Diagonal, ::Matrix)` directly.
[That primal function](https://github.com/JuliaLang/julia/blob/f7f46af8ff39a1b4c7000651c680058e9c0639f5/stdlib/LinearAlgebra/src/diagonal.jl#L224-L245) boils down to:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this refer to a specific Julia version for simplicity?

```julia
*(a::Diagonal, b::AbstractMatrix) = a.diag .* b
```
By reading that primal method, we know that that ADing that method would have zeros on the off-diagonals, because they are never even accessed
(a similar argument can be made for the complex part of a real number).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(a similar argument can be made for the complex part of a real number).
(a similar argument can be made for the imaginary part of a real number).


---

Consider the following possible `rrule` for `sum`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Consider the following possible `rrule` for `sum`
Consider the following possible `rrule` for `sum`:

```julia
function rrule(::typeof(sum), x::AbstractArray)
sum_pullback(ȳ) = (NoTangent(), fill(ȳ, size(x)))
return sum(x), sum_pullback
end
```

This seems all well and good at first:
```julia
julia> _, pb = rrule(sum, [1.0 2.0; 3.0 4.0])
(10.0, var"#sum_pullback#9"{Matrix{Float64}}([1.0 2.0; 3.0 4.0]))

julia> pb(1.0)
(NoTangent(), [1.0 1.0; 1.0 1.0])
```

But now consider:
```julia
julia> _, pb = rrule(sum, Diagonal([1.0, 2.0]))
(3.0, var"#sum_pullback#9"{Diagonal{Float64, Vector{Float64}}}([1.0 0.0; 0.0 2.0]))

julia> pb(1.0)
(NoTangent(), [1.0 1.0; 1.0 1.0])
```
That's not right -- not if us saying this was `Diagonal` meant anything.
If you try and use that dense matrix, to do gradient descent on your `Diagonal` input, you will get a non-diagonal result:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If you try and use that dense matrix, to do gradient descent on your `Diagonal` input, you will get a non-diagonal result:
If you try and use that dense matrix to do gradient descent on your `Diagonal` input, you will get a non-diagonal result:

`[2.0 1.0; 1.0 2.0]`.
You have escape the subspace that the diagonal type represents.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
You have escape the subspace that the diagonal type represents.
You have escaped the subspace that the diagonal type represents.