Trouble implementing models in Axon #522

Closed
krstopro opened this issue Aug 19, 2023 · 30 comments · Fixed by #524

Comments

@krstopro
Member

Recently I tried coding a somewhat popular model in Axon, only to give up after a few hours. The reason is that I found implementing custom models very hard, if not impossible. It could be that I am missing something, that I need more practice, or that I am biased towards PyTorch, which I had been using for years. Still, I would like to ask some questions.
Is there any particular reason why Axon.layer_name was chosen to return a function and not a struct with parameters? I know the latter is more OOP than functional (as stated here), but implementing custom models seems simpler to me that way, and it allows parameters to be reused easily. I also think this is exactly the approach taken in Nx.Scholar.
Many popular algorithms involve applying the same set of weights over and over again to the input: for example, recurrent neural networks (potentially with custom cells), Deep Sets, meta-learning (MAML and related algorithms), Neural Cellular Automata, etc. Currently, I don't think it is even possible to implement these in Axon (and I am not sure what is going on with the get_parameters and set_parameters functions).
Am I missing something? Or are there any plans to change the approach? I am aware of the "Add weight sharing" issue still being open.

krstopro changed the title from "Trouble with implementing models in Axon" to "Trouble implementing models in Axon" on Aug 19, 2023
@seanmor5
Contributor

Can you share which model you tried implementing?

The reason layer names are functions is to make inference deterministic. Previously we relied on unique_integer to ensure layers/parameters had unique names and would not accidentally be shared across layers. An issue with this is that if you have a function which returns an Axon model, you would always get a unique model unless you explicitly specified every layer name. That means every time you tried to use one of these models, it would need to be fully recompiled by Nx/EXLA.

Note that you can still pass a binary as a layer name; the function is really only used internally. The only other way to enforce this would be to force :name to be a required parameter of each layer.
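
For example, something like this (an illustrative snippet; the names and shapes are arbitrary):

# An explicit binary name keeps the layer and parameter names stable, so a
# function that builds this model always returns the same graph and the
# compiled function can be cached instead of recompiled.
input = Axon.input("features", shape: {nil, 16})
model = Axon.dense(input, 32, name: "encoder_dense")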

Looking at some of the model implementations in Bumblebee might help a bit as well. It's difficult to know for sure, though, without understanding what you're trying to do.

@krstopro
Member Author

krstopro commented Aug 19, 2023

Can you share which model you tried implementing?

Sure! I was trying to implement Neural Cellular Automata (see https://distill.pub/2020/growing-ca/) for fun and practice.
The main issue was applying the same module (a sequence of layers) to the input for a fixed number of steps (which could be passed as an option to the model). I think the figure at that link explains what is going on reasonably well. There are existing PyTorch and TensorFlow implementations, so I don't need to reinvent the wheel.
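
Roughly, the pattern I could not express is the following (just a sketch; update_block stands in for whatever reusable, weight-sharing block would be needed, and initial_state is the seed grid):

# The same block (one set of weights) is applied to the grid state for a fixed
# number of steps.
num_steps = 64

final_state =
  Enum.reduce(1..num_steps, initial_state, fn _step, state ->
    update_block.(state)
  end)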

Looking at some of the model implementations in Bumblebee might help a bit as well.

This might be a good next step for me. Thanks!

@seanmor5
Contributor

Is there a particular reason why, for example, we have Axon.dense as a function, instead of an Axon.Dense module with Axon.Dense.new (which returns the parameters) and Axon.Dense.apply or Axon.Dense.forward (which takes those parameters together with the input)?

I think the approach we take at the moment is much more versatile and functional than the module-based approach. I'm not quite sure how having separate modules per layer would work. In Scholar those algorithms are standalone, but in Axon layers are composable, and the module-based approach is not very composable. Note that you can get "apply" by just going to the low-level Axon.Layers implementations. You will need to manage parameter initialization yourself, but it is possible.
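
For example, something along these lines (a minimal sketch; x1 and x2 are assumed to be {batch, 16} tensors, and the shapes are arbitrary):

# Initialize the parameters yourself, then reuse them for as many inputs as you
# like by calling the functional implementations in Axon.Layers directly.
key = Nx.Random.key(42)
{kernel, key} = Nx.Random.normal(key, 0.0, 1.0, shape: {16, 32})
{bias, _key} = Nx.Random.normal(key, 0.0, 1.0, shape: {32})

y1 = Axon.Layers.dense(x1, kernel, bias)
y2 = Axon.Layers.dense(x2, kernel, bias)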

@krstopro
Member Author

In Scholar those algorithms are standalone, but in Axon layers are composable, and the module-based approach is not very composable.

In a module-based approach, composability should be easily achievable with a Sequential API (e.g. as done in PyTorch). Parameter sharing in that approach is very simple, since all you need to do is pass the parameters as arguments. With the current approach, I don't think parameter sharing is even possible.

Note that you can get "apply" by just going to the low-level Axon.Layers implementations. You will need to manage parameter initialization yourself, but it is possible.

Indeed, I am aware of the transforms implemented in Axon.Layers, but I don't think that solves parameter sharing.

@josevalim
Contributor

I think framing it as module vs. function is barking up the wrong tree. :) A module is nothing more than a collection of functions, and it won't add any inherent capabilities. Structs, as mentioned earlier, could introduce new capabilities, and they would tie data to a module, but due to Elixir's functional nature, and given that a struct in Elixir is nothing more than a map with a special key, the same could be replicated easily in other ways.

I think it is worth taking a step back (well, at least for me, since I am not well versed in ML). We are talking a lot about the solution, but I am trying to grasp the problem. If the issue is parameter sharing, wouldn't it be a matter of replacing this line:

https://github.com/elixir-nx/axon/blob/main/lib/axon.ex#L722

By something like:

kernel = opts[:kernel_param] || param("kernel", kernel_shape, initializer: opts[:kernel_initializer])

?

@josevalim
Contributor

Ah, I guess the issue with the above is that the parameters are not considered shareable anyway. But we could likely introduce axon = Axon.add_shared_param(axon, ...), with the shared params all stored in a global configuration in Axon.

@krstopro
Member Author

Ah, I guess the issue with the above is that the parameters are not considered shareable anyway. But we could likely introduce axon = Axon.add_shared_param(axon, ...), with the shared params all stored in a global configuration in Axon.

But how do you pass a parameter to the layer? I don't think that is possible at the moment.

@josevalim
Contributor

@krstopro change the layer to either:

  1. allow the parameter to be given as an option
  2. allow a layer param to be converted to shared after the fact

Anyway, the point I am getting at is that I don't think this is a design issue, and I think treating it as one is going to lead in the wrong direction. Remember that objects (or rather, mutability) allow you to give everything an identity based on its position in memory (with all the complexity that comes from that, because now you need to track how things change over time!). So you share by making things point to the same place. There is nothing in modules or structs in Elixir that will give you that. If you want to share something, you need to give it an explicit name and put it somewhere shared.

@krstopro
Member Author

krstopro commented Aug 20, 2023

@josevalim

allow the parameter to be given as an option

This might solve the problem.

allow a layer param to be converted to shared after the fact

Hmmm this might require solution number 1.

What I was describing is the following approach (it is exactly what is done in Nx.Scholar; something similar is done in PyTorch):

defmodule Axon.Dense do
  defstruct [:weights, :bias]
  
  def new(num_units) do
    # returns struct with weights and bias parameters
  end
  
  def apply(x, %__MODULE__{weights: weights, bias: bias}) do # or def forward
    # applies weights and bias to x
  end
end

Then, suppose we have two inputs x1 and x2. It is very easy to apply the same layer to these inputs by doing

layer = Axon.Dense.new(num_units)
y1 = Axon.Dense.apply(x1, layer)
y2 = Axon.Dense.apply(x2, layer)

I don't think we can do something like this at the moment.

@josevalim
Contributor

josevalim commented Aug 20, 2023

That would not necessarily solve it. Look at this Python code:

>>> class User:
...   name = None
...
>>> user = User()
>>> user.name = "josé"
>>> other = User()
>>> other.name = "josé"
>>> user == other
False

and then:

>>> from dataclasses import dataclass
>>> @dataclass
... class User:
...   name: str = None
...
>>> user = User()
>>> user.name = "josé"
>>> other = User()
>>> other.name = "josé"
>>> user == other
True

In Elixir, everything is data (like the second example). So if you have two parameters with the same name and shape, does that mean they are shared? No, it doesn't: it could be a coincidence that they are represented the same way, and they may still get different values on execution. That's why having a struct would not help; we still could not simply assume that because something looks the same, it is the same. With objects, they would point to different memory addresses, and that's how you would know they are different.

One way to do this in Elixir is by adding a unique value, such as make_ref(), to each parameter. This way we would know whether two parameters are the same by looking at the ref, so a parameter appearing in different layers with the same ref would be the same: great!

The problem is that it breaks equality:

iex> Axon.dense(32) == Axon.dense(32)
false

This would be false because each call now points to something unique (as objects/memory addresses would), which feels very counterintuitive when everything we have is data.
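
In plain Elixir terms (not actual Axon API), the trade-off looks something like this:

# Two params built with fresh refs are never equal, even if they otherwise look
# identical; only the exact same ref marks two occurrences as the same parameter.
new_param = fn name -> %{name: name, ref: make_ref()} end

new_param.("kernel") == new_param.("kernel")
#=> false

param = new_param.("kernel")
param == param
#=> true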

@josevalim
Contributor

Ok, here is a more concrete API:

  • Add Axon.shared_param(...), the same as Axon.param but returns Axon.SharedParam

  • Add :weight and :bias option to dense (you can deprecate/replace :use_bias by setting :bias to false)

  • Once an Axon.shared_param is given to a layer, Axon will also store it in a field called shared_params = %{shared_param_key => shared_param_struct}. All shared_params with the same name must be exactly the same

  • Shared params must be given under a different namespace when executing (perhaps "_shared" - this may even mean we could implement all of this by having a fake layer where all shared params are stored - but that's an implementation detail)
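
For illustration, a hypothetical usage of the above (neither Axon.shared_param nor the :weight option exists yet; this is just a sketch of the proposal):

# Both dense layers read the same 16-feature input, so the shared {16, 32}
# kernel fits both; Axon would record it under the shared params namespace.
shared_kernel = Axon.shared_param("shared_kernel", {16, 32})

input = Axon.input("features", shape: {nil, 16})
left = Axon.dense(input, 32, weight: shared_kernel, name: "left")
right = Axon.dense(input, 32, weight: shared_kernel, name: "right")
model = Axon.add(left, right)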

@krstopro
Member Author

@josevalim I see now, thanks! I think this is what @seanmor5 already wrote here.

@josevalim
Contributor

Yeah, exactly. We made these mistakes in the past, which is one way of learning. :D

@krstopro
Member Author

krstopro commented Aug 20, 2023

Ok, here is a more concrete API:

  • Add Axon.shared_param(...), the same as Axon.param but returns Axon.SharedParam
  • Add :weight and :bias option to dense (you can deprecate/replace :use_bias by setting :bias to false)
  • Once an Axon.shared_param is given to a layer, Axon will also store it in a field called shared_params = %{shared_param_key => shared_param_struct}. All shared_params with the same name must be exactly the same
  • Shared params must be given under a different namespace when executing (perhaps "_shared" - this may even mean we could implement all of this by having a fake layer where all shared params are stored - but that's an implementation detail)

@josevalim Correct me if I'm wrong, but implementing these would require changing every existing layer in Axon, right?

@josevalim
Contributor

josevalim commented Aug 20, 2023

Good call. Layers would need to declare which parameters they allow to be shared, correct. Making a param shared after it is defined would not require that, though. Something like this:

axon
|> Axon.dense(32, name: "dense1")
|> Axon.share_param("dense1.bias", as: "shared_dense_bias")
|> Axon.dense(32, name: "dense2")
|> Axon.share_param("dense2.bias", as: "shared_dense_bias")

We would still need to store the shared params somewhere to make sure the shapes match, but the API above would be less bureaucratic, yeah. Or even:

axon
|> Axon.dense(32, name: "dense1")
|> Axon.dense(32, name: "dense2")
|> Axon.share_param(["dense1.bias", "dense2.bias"], as: "shared_dense_bias")

@krstopro
Member Author

krstopro commented Aug 20, 2023

One of the problems might be the following (again, I might be wrong).
Suppose we have a complex model, e.g. something like this (taken directly from https://hexdocs.pm/axon/Axon.html):

model =
  input
  |> Axon.dense(128, activation: :relu)
  |> Axon.batch_norm()
  |> Axon.dropout(rate: 0.8)
  |> Axon.dense(64)
  |> Axon.tanh()
  |> Axon.dense(10)
  |> Axon.activation(:softmax)

How do I share its parameters?

@josevalim
Contributor

IIRC, each layer gets a name even if you don't give one explicitly (e.g. dense1, dense2, dense3, etc.). You could rely on those generated names, but I would instead explicitly name the layers so we can share the params. Something like:

model =
  input
  |> Axon.dense(128, activation: :relu, name: "dense1")
  |> Axon.batch_norm()
  |> Axon.dropout(rate: 0.8)
  |> Axon.dense(64, name: "dense2")
  |> Axon.tanh()
  |> Axon.dense(10, name: "dense3")
  |> Axon.activation(:softmax)
  |> Axon.share_param(["dense1.bias", "dense2.bias", "dense3.bias"], as: "shared_bias")

@krstopro
Member Author

I've been thinking about this, but I am not sure it solves the problem. If we have two inputs, say input1 = Axon.input("input1", shape: {nil, dim}) and input2 = Axon.input("input2", shape: {nil, dim}), how do we apply the same sequence of layers (with the same weights!) to both input1 and input2?

Another issue with accessing layers by name might arise if we want to do the same with a complex model with a lot of layers (e.g. ResNet-101, which is 101 layers deep). Would we need to name every layer in the model and then iterate over them?

@polvalente
Contributor

polvalente commented Aug 20, 2023

I believe it would be something like this:

input1 = Axon.input("x", shape: {nil, 10})
input2 = Axon.input("y", shape: {nil, 10})

model_fn = fn input, i ->
  input
  |> Axon.dense(16, name: "dense0_#{i}", activation: :relu)
  |> Axon.dense(32, name: "dense1_#{i}", activation: :relu)
end

model = 
  Axon.concatenate([model_fn.(input1, "x"), model_fn.(input2, "y")])
  |> Axon.dense(20, activation: :relu)
  |> Axon.dense(1, activation: :sigmoid)
  |> Axon.share_param(["dense0_x.bias", "dense0_y.bias"], as: "shared_bias0")
  |> Axon.share_param(["dense1_x.bias", "dense1_y.bias"], as: "shared_bias1") 

Although I believe we could get away with marking a given subgraph as shared, as if it was possible to do:

model_fn = Axon.shared_params("shared_params0", fn input -> 
  input
  |> Axon.dense(16, activation: :relu)
  |> Axon.dense(32, activation: :relu)
end)

This function would then be usable in the same fashion as above, but instead of each call creating fully separate nodes, the second call would know to use the same parameters as the first one, similar to an Axon.namespace.
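
Hypothetically, the usage would then look like this (Axon.shared_params does not exist; the point is just that both calls reuse one set of parameters):

# Both branches go through the same shared-parameter block before being merged.
model =
  Axon.concatenate([model_fn.(input1), model_fn.(input2)])
  |> Axon.dense(20, activation: :relu)
  |> Axon.dense(1, activation: :sigmoid)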

@josevalim
Contributor

@polvalente your version would conflict on the name, no?

@polvalente
Contributor

@josevalim As it currently stands, the name is ignored and the generated graph contains two separate instances, at least as shown via Axon.Display:

[Screenshot: Axon.Display output showing the two dense blocks as separate subgraphs]

@polvalente
Contributor

Although, re-reading it, my example is kind of nonsense. :)
I'll edit the comment with a corrected one.

@seanmor5
Contributor

@polvalente Your solution is kind of what I was thinking of with Axon.block, which would represent a reusable block where the parameters are always the same. Though maybe it makes sense to have something more explicit, like Axon.clone, i.e. use this function to create a clone of the contained subgraph every time it is used. It would return an anonymous function, and anywhere that function is used, the same parameters are reused.
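
A rough sketch of what I mean (the API is not final, so take the exact names with a grain of salt):

# Axon.block would wrap a subgraph-building function; every call of the
# returned function reuses the same parameters instead of creating new ones.
block =
  Axon.block(fn input ->
    input
    |> Axon.dense(16, activation: :relu)
    |> Axon.dense(32, activation: :relu)
  end)

out1 = block.(input1)
out2 = block.(input2)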

@krstopro
Member Author

krstopro commented Aug 20, 2023

@polvalente Your solution is kind of what I was thinking of with Axon.block, which would represent a reusable block where the parameters are always the same. Though maybe it makes sense to have something more explicit, like Axon.clone, i.e. use this function to create a clone of the contained subgraph every time it is used. It would return an anonymous function, and anywhere that function is used, the same parameters are reused.

Would it make sense to have higher-order functions, such as an Axon.map that applies the same layer (with the same parameters) to an Enum of inputs?

@polvalente
Contributor

polvalente commented Aug 20, 2023

If you have that Axon.block/clone returning an anonymous arity-1 function, you can just Enum.map that over your Enum of inputs
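
Something like this (assuming block is the arity-1 function returned by that Axon.block/clone and inputs is a list of Axon inputs):

# Apply the same shared-parameter block to every input, then merge the results.
outputs = Enum.map(inputs, block)
merged = Axon.concatenate(outputs)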

@krstopro
Member Author

If you have that Axon.block returning an anonymous arity-1 function, you can just Enum.map that over your Enum of inputs

I guess I have to check how Axon.block/clone works. :)

@polvalente
Contributor

If you have that Axon.block returning an anonymous arity-1 function, you can just Enum.map that over your Enum of inputs

I guess I have to check how Axon.block/clone works. :)

That's the suggestion Sean made right above, in the comment you replied to.

@seanmor5
Contributor

seanmor5 commented Aug 20, 2023

I'll take a crack at Axon.block this week and post the branch for some feedback

@krstopro
Member Author

I'll take a crack at Axon.block this week and post the branch for some feedback

That's awesome, thanks for the quick action! Will be happy to contribute, assuming I can.

@seanmor5
Contributor

@krstopro There is a draft of blocks here: #524

Please give it a try and let me know if it helps a bit :)

The idea is that you can use blocks almost like you would a PyTorch module. It's an incomplete draft, so expect bugs and limitations. I will continue working on it this week
