Trouble implementing models in Axon #522
Can you provide which model you tried implementing? The reason layer names are functions is to make inference deterministic. Previously we relied on `unique_integer` to ensure layers/parameters had unique names and would not accidentally be shared across layers. An issue with this is that if you have a function which returns an Axon model, you would always get a unique model if you didn't explicitly specify every layer name. That means every time you tried to use one of these models it would need to be fully recompiled by Nx/EXLA. Note you can still pass a binary as a layer name; the function is really only used internally. The only other way to enforce this is to force `name` as a required parameter of each layer. Looking at some of the model implementations in Bumblebee might help a bit as well. It's difficult to know, though, without understanding what you're trying to do.
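A minimal sketch of that (the layer and input names here are made up) showing how explicit binary names keep a model-building function deterministic:

```elixir
# Hypothetical example: because every layer has an explicit binary name,
# calling build_model.() twice produces structurally identical graphs,
# so Nx/EXLA does not need to recompile the model on every call.
build_model = fn ->
  Axon.input("features", shape: {nil, 8})
  |> Axon.dense(16, name: "hidden", activation: :relu)
  |> Axon.dense(1, name: "output")
end
```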
Sure! I was trying to implement Neural Cellular Automata (see https://distill.pub/2020/growing-ca/) for fun and practice.
This might be a good next step for me. Thanks!
I think the approach we take at the moment is much more versatile and functional than the module-based approach. I'm not quite sure how having separate modules per layer would work. In Scholar those algorithms are standalone, but in Axon layers are composable, and the module-based approach is not very composable. Note that you can get "apply" by just going to the low-level `Axon.Layers` implementations. You will need to manage parameter initialization yourself, but it is possible.
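For example, a minimal sketch of that low-level route (shapes and initialization here are arbitrary), where the same kernel and bias are applied to two inputs:

```elixir
# Manage parameters by hand and apply them through the functional
# Axon.Layers API; reuse is just calling the same closure twice.
key = Nx.Random.key(42)
{kernel, key} = Nx.Random.normal(key, 0.0, 1.0, shape: {10, 16})
{bias, _key} = Nx.Random.normal(key, 0.0, 1.0, shape: {16})

apply_dense = fn x -> Axon.Layers.dense(x, kernel, bias) end

x1 = Nx.iota({2, 10}, type: :f32)
x2 = Nx.iota({2, 10}, type: :f32)
{apply_dense.(x1), apply_dense.(x2)}
```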
In a module-based approach, compositionality should be easily achievable with a Sequential API (e.g. as done in PyTorch). Parameter sharing in that approach is very simple, since all you need to do is pass the parameters as arguments. In the current approach, I don't think parameter sharing is even possible.
Indeed, I am aware of transforms implemented in
I think framing it as module vs. function is barking up the wrong tree. :) A module is nothing more than a collection of functions, and it won't add any inherent capabilities. Structs, as mentioned earlier, could introduce new capabilities, and they would tie data to a module, but due to Elixir's functional nature, and given that structs in Elixir are nothing more than maps with a special key, they could be replicated easily in other ways. I think it is worth taking a step back (well, at least for me, as I am not well versed in ML). We are talking a lot about the solution, but I am trying to grasp the problem. If the issue is parameter sharing, wouldn't it be a matter of changing this line: https://github.com/elixir-nx/axon/blob/main/lib/axon.ex#L722?
Ah, I guess the issue with the above is that the parameters are not considered shareable anyway. But we could likely introduce a way to mark them as shareable.
But how do you pass a parameter to the layer? I don't think this is possible at the moment.
@krstopro change the layer to either:
Anyway, the point I am getting to is that I don't think this is a design issue, and I think treating it as a design issue is going to lead in the wrong direction. Remember that objects (or rather mutability) allow you to give everything an identity based on its position in memory (and the complexity that comes from it, because now you need to track how things change over time!). So you share by making things point to the same place. There is nothing in modules or structs in Elixir that will give you that. If you want to share something, you need to give it an explicit name and put it somewhere shared.
This might solve the problem.
Hmmm, this might require solution number 1. What I was saying is the following approach (taken exactly in Nx.Scholar; something similar is done in PyTorch): the layer is a struct holding its parameters, and applying the layer is a function call that takes that struct. Then, suppose we have two inputs that should go through the same layer, we simply pass the same struct at both call sites. I don't think we can do something like this at the moment.
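A hypothetical sketch of that struct-based style (the module name, shapes, and initialization are invented here), where sharing is just passing the same struct to both call sites:

```elixir
# Hypothetical struct-based layer: the parameters live in the struct,
# so applying the same struct twice reuses the same weights.
defmodule MyDense do
  defstruct [:kernel, :bias]

  def init(in_features, out_features, key) do
    {kernel, key} = Nx.Random.normal(key, 0.0, 1.0, shape: {in_features, out_features})
    {bias, _key} = Nx.Random.normal(key, 0.0, 1.0, shape: {out_features})
    %MyDense{kernel: kernel, bias: bias}
  end

  def forward(%MyDense{kernel: kernel, bias: bias}, x) do
    Axon.Layers.dense(x, kernel, bias)
  end
end

layer = MyDense.init(10, 16, Nx.Random.key(0))
x = Nx.iota({1, 10}, type: :f32)
y = Nx.multiply(x, 2)
# Both calls use exactly the same kernel and bias.
{MyDense.forward(layer, x), MyDense.forward(layer, y)}
```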
That would not solve it necessarily. Look at this Python code:

```python
>>> class User:
...     name = None
...
>>> user = User()
>>> user.name = "josé"
>>> other = User()
>>> other.name = "josé"
>>> user == other
False
```

and then:

```python
>>> from dataclasses import dataclass
>>> @dataclass
... class User:
...     name: str = None
...
>>> user = User()
>>> user.name = "josé"
>>> other = User()
>>> other.name = "josé"
>>> user == other
True
```

In Elixir, everything is data (the second). So if you have two parameters with the same name and shape, does it mean they are shared? No, it doesn't; it could be a coincidence that they are represented the same. They may still get different values on execution. That's why having a struct would not help, because we still could not simply assume that because something looks the same, it is the same. With objects, they would point to different memory addresses, and that's how you would know they are different. One way to do this in Elixir is by adding a unique value, such as a reference or a unique integer. The problem is that it breaks equality:

```elixir
iex> Axon.dense(32) == Axon.dense(32)
false
```

This would be false because now each one points to something unique (like objects/memory addresses would), but this feels very counter-intuitive when everything we have is data.
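A quick sketch of that "unique value" idea (the map layout is made up), showing how identity via `make_ref/0` breaks structural equality:

```elixir
# Two otherwise-identical "layer" maps tagged with make_ref/0 now have
# distinct identities, so == no longer considers them equal.
a = %{op: :dense, units: 32, id: make_ref()}
b = %{op: :dense, units: 32, id: make_ref()}
a == b
#=> false
```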
Ok, here is a more concrete API:
@josevalim I see now, thanks! I think this is what @seanmor5 already wrote here.
Yeah, exactly. We made these mistakes in the past, which is one way of learning. :D
@josevalim Correct me if I'm wrong, but implementing these would require changing every existing layer in Axon, right?
Good call. Layers would need to declare which parameters they allow to share, correct. Making a param shared after it is defined, however, would not require touching every layer. Something like this:

```elixir
axon
|> Axon.dense("dense1", 32)
|> Axon.share_param("dense1.bias", as: "shared_dense_bias")
|> Axon.dense("dense2", 32)
|> Axon.share_param("dense2.bias", as: "shared_dense_bias")
```

We still need to store it somewhere shared to make sure the shapes match, but the API above would be less bureaucratic, yeah. Or even:

```elixir
axon
|> Axon.dense("dense1", 32)
|> Axon.dense("dense2", 32)
|> Axon.share_param(["dense1.bias", "dense2.bias"], as: "shared_dense_bias")
```
One of the problems might be the following (again, I might be wrong): suppose I define a model without explicitly naming its layers. How do I share its parameters?
IIRC, each layer has a name, even if you don't give one explicitly (e.g. dense1, dense2, dense3, etc.). So you could rely on those generated names, but I would instead explicitly name the layers so we can share the params.
I've been thinking about this, but I am not sure it solves the problem. If we have two inputs, say `x` and `y`, that should go through the same stack of layers, I am not sure how the naming would work. Another issue with accessing layers by name might be if we want to do the same with a complex model with a lot of layers (e.g. ResNet101, which is 101 layers deep). Would we need to name every layer in the model and then iterate over them?
I believe it would be something like this:

```elixir
input1 = Axon.input("x", shape: {nil, 10})
input2 = Axon.input("y", shape: {nil, 10})

model_fn = fn input, i ->
  input
  |> Axon.dense(16, name: "dense0_#{i}", activation: :relu)
  |> Axon.dense(32, name: "dense1_#{i}", activation: :relu)
end

model =
  Axon.concatenate([model_fn.(input1, "x"), model_fn.(input2, "y")])
  |> Axon.dense(20, activation: :relu)
  |> Axon.dense(1, activation: :softmax)
  |> Axon.share_param(["dense0_x.bias", "dense0_y.bias"], as: "shared_bias0")
  |> Axon.share_param(["dense1_x.bias", "dense1_y.bias"], as: "shared_bias1")
```

Although I believe we could get away with marking a given subgraph as shared, as if it was possible to do:

```elixir
model_fn =
  Axon.shared_params("shared_params0", fn input ->
    input
    |> Axon.dense(16, activation: :relu)
    |> Axon.dense(32, activation: :relu)
  end)
```

And then this function would be usable in the same fashion I used above, but instead of each call creating fully separate nodes, the second call would know to use the same parameters as the first one.
@polvalente your version would conflict on the name, no?
@josevalim As it currently stands, the name is ignored and the generated graph contains 2 separate instances, at least as shown via Axon.Display.
Although, re-reading it, my example is kind of nonsense. :)
@polvalente Your solution is kind of what I was thinking with `Axon.block`/`clone`.
Would it make sense to have higher-order functions, such as an `Axon.map` that applies the layer (with the same parameters) to an `Enum` of inputs?
If you have that `Axon.block`/`clone` returning an anonymous arity-1 function, you can just `Enum.map` that over your `Enum` of inputs.
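A sketch of that idea, assuming a proposed `Axon.block/1` that returns an arity-1 function reusing the same parameters on every call (this API is only being discussed here, so its exact shape is an assumption):

```elixir
# Assumes the proposed Axon.block/1: `block` is an arity-1 function and
# every invocation reuses the same underlying parameters.
block =
  Axon.block(fn input ->
    input
    |> Axon.dense(16, activation: :relu)
    |> Axon.dense(32, activation: :relu)
  end)

inputs = [
  Axon.input("x", shape: {nil, 10}),
  Axon.input("y", shape: {nil, 10})
]

model =
  inputs
  |> Enum.map(block)
  |> Axon.concatenate()
  |> Axon.dense(1, activation: :sigmoid)
```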
I guess I have to check how `Axon.block`/`clone` works. :)
That's the suggestion Sean made right above, in the comment you replied to.
I'll take a crack at `Axon.block`.
That's awesome, thanks for the quick action! Will be happy to contribute, assuming I can.
Recently I tried coding a somewhat popular model in Axon, only to give up after a few hours. The reason is that I found implementing custom models very hard, if not impossible. It could be that I am missing something, that I need more practice, or that I am biased towards PyTorch, which I had been using for years. Still, I would like to ask some questions.

Is there any particular reason why `Axon.layer_name` was chosen to return a function and not a struct with parameters? I know the latter is more OOP than functional (as stated here), but implementing custom models seems simpler to me that way, and it allows for parameters to be reused easily. Also, I think this is exactly the approach that was taken in `Nx.Scholar`.

Many popular algorithms involve applying the same set of weights over and over again to the input. For example: recurrent neural networks (potentially with custom cells), Deep Sets, meta-learning (MAML and related algorithms), Neural Cellular Automata, etc. Currently, I don't think it is even possible to implement these in Axon (not sure what is going on with the `get_parameters` and `set_parameters` functions).

Am I missing something? Or are there any plans to change the approach? I am aware of the "Add weight sharing" issue still being open.
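As a point of reference, a minimal sketch (invented shapes and update rule, using the functional `Axon.Layers` API rather than the graph API) of the "apply the same weights repeatedly" pattern these models need:

```elixir
# Repeatedly apply one fixed set of weights (an NCA/RNN-style update),
# done with Axon.Layers since the graph API has no built-in weight sharing.
key = Nx.Random.key(0)
{kernel, key} = Nx.Random.normal(key, 0.0, 0.1, shape: {8, 8})
{bias, _key} = Nx.Random.normal(key, 0.0, 0.1, shape: {8})

step = fn state ->
  state
  |> Axon.Layers.dense(kernel, bias)
  |> Nx.tanh()
end

state0 = Nx.broadcast(0.5, {1, 8})
final_state = Enum.reduce(1..10, state0, fn _i, state -> step.(state) end)
```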