Addressing ONNX.jl to Flux.jl graphs #120

Open
dstarkenburg opened this issue Feb 18, 2025 · 1 comment

Comments

@dstarkenburg
Contributor

dstarkenburg commented Feb 18, 2025

Hello!

As you know, I'm trying to help with the first major step of development: adding the missing operators to this package. However, I know this package has a second goal that also needs addressing.

As I work on this, I'd like to know whether there is anything I should be doing in parallel to improve our ability to create Flux models from the Umlaut tapes that this package reads. Can I start contributing conversion code for the operators we already have, or should we focus on completing the full operator set first?

Best,
Duncan

@dfdx
Collaborator

dfdx commented Feb 18, 2025

Hi there! I don't think we need to implement all operators first. In fact, I believe ~20-30% of operators will be enough to onboard ~90% of modern ML models, so I'd be pragmatic here and do the things that push your current goals the most. If you want to create Flux models from ONNX/Umlaut tapes, then it's a great idea to invest in it.

However, don't expect it to be an easy task! Flux is a high-level framework that operates on high-level objects like layers. The mapping from Flux models to primitive graphs (ONNX, Umlaut tapes, etc.) is always unique, but the opposite mapping is not. Consider the following graph fragment, for example:

%5 = %2 * %3      # matrix-matrix multiplication
%6 = %5 .+ %4     # elementwise addition

This looks like a Dense() layer with weight matrix %2 and bias %4, but may also be a Dense layer and a separately added vector (e.g. residual layer) or even something totally different like part of dot-product attention.
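To make the ambiguity concrete, here is a minimal sketch in plain Julia. `MyDense` is a hypothetical stand-in for a dense layer with identity activation (it is not Flux's `Dense`, so the snippet runs without any packages), and `W`, `b`, `x` are made-up values. Both readings execute exactly the same two primitive operations, so a tape alone cannot tell them apart:

```julia
# Hypothetical stand-in for a dense layer (identity activation), used only
# so that this sketch runs without Flux installed.
struct MyDense
    weight::Matrix{Float64}
    bias::Vector{Float64}
end
(d::MyDense)(x) = d.weight * x .+ d.bias

W = [1.0 2.0; 3.0 4.0]
b = [0.5, -0.5]
x = [1.0 0.0; 0.0 1.0]   # two input columns, i.e. a batch of 2

# Reading 1: one dense layer with weight W and bias b.
y_dense = MyDense(W, b)(x)

# Reading 2: a bias-free linear map, followed by a vector that is added
# separately (for example by a residual connection).
linear(x) = W * x
y_split = linear(x) .+ b

# Both readings perform one matrix product and one broadcast addition.
same_output = y_dense == y_split
```

Since the numbers agree in both cases, any decision about which layer to reconstruct has to come from extra context, such as where `b` comes from in the rest of the graph.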

I'd start by writing down a few ONNX/Umlaut graphs and the corresponding Flux models, and inspecting them piece by piece. Is there a clear pattern of mapping? Are there frequent sequences of operators in graphs that we can detect? What if we already have an ML model and only need to map data?

Depending on these observations, we can decide whether we want to create a pattern-matching mechanism that builds Flux models, whether we want to generate the code of Flux models a priori (e.g. using LLMs) and then map only the data, or whether we even need to rethink the Flux layer approach so that it reflects graph structure better.
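As a rough illustration of the pattern-matching option, the sketch below scans a simplified tape for a matrix multiplication whose result feeds an addition. Everything here is an assumption made for illustration: `Op`, `dense_candidates`, and the `:matmul` / `:add` symbols are hypothetical and are not Umlaut's or ONNX.jl's actual types or operator names.

```julia
# Hypothetical, simplified tape entry: a result id, an operation name,
# and the ids of its arguments. Not Umlaut's real tape type.
struct Op
    id::Int
    fn::Symbol
    args::Vector{Int}
end

# Find every `add` that has a `matmul` result as one of its two arguments.
# Each hit is only a *candidate* for a dense layer; as discussed above,
# the same shape can also come from a residual connection or attention.
function dense_candidates(ops::Vector{Op})
    by_id = Dict(op.id => op for op in ops)
    found = NamedTuple[]
    for op in ops
        (op.fn === :add && length(op.args) == 2) || continue
        for (k, arg) in enumerate(op.args)
            producer = get(by_id, arg, nothing)
            if producer !== nothing && producer.fn === :matmul
                push!(found, (weight = producer.args[1],
                              input  = producer.args[2],
                              bias   = op.args[3 - k],   # the other argument
                              output = op.id))
            end
        end
    end
    return found
end

# The graph fragment from the example: %5 = %2 * %3, %6 = %5 .+ %4
tape = [Op(5, :matmul, [2, 3]), Op(6, :add, [5, 4])]
candidates = dense_candidates(tape)
```

A real matcher would need further checks before emitting a layer, for instance that the weight and bias are constants stored in the model rather than values computed elsewhere in the graph.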
