Addressing ONNX.jl to Flux.jl graphs #120

dstarkenburg · 2025-02-18T18:39:32Z

Hello!

As you know, I'm trying to help with the first major development step, which is adding the missing operators to this package. However, I know this package has a second goal that also needs attention.

As I work on this, I'd like to know whether there is anything I should be doing in parallel to help us create Flux models from the Umlaut tapes that this package reads. Can I start adding conversion code for the operators we already have, or should we focus on completing the full operator set first?

Best,
Duncan

dfdx · 2025-02-18T22:57:01Z

Hi there! I don't think we need to implement all operators first. In fact, I believe ~20-30% of operators will be enough to onboard ~90% of modern ML models, so I'd be pragmatic here and do whatever advances your current goals the most. If you want to create Flux models from ONNX/Umlaut tapes, then it's a great idea to invest in it.

However, don't expect it to be an easy task! Flux is a high-level framework that operates on high-level objects like layers. The mapping from Flux models to primitive graphs (ONNX, Umlaut tapes, etc.) is always unique, but the opposite mapping is not. Consider the following piece of graph, for example:

%5 = %2 * %3      # matrix-matrix multiplication
%6 = %5 .+ %4     # elementwise addition

This looks like a Dense() layer with weight matrix %2 and bias %4, but may also be a Dense layer and a separately added vector (e.g. residual layer) or even something totally different like part of dot-product attention.
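To make the first reading concrete, here is a minimal sketch, assuming Flux is installed and that `%2`, `%3` and `%4` hold a weight matrix, a batch of inputs and a bias vector. The array sizes and variable names are made up for illustration. It only checks that the two primitive operations give the same numbers as a `Dense` layer built from the same arrays; it says nothing about whether the original model actually used a `Dense` layer.

```julia
using Flux

W = randn(Float32, 3, 4)   # plays the role of %2 (weight matrix)
x = randn(Float32, 4, 5)   # plays the role of %3 (batch of 5 inputs)
b = randn(Float32, 3)      # plays the role of %4 (bias vector)

# The two primitive operations from the graph fragment
y_graph = W * x .+ b

# One possible high-level reading: a Dense layer with identity activation,
# constructed directly from the existing arrays
layer = Dense(W, b)
y_layer = layer(x)

@assert y_graph ≈ y_layer
```

The same two operations would produce identical values if `%4` were a separately added vector rather than a bias, which is exactly why the reverse mapping is ambiguous.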

I'd start by writing down a few ONNX/Umlaut graphs and the corresponding Flux models, and inspecting them piece by piece. Is there a clear pattern of mapping? Are there frequent sequences of operators in graphs that we can detect? What if we already have an ML model and only need to map data?

Depending on these observations, we can decide whether we want to create a pattern-matching mechanism that builds Flux models, or generate the code of Flux models a priori (e.g. using LLMs) and then map only the data, or even rethink the Flux layer approach so that it reflects graph structure better.
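As a rough illustration of the pattern-matching option, here is a sketch in plain Julia. The `Op` struct, the `:matmul` and `:broadcast_add` names and the `dense_candidates` function are hypothetical stand-ins invented for this example; they are not the Umlaut.jl or ONNX.jl API. The sketch scans a toy tape for a matrix multiplication whose result feeds an elementwise addition and reports it as a candidate, not a confirmed, `Dense` layer.

```julia
# Hypothetical, simplified tape entry; NOT the real Umlaut.jl representation.
struct Op
    id::Int            # variable id, e.g. 5 for %5
    fn::Symbol         # operation name (made-up names for this sketch)
    args::Vector{Int}  # ids of the input variables
end

# Find every broadcasted add that has a matmul result as one of its arguments.
function dense_candidates(tape::Vector{Op})
    producers = Dict(op.id => op for op in tape)
    candidates = NamedTuple{(:weight, :input, :bias, :output), NTuple{4, Int}}[]
    for op in tape
        op.fn === :broadcast_add || continue
        length(op.args) == 2 || continue
        # The matmul result may be either argument of the addition
        for (a, b) in ((op.args[1], op.args[2]), (op.args[2], op.args[1]))
            prev = get(producers, a, nothing)
            if prev !== nothing && prev.fn === :matmul && length(prev.args) == 2
                push!(candidates, (weight = prev.args[1], input = prev.args[2],
                                   bias = b, output = op.id))
                break
            end
        end
    end
    return candidates
end

# The fragment from the example above: %5 = %2 * %3, %6 = %5 .+ %4
tape = [
    Op(5, :matmul, [2, 3]),
    Op(6, :broadcast_add, [5, 4]),
]
matches = dense_candidates(tape)
```

A real matcher would need extra checks before committing to a layer, for example that the bias candidate is a constant vector rather than another activation, since the same shape also appears in residual connections and attention.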
