[FEA] Look into using the auto-tuner to give join/aggregation/filter specific metrics #12121

Open
revans2 opened this issue Feb 12, 2025 · 0 comments
Labels
feature request New feature or request

revans2 commented Feb 12, 2025

Is your feature request related to a problem? Please describe.
When doing query optimization we often need to estimate how an operation changes cardinality: will a join increase the row count by 10x or reduce it to 1/10th, and likewise for filters, aggregations, etc. This comes up quite often when we do things like memory planning on the GPU. We can use some heuristics to guess. We could also use AQE in some cases and possibly come up with estimates ourselves, but AQE only runs after the first shuffle, which might be too late for some optimizations.

As such, I would like to propose that we add a set of configs that can be used to give fuzzy hints about specific operations in queries. For example: when we read table foo with the pushed-down predicate a > 5 and a <= 10, we ended up materializing 2 MiB of data per task (possibly with min, median, and max values). I am not quite sure yet what an ideal estimate would be.
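To make the idea concrete, here is a minimal sketch of how one such hint could be packed into a config key/value pair. Everything in it is an assumption for illustration: the `spark.rapids.sql.hint.` prefix, the `ScanHint` fields, and the JSON encoding are hypothetical and not existing plugin configs, and the 1 MiB and 3 MiB min/max values are made up around the 2 MiB example above.

```python
import json
from dataclasses import dataclass, asdict
from typing import Tuple

# Hypothetical prefix; NOT an existing spark-rapids config namespace.
HINT_PREFIX = "spark.rapids.sql.hint."
MIB = 1024 * 1024


@dataclass(frozen=True)
class ScanHint:
    table: str
    predicate: str               # normalized pushed-down predicate text
    min_bytes_per_task: int
    median_bytes_per_task: int
    max_bytes_per_task: int


def normalize_predicate(predicate: str) -> str:
    """Lowercase and collapse whitespace so equivalent spellings compare equal."""
    return " ".join(predicate.lower().split())


def encode_hint(index: int, hint: ScanHint) -> Tuple[str, str]:
    """Return a (key, value) config pair that stores the hint as JSON."""
    key = f"{HINT_PREFIX}scan.{index}"
    return key, json.dumps(asdict(hint), sort_keys=True)


def decode_hint(value: str) -> ScanHint:
    """Inverse of encode_hint for the value part."""
    return ScanHint(**json.loads(value))


# Illustrative values only: 2 MiB median comes from the example above,
# the min and max are invented for the sketch.
hint = ScanHint(
    table="foo",
    predicate=normalize_predicate("a > 5  AND a <= 10"),
    min_bytes_per_task=1 * MIB,
    median_bytes_per_task=2 * MIB,
    max_bytes_per_task=3 * MIB,
)
key, value = encode_hint(0, hint)
```

Storing the predicate in a normalized form is one possible way to make later fuzzy matching less sensitive to formatting differences; a real design might key on plan structure instead.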

The auto-tuner can then look at various parts of an application and encode configs for the operations that are outliers from what we would expect.
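One way the outlier check could work, sketched under assumptions: compare the row-count ratio the planner would have assumed against the ratio actually observed, and flag operations that are off by more than some factor in either direction. The `OpObservation` type, the `find_outliers` helper, and the default factor of 4 are all hypothetical; the auto-tuner does not necessarily expose anything like this today.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class OpObservation:
    op_id: str
    expected_ratio: float   # output rows / input rows the planner would assume
    observed_ratio: float   # output rows / input rows measured in a past run


def find_outliers(observations: List[OpObservation],
                  factor: float = 4.0) -> List[OpObservation]:
    """Return operations whose observed ratio differs from the expected
    ratio by at least `factor`, in either direction (log-scale comparison)."""
    threshold = math.log(factor)
    outliers = []
    for obs in observations:
        if obs.expected_ratio <= 0 or obs.observed_ratio <= 0:
            continue  # cannot be compared on a log scale; skipped in this sketch
        if abs(math.log(obs.observed_ratio / obs.expected_ratio)) >= threshold:
            outliers.append(obs)
    return outliers


# Made-up observations for illustration.
observed = [
    OpObservation("join-1", expected_ratio=1.0, observed_ratio=10.0),  # 10x blow-up
    OpObservation("filter-2", expected_ratio=0.5, observed_ratio=0.4),  # close to guess
    OpObservation("join-3", expected_ratio=1.0, observed_ratio=0.1),   # reduced to 1/10th
]
flagged = [o.op_id for o in find_outliers(observed)]
```

Comparing on a log scale treats a 10x increase and a reduction to 1/10th as equally surprising, which matches how the join example is framed above. Only the flagged operations would need a config entry.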

The plugin, when it is trying to make a decision, would then be able to read these configs (do a fuzzy match to see if it can find historical information that is relevant) and then use that information as input to the planning.
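A rough sketch of the fuzzy lookup, assuming each historical hint is keyed by a textual signature of the operation. The signature format, the `best_hint` helper, and the 0.8 similarity threshold are hypothetical, and plain string similarity from Python's standard `difflib` stands in for whatever matching the plugin would really need (probably something plan-aware).

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple

MIB = 1024 * 1024


def best_hint(signature: str,
              history: Dict[str, int],
              min_score: float = 0.8) -> Optional[Tuple[str, int]]:
    """Return the (signature, value) of the most similar historical hint,
    or None when nothing is similar enough to be trusted."""
    best_key = None
    best_score = 0.0
    for key in history:
        score = SequenceMatcher(None, signature, key).ratio()
        if score > best_score:
            best_key, best_score = key, score
    if best_key is None or best_score < min_score:
        return None
    return best_key, history[best_key]


# Hypothetical history: signature -> median bytes materialized per task.
history = {
    "scan foo where a > 5 and a <= 10": 2 * MIB,
    "join dates on d_index": 64 * MIB,
}

# A slightly different predicate still finds the earlier scan of foo.
near = best_hint("scan foo where a > 5 and a <= 11", history)

# An unrelated operation finds nothing, so the plugin would fall back
# to its existing heuristics.
miss = best_hint("aggregate bar group by z", history)
```

Returning `None` below the threshold matters: a weak match should not override the planner's existing heuristics.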

We might even be able to encode higher-level data, such as that column d_index in the dates table appears to be a primary index for a dimension table. The auto-tuner could, in theory, detect this by looking at multiple join/aggregation operations and seeing how they behave. But this is a bit more advanced than matching what we saw before.
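As a sketch of how that detection might look, assume "primary index" means the column's values are unique. For an inner join on a unique key, each row from the other side matches at most one row, so the output can never exceed the other side's row count. Seeing that hold across several joins is evidence, not proof. The `JoinObservation` type, the `looks_like_unique_key` helper, and the minimum sample count are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JoinObservation:
    key_column: str        # e.g. "dates.d_index"
    other_side_rows: int   # rows from the table joined against key_column
    output_rows: int       # rows produced by the inner join


def looks_like_unique_key(column: str,
                          observations: List[JoinObservation],
                          min_samples: int = 3) -> bool:
    """Heuristic: the column behaves like a unique key if no observed inner
    join on it ever produced more rows than the other side supplied."""
    relevant = [o for o in observations if o.key_column == column]
    if len(relevant) < min_samples:
        return False  # too little history to say anything
    return all(o.output_rows <= o.other_side_rows for o in relevant)


# Made-up history for illustration.
joins = [
    JoinObservation("dates.d_index", other_side_rows=1_000_000, output_rows=1_000_000),
    JoinObservation("dates.d_index", other_side_rows=500_000, output_rows=420_000),
    JoinObservation("dates.d_index", other_side_rows=80_000, output_rows=80_000),
    JoinObservation("sales.s_item", other_side_rows=10_000, output_rows=95_000),
    JoinObservation("sales.s_item", other_side_rows=10_000, output_rows=9_000),
    JoinObservation("sales.s_item", other_side_rows=20_000, output_rows=20_000),
]
d_index_unique = looks_like_unique_key("dates.d_index", joins)
s_item_unique = looks_like_unique_key("sales.s_item", joins)
```

A single join that multiplies rows is enough to rule a column out, while ruling it in needs several consistent observations, so the check is deliberately asymmetric.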
