Is your feature request related to a problem? Please describe.
In doing query optimization we run into situations where we need to estimate the cardinality of a join (will it increase the row count by 10x or reduce it to 1/10th?), a filter, an aggregation, etc. This comes into play quite often when we do things like memory planning on the GPU. We can use heuristics to guess, and in some cases we could use AQE and possibly come up with estimates ourselves, but AQE only runs after the first shuffle, which may be too late for some optimizations.
As such, I would like to propose that we add a set of configs that can be used to give fuzzy hints about specific operations in queries. For example: when we read table foo with the pushed-down predicates a > 5 and a <= 10, we ended up materializing 2 MiB of data per task (possibly with min, median, and max values). I am not quite sure yet what an ideal estimate would look like.
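As a purely illustrative sketch of what such a hint might look like on the wire, one option is a single config key/value pair that the auto-tuner writes and the plugin parses. Every name below (the key, the JSON fields) is hypothetical, not an existing config:

```python
import json

# Hypothetical config key/value for a scan hint. None of these names
# exist today; they only illustrate the shape of the proposal.
HINT_KEY = "spark.rapids.sql.hints.scan.foo"
HINT_VALUE = json.dumps({
    "predicates": ["a > 5", "a <= 10"],
    # Observed bytes materialized per task across past runs.
    "bytesPerTask": {
        "min": 1 * 1024 * 1024,
        "median": 2 * 1024 * 1024,
        "max": 4 * 1024 * 1024,
    },
})

def parse_hint(value: str) -> dict:
    """Decode a hint value back into a dict the planner can consume."""
    return json.loads(value)

hint = parse_hint(HINT_VALUE)
```

Keeping min/median/max rather than a single number would let the planner pick a conservative or optimistic estimate depending on how costly a misprediction is.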
The auto-tuner could then look at various parts of an application and encode, as configs, the observations that are outliers from what we would expect.
The plugin, when trying to make a decision, would then read these configs (doing a fuzzy match to see if it can find relevant historical information) and use that information as input to planning.
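The fuzzy-match step could be as simple as scoring each historical hint key against a normalized description of the current operation and taking the best match above a threshold. A minimal sketch, assuming string similarity is good enough for a first cut (the signatures and hint contents here are made up):

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] of how alike two operation signatures look."""
    return SequenceMatcher(None, a, b).ratio()

def best_hint(signature: str, hints: dict, threshold: float = 0.6):
    """Return the historical hint whose key best matches `signature`,
    or None if nothing scores above `threshold`."""
    best_key, best_score = None, threshold
    for key in hints:
        score = similarity(signature, key)
        if score > best_score:
            best_key, best_score = key, score
    return hints.get(best_key)

# Historical hints keyed by a normalized description of the operation.
hints = {
    "scan foo a > 5 AND a <= 10": {"medianBytesPerTask": 2 * 1024 * 1024},
    "join store_sales x dates on d_index": {"rowRatio": 0.1},
}

# A query with slightly different literals still matches the scan hint.
match = best_hint("scan foo a > 5 AND a <= 12", hints)
```

In practice the signature would probably need smarter normalization (e.g. stripping literal values before matching), since a predicate with different constants can have very different selectivity.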
We might even be able to encode higher-level data, like "column d_index in the dates table appears to be a primary key for a dimension table." The auto-tuner could, in theory, detect this by looking at multiple join/aggregation operations and seeing how they behave. But this is a bit more advanced than matching what we saw before.
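One classic heuristic the auto-tuner could use for the "looks like a primary key" guess is comparing a column's observed distinct count to the table's row count across runs. A rough sketch under that assumption (the function and stats are illustrative, not an existing API):

```python
def looks_like_primary_key(distinct_count: int, row_count: int,
                           tolerance: float = 0.99) -> bool:
    """Heuristic: a column whose distinct count is (nearly) equal to the
    table's row count behaves like a primary key / unique index."""
    if row_count == 0:
        return False
    return distinct_count / row_count >= tolerance

# e.g. stats gathered from several join/aggregation runs on `dates`:
# d_index had 73049 distinct values over 73049 rows -> key-like;
# d_quarter had 120 distinct values over 73049 rows -> not key-like.
```

A tolerance slightly below 1.0 leaves room for approximate distinct counts (e.g. from sketches like HyperLogLog) rather than exact ones.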