-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: distinct cache panic on projection pushdown #25988
Conversation
Fixed a bug in the distinct cache where projection that skipped column in the cache hierarchy caused a panic. This simplifies the display of the projection in the DistinctCacheExec in EXPLAIN output to not include the column index, and only the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - just for my understanding, what caused the panic? The schema.fields().get()
call looks to have a ok_or
, so it's unclear to me where that error originated from.
@@ -102,8 +102,8 @@ impl TableProvider for DistinctCacheFunctionProvider { | |||
predicates, | |||
Arc::clone(&self.table_def), | |||
&[batches], | |||
self.schema(), | |||
projection, | |||
schema, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@praveen-influx this was the change that fixed the problem. self.schema()
is the unprojected schema of the cache. schema
is the projection of the same. Because the unprojected schema was getting passed in here, the internals of arrow-rs
were using indexes from the unprojected schema to access the projected one leading to the out of bounds error and panic - this is my understanding.
The solution was to pass the projected schema into the execution plan which is this line.
Downstream of that, this broke DisplayAs
logic that was summarizing projection info, so I simplified that to omit the column indexes and just display the names of columns projected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok - thanks, I didn't spot self.schema()
change to schema
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this was pretty subtle 😄
FWIW - I actually had to go back and reproduce the issue again to explain the above comment.
Fixed a bug in the distinct cache where projection that skipped column in the cache hierarchy caused a panic.
This simplifies the display of the projection in the DistinctCacheExec in EXPLAIN output to not include the column index, and only the name.
No issue for this, see related slack thread.