Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: distinct cache panic on projection pushdown #25988

Merged
merged 1 commit into from
Feb 11, 2025

Conversation

hiltontj
Copy link
Contributor

Fixed a bug in the distinct cache where projection that skipped column in the cache hierarchy caused a panic.

This simplifies the display of the projection in the DistinctCacheExec in EXPLAIN output to not include the column index, and only the name.

No issue for this, see related slack thread.

Fixed a bug in the distinct cache where projection that skipped column
in the cache hierarchy caused a panic.

This simplifies the display of the projection in the DistinctCacheExec
in EXPLAIN output to not include the column index, and only the name.
@hiltontj hiltontj added the v3 label Feb 11, 2025
@hiltontj hiltontj requested a review from a team February 11, 2025 01:27
@hiltontj hiltontj self-assigned this Feb 11, 2025
Copy link
Contributor

@praveen-influx praveen-influx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - just for my understanding, what caused the panic? The schema.fields().get() call looks to have a ok_or, so it's unclear to me where that error originated from.

@@ -102,8 +102,8 @@ impl TableProvider for DistinctCacheFunctionProvider {
predicates,
Arc::clone(&self.table_def),
&[batches],
self.schema(),
projection,
schema,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@praveen-influx this was the change that fixed the problem. self.schema() is the unprojected schema of the cache. schema is the projection of the same. Because the unprojected schema was getting passed in here, the internals of arrow-rs were using indexes from the unprojected schema to access the projected one leading to the out of bounds error and panic - this is my understanding.

The solution was to pass the projected schema into the execution plan which is this line.

Downstream of that, this broke DisplayAs logic that was summarizing projection info, so I simplified that to omit the column indexes and just display the names of columns projected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok - thanks, I didn't spot self.schema() change to schema

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this was pretty subtle 😄

FWIW - I actually had to go back and reproduce the issue again to explain the above comment.

@hiltontj hiltontj merged commit 04f10ad into main Feb 11, 2025
13 checks passed
@hiltontj hiltontj deleted the hiltontj/distinct-cache-proj-bug branch February 11, 2025 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants