
[BUG] Improve fallback mechanism if there are back to back transitions between the CPU and the GPU #12093

Open
kuhushukla opened this issue Feb 10, 2025 · 0 comments
Labels
performance A performance related task/issue


Describe the bug
Consider a query with the following operators:

  • Scan (NOT_ON_GPU)
  • Filter (ON_GPU)
  • Project (NOT_ON_GPU)
  • Union (ON_GPU)

This plan requires two RowToColumnar transitions (before Filter and before Union) and one ColumnarToRow transition (before Project).
With a wide schema or a large amount of data, these transitions can significantly degrade performance for that stage.
This example extends to other operators as well. However, for joins and similar operators, the trade-off between the GPU operator's performance benefit and the R2C overhead may be more subtle.
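To make the transition count concrete, here is a minimal sketch that walks a linear plan from leaf to root and counts a RowToColumnar transition wherever a CPU operator feeds a GPU operator, and a ColumnarToRow transition wherever a GPU operator feeds a CPU operator. The `(name, runs_on_gpu)` plan representation and the `count_transitions` helper are hypothetical simplifications for illustration, not part of the plugin's API; real Spark plans are trees, not chains.

```python
from typing import List, Tuple

# Hypothetical representation: a linear plan listed from leaf (scan) to root,
# each entry being (operator_name, runs_on_gpu).
Plan = List[Tuple[str, bool]]


def count_transitions(plan: Plan) -> Tuple[int, int]:
    """Return (row_to_columnar, columnar_to_row) counts between adjacent operators."""
    r2c = c2r = 0
    for (_, child_gpu), (_, parent_gpu) in zip(plan, plan[1:]):
        if not child_gpu and parent_gpu:
            r2c += 1  # CPU rows must be converted to GPU columnar batches
        elif child_gpu and not parent_gpu:
            c2r += 1  # GPU columnar batches must be converted back to CPU rows
    return r2c, c2r


plan = [("Scan", False), ("Filter", True), ("Project", False), ("Union", True)]
print(count_transitions(plan))  # (2, 1)
```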

Steps/Code to reproduce bug
Use the plan above. More details to follow.

Expected behavior
We should avoid moving data back and forth between the CPU and the GPU, at least in obvious cases involving projects and filters. This will limit the cost the job incurs from these transitions.
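One possible heuristic, sketched below under stated assumptions, is to greedily move a cheap GPU operator back to the CPU whenever doing so lowers the total number of transitions. The `CHEAP_OPS` set, the `transitions` cost (a plain count that ignores schema width and row counts), and `reduce_transitions` are all hypothetical; this is not how the plugin currently decides fallback. Operators marked NOT_ON_GPU are never moved to the GPU, because they are unsupported there.

```python
from typing import List, Tuple

# Hypothetical plan representation: (operator_name, runs_on_gpu), leaf to root.
Plan = List[Tuple[str, bool]]

# Hypothetical set of operators assumed cheap enough that running them on the
# CPU costs less than an extra row/columnar transition.
CHEAP_OPS = {"Filter", "Project"}


def transitions(plan: Plan) -> int:
    """Total number of CPU<->GPU transitions between adjacent operators."""
    return sum(1 for (_, a), (_, b) in zip(plan, plan[1:]) if a != b)


def reduce_transitions(plan: Plan) -> Plan:
    """Greedily move cheap GPU operators to the CPU while that lowers the transition count."""
    plan = list(plan)
    improved = True
    while improved:
        improved = False
        for i, (name, on_gpu) in enumerate(plan):
            if on_gpu and name in CHEAP_OPS:
                candidate = plan[:i] + [(name, False)] + plan[i + 1:]
                if transitions(candidate) < transitions(plan):
                    plan = candidate
                    improved = True
                    break  # restart the scan on the updated plan
    return plan


plan = [("Scan", False), ("Filter", True), ("Project", False), ("Union", True)]
print(reduce_transitions(plan))
```

For the plan in this issue, the sketch moves Filter to the CPU, reducing three transitions to one (a single RowToColumnar before Union). A real cost model would also need to weigh the factors mentioned above, such as schema width and data volume, rather than counting transitions alone.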

Environment details (please complete the following information)

  • Reported for on-prem but applies to any platform

Additional context
Wide schema with thousands of columns containing strings.

@kuhushukla kuhushukla added ? - Needs Triage Need team to review and classify bug Something isn't working labels Feb 10, 2025
@mattahrens mattahrens added performance A performance related task/issue and removed ? - Needs Triage Need team to review and classify bug Something isn't working labels Feb 11, 2025