[FEA] Look at ways to reduce memory usage in operations that use hash tables #12139

Open · revans2 opened this issue Feb 14, 2025 · 1 comment
Labels: ? - Needs Triage · feature request · performance

revans2 (Collaborator) commented Feb 14, 2025

Is your feature request related to a problem? Please describe.
Hash joins and aggregations are some of the most common operations we run. Joins especially, but many aggregations also end up using hash tables.

Currently in CUDF, all hash tables are allocated with a maximum occupancy of 50%:

https://github.com/rapidsai/cudf/blob/6c281fded5590e9896a901b5e12ce3a05d510be7/cpp/include/cudf/hashing/detail/helper_functions.cuh#L23
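For context, here is a minimal sketch of what that occupancy cap means for allocation size. The function name `hash_table_capacity` is illustrative rather than the actual CUDF helper; it just assumes the capacity is the key count scaled by the inverse occupancy, rounded up:

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative restatement of the sizing arithmetic; the real constant and
// helper live in cpp/include/cudf/hashing/detail/helper_functions.cuh.
constexpr std::int64_t DEFAULT_HASH_TABLE_OCCUPANCY = 50;  // percent

constexpr std::size_t hash_table_capacity(
  std::size_t num_keys,
  std::int64_t occupancy_percent = DEFAULT_HASH_TABLE_OCCUPANCY)
{
  // Round up so the requested occupancy is never exceeded.
  // (Assumes num_keys * 100 fits in int64_t; fine for a sketch.)
  return static_cast<std::size_t>(
    (static_cast<std::int64_t>(num_keys) * 100 + occupancy_percent - 1) /
    occupancy_percent);
}

// 50% occupancy allocates ~2x the key count; 90% would allocate ~1.11x,
// cutting the table's memory footprint by roughly 45%.
static_assert(hash_table_capacity(1'000'000) == 2'000'000);
static_assert(hash_table_capacity(1'000'000, 90) == 1'111'112);
```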

But when talking to the cuco team, they mentioned that their hash tables, and static_multimap in particular, maintain really good performance up to 90% occupancy:

- For a key multiplicity of 1, there is effectively no performance drop between 10% and 90% occupancy.
- For a key multiplicity of 8, there is about a 30% performance drop between 10% and 90% occupancy.

In some cases we know a lot about the join cardinality and are already telling CUDF whether or not the build side has distinct keys. We could potentially give more hints about the average key multiplicity, along the lines of the sketch below.
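As a rough illustration of what such a hint could look like, here is a hypothetical sketch (none of these names exist in CUDF today) that maps a coarse multiplicity hint to an occupancy target based on the cuco numbers above:

```cpp
#include <cstdint>

// Hypothetical API sketch, not an existing cudf interface: extend the
// distinct-keys hint into a coarse multiplicity hint, and map each hint to
// an occupancy percentage guided by the cuco measurements quoted above.
enum class key_multiplicity_hint { distinct, low, high, unknown };

constexpr std::int64_t occupancy_for(key_multiplicity_hint hint)
{
  switch (hint) {
    case key_multiplicity_hint::distinct: return 90;  // multiplicity ~1: no measured penalty
    case key_multiplicity_hint::low:      return 70;  // illustrative middle ground
    case key_multiplicity_hint::high:     return 50;  // keep today's conservative sizing
    default:                              return 50;  // unknown: stay conservative
  }
}
```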

For aggregations, we can use previous batches to give us hints about future batches. For merge aggregations, we also know how many upstream tasks there are, or how many batches we are combining together, which gives us a worst-case key multiplicity. A sketch of that feedback loop follows.
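A minimal sketch of that idea, with illustrative names and thresholds rather than existing CUDF APIs, might look like this:

```cpp
#include <cstddef>

// Hypothetical sketch: accumulate the key multiplicity observed in processed
// aggregation batches and use it to pick an occupancy for the next batch's
// hash table. Thresholds below are illustrative only.
struct multiplicity_tracker {
  std::size_t rows_seen     = 0;
  std::size_t distinct_keys = 0;

  void observe_batch(std::size_t batch_rows, std::size_t batch_distinct_keys)
  {
    rows_seen     += batch_rows;
    distinct_keys += batch_distinct_keys;
  }

  double average_multiplicity() const
  {
    return distinct_keys == 0 ? 1.0
                              : static_cast<double>(rows_seen) / distinct_keys;
  }

  // Near-distinct keys tolerate high occupancy; heavy duplication keeps the
  // conservative 50% sizing used today.
  int suggested_occupancy() const
  {
    double const m = average_multiplicity();
    if (m <= 1.5) return 90;
    if (m <= 4.0) return 70;
    return 50;
  }
};
```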

The idea would be to see if we can reduce memory usage without hurting the performance of average-case queries. If we can shrink these tables, then ideally we reduce memory pressure and spilling as well.

So this is not really a performance improvement that we would see everywhere; it is an experiment to see if we can improve some really common operations that use a lot of memory.

binmahone (Collaborator) commented Feb 17, 2025

> For a key multiplicity of 1, there is effectively no performance drop between 10% and 90% occupancy.
> For a key multiplicity of 8, there is about a 30% performance drop between 10% and 90% occupancy.

May I ask where these numbers come from? It's a little unintuitive to me that the high-cardinality case (key multiplicity of 1) is more immune to occupancy changes than the low-cardinality cases. I'm curious to learn more about this.
