-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multi-column support for SkipScan #3094
Comments
The above is what we are talking about, right? Any estimate regarding when this feature will be added? Any workarounds in the meantime? Note that I cannot make a column combining a and b because I want to filter on a with the index. |
One try at mimic of this feature... depends on every value of col2 existing for each value of col |
You can emulate this with lateral join like so:
|
@svenklemm Thanks for the tip. Can you put this in terms of col/col2/col3 above? I only see two columns referenced... EDIT... it is almost as if I need the EDIT: this has too many columns from a and doesn't quite seem to work out yet...
|
This emulates DISTINCT on (col, col2) and uses skipscan for both columns |
Thank you!! |
@svenklemm Any idea why it doesn't use skipscan with these alternate data types? In this case col2 is not as repetitive as with the data loaded in the first case, although there is some repetition. EDIT:
I notice the EDIT: I also noticed that col2 could not be type |
Hmm not sure but the datatypes are not the reason why it doesnt use skipscan. But in general you should always use text for any character columns. https://wiki.postgresql.org/wiki/Don't_Do_This#Text_storage
|
@svenklemm The query plan appears to be affected by the number of distinct values in col2. Is there something like pg_hint_plan that can help? Postgres really needs a way to force a query plan... sometimes the user knows best. |
Yes skip scan is affected by the number of distinct values. If the number of distinct values is close to the number of rows not using skipscan is more efficient. |
In my sample of real data the lateral strategy you cooked up above (thanks again!) is selecting about 4M of 14M records... that 14M being a small subsample of billions. It was fine (used SkipScan) after I ran ANALYZE, but at some point the stats were messed up and I can tell you that with only 3 or 4 "col2" values per "col" (on average) it was a lot slower to avoid SkipScan. I still think this should be under user control when needed... |
@svenklemm Just FYI, on 2.4.1 (Timescale Forge), the following instance of your method above seems to cause problems with EXPLAIN in the context of a distributed hypertable space-partitioned by the_id:
...the query itself runs fine. Should this be a separate bug report? This was in the context of timescaledb.enable_remote_explain=on ... this seems to be critical when reproducing the issue. |
I dont think this is related to SkipScan there is already a bugreport for this: #3128 |
Odd, the query above worked! Only the EXPLAIN failed, and even that worked again with enable_remote_explain=off |
Initial implementation of skipscan only supports single column distinct but the implementation could be enhanced to support multiple columns as well.
The text was updated successfully, but these errors were encountered: