[Enhancement]: Chunk Skipping on UUID values #7744
Comments
I +1 this enhancement request. In fact, I'm redesigning some hypertables to use bigints instead of uuids just to work around this limitation. But I definitely understand the counterargument. @tpetry don't forget that uuid isn't exactly recommended for compression. I'm not 100% sure uuid7 fully resolves the locality issue (maybe?).
UUID7 values are time-sorted, with the first 48 bits holding the timestamp in millisecond precision. So with a 14-day hypertable chunk interval, only bits 18 to 48 of the timestamp and all later bits can change, while the top 17 bits stay constant. That's already a 17/128 ≈ 0.13 compression factor. Not bad for mostly random data. But in reality, similar timestamps will be very local, so for very frequent inserts those 48 bits will mostly change only slightly. With run-length encoding those bits can be reduced heavily. The upper compression limit is 48/128 = 0.375, which I believe is awesome considering that uuid7 values are mostly random. So just use run-length encoding (as used with integers). It can lead to relatively good compression for uuid7 and none for uuid4, which is not compressible at all.
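A quick sanity check of the bit counting in the comment above. This is a sketch that assumes the chunk's time range does not straddle a higher-order bit boundary (if it does, even fewer bits stay constant). A 14-day interval is 1,209,600,000 ms, which needs 31 bits to count through, leaving the top 17 of the 48 timestamp bits fixed within one chunk.

```python
# Rough check of how many uuid7 timestamp bits stay fixed inside one chunk.
TIMESTAMP_BITS = 48   # uuid7 stores Unix time in milliseconds in its top 48 bits
UUID_BITS = 128

def varying_timestamp_bits(chunk_interval_ms: int) -> int:
    """Low timestamp bits needed to count through one chunk interval."""
    return (chunk_interval_ms - 1).bit_length()

interval_ms = 14 * 24 * 60 * 60 * 1000          # 14 days = 1_209_600_000 ms
varying = varying_timestamp_bits(interval_ms)   # 31 bits
constant = TIMESTAMP_BITS - varying             # 17 bits
print(varying, constant, round(constant / UUID_BITS, 3))   # 31 17 0.133
print(TIMESTAMP_BITS / UUID_BITS)                          # 0.375
```

The 0.375 figure is the ceiling where all 48 timestamp bits compress away; the random bits after the timestamp never compress.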
I'm all in. However, my understanding is that pg18 will introduce a new function for uuid7. How are you getting them in pg17 and below?
You can generate them within the application, or use one of the many SQL implementations if you can't install an extension: https://gist.github.com/fabiolimace/515a0440e3e40efeb234e12644a6a346 However, providing a uuid7 generation function and timestamp_from_uuid7 within the timescale namespace would be absolutely awesome.
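For the "generate them within the application" route, here is a minimal sketch assuming the RFC 9562 UUIDv7 layout: 48-bit Unix milliseconds, 4-bit version, 12 random bits, 2-bit variant, 62 random bits. The helper names uuid7 and timestamp_ms_from_uuid7 are hypothetical and are not a TimescaleDB or PostgreSQL API (newer Python releases may also ship their own generator).

```python
import os
import time
import uuid

def uuid7(unix_ms=None):
    """Hypothetical helper: build a UUIDv7 (RFC 9562 layout) from Unix milliseconds.

    Defaults to the current time. No monotonic counter is used, so ordering
    within the same millisecond is random."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    rand_a = rand >> 68                             # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)                 # low 62 random bits
    value = (unix_ms & ((1 << 48) - 1)) << 80       # 48-bit timestamp
    value |= 0x7 << 76                              # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                             # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)

def timestamp_ms_from_uuid7(u):
    """Hypothetical helper: recover the millisecond timestamp (top 48 bits)."""
    return u.int >> 80

u = uuid7(1_700_000_000_000)
print(u.version, timestamp_ms_from_uuid7(u))  # 7 1700000000000
```

Passing the row's event time instead of the current time is what would tie the uuid to the hypertable's time column.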
What type of enhancement is this?
Performance
What subsystems and features will be improved?
Query planner
What does the enhancement do?
At the moment, we've got chunk skipping on integer values for use with serial ids. It works great, but as soon as you insert new rows into old chunks, those chunk skipping ranges are bloated and match almost every possible id. This leads to some chunks always being selected by the query planner, which significantly impacts performance:
Obviously, chunk_1 will now match almost any id, because its range has been broadened so far by a single very late-ingested row. The more you backfill old data (e.g. by importing historic records for new customers), the bigger the problem becomes. At some point, chunk skipping is useless because every chunk's range overlaps with some older ids.
An alternative approach would be to allow chunk skipping on uuid values; when used with uuid7 values, it's a great feature. We could link the uuid7 value to the time column used for partitioning (related to #7682). Then the chunk skipping ranges of the chunks can NEVER overlap, and the problem with the former serial column doesn't happen. We'll get perfect chunk skipping for uuid7 values :)
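A toy model of the problem, assuming chunk skipping boils down to a per-chunk (min, max) range on the column. The chunk names, ids, and timestamps are made-up illustration data, and uuid7_key is a hypothetical simplification that keeps only the timestamp bits. One late row with a fresh serial id drags chunk_1 into every query, while a uuid7 derived from the row's (old) event time stays inside chunk_1's range.

```python
# Toy model of chunk skipping: per-chunk (min, max) ranges on one column.

def chunk_ranges(rows):
    """Build per-chunk (min, max) ranges from (chunk_name, value) pairs."""
    ranges = {}
    for chunk, value in rows:
        lo, hi = ranges.get(chunk, (value, value))
        ranges[chunk] = (min(lo, value), max(hi, value))
    return ranges

def candidate_chunks(ranges, value):
    """Chunks whose range cannot exclude `column = value`."""
    return sorted(name for name, (lo, hi) in ranges.items() if lo <= value <= hi)

def uuid7_key(event_ms, rand=0):
    """Hypothetical: integer form of a uuid7 keyed by the row's event time.
    Version, variant and random bits are collapsed into `rand` for brevity."""
    return (event_ms << 80) | rand

# Serial ids: three chunks filled in order with ids 1..3000.
serial_rows = [(f"chunk_{i // 1000 + 1}", i + 1) for i in range(3000)]
print(candidate_chunks(chunk_ranges(serial_rows), 2500))  # ['chunk_3']

# A late-ingested historic row lands in chunk_1 but gets a brand-new serial id.
serial_rows.append(("chunk_1", 3001))
print(candidate_chunks(chunk_ranges(serial_rows), 2500))  # ['chunk_1', 'chunk_3']

# uuid7 keys from the partitioning time: chunk_N covers [(N-1)*1000, N*1000) ms.
uuid_rows = [(f"chunk_{ms // 1000 + 1}", uuid7_key(ms)) for ms in range(3000)]
# The same late row, keyed by its old event time, stays inside chunk_1's range.
uuid_rows.append(("chunk_1", uuid7_key(500, rand=42)))
print(candidate_chunks(chunk_ranges(uuid_rows), uuid7_key(2500)))  # ['chunk_3']
```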
Implementation challenges
Implementation should be really easy: chunk skipping currently works on integers and timestamps, both of which are fixed-size values. A uuid7 value is also a fixed-size 128-bit value and can be represented as an integer. So chunk skipping would work exactly like it does with 64-bit bigint values, just with 128-bit uuid values. The needed changes should be minimal.
A possible counterargument would be that uuid is mostly used with random uuidv4 values, for which chunk skipping wouldn't make any sense. That's true. But you can also use chunk skipping on int values filled with RANDOM() or any other semantic value that doesn't match the chunk skipping idea.