Parallel Compression threads for high throughput data #3077
Comments
Came here to make this issue: This is also a consideration for existing tables with a lot of shards. Doing the compression with 1 cursor per hypertable leaves the server bored while there are months of static segments waiting to be compressed.
Would love to see it implemented as well. Compressing and decompressing faster would be really helpful, especially when it's so easy to spin up a large machine for that job and then downsize again.
We're running into the same problem. In initial tests with real-world ingestion of one data domain, we set a relatively large chunk size of 2 hours, importing 150k rows/sec or ~50 MB/sec for chunks of ~300 GB. Compressing such a chunk takes 3.5 hours (~25 MB/s). It's not possible to do it in parallel with two sessions either, as noted above. This means that compression will never be able to keep up with live imports, and thus will necessarily fall further and further behind, rendering it effectively useless in a real-world setting. We intend to ingest much more than this... This is not just a "nice to have" feature request: without it, TimescaleDB compression is essentially unusable (for us). (... and if your data is much smaller, who needs compression to begin with?)
Why does this issue seem stale? As noted above, compression seems near useless for bigger tables as it stands now. This one feels like a priority, to be honest.
We're running into the same problem. After importing a large amount of historical data, it takes a long time to compress all the chunks, because it's not possible to compress chunks in parallel. Are there any good solutions now?
Can't deny this would be an awfully useful feature to see implemented!
For us this would also be a great feature to have!
We have hit a limit on single-thread performance on our servers. We have to rate-limit the data ingress (by several times) to ensure that compression keeps up. It is a huge waste of modern high core-count CPUs.
FWIW, since this issue is so old, we moved over to Citus, where the compression works fine.
Pretty sad that there has been no response after two years, even with so many votes. Like ^, we have moved on to using Influx (which is not the best otherwise).
I can confirm the issue is still here (v2.13.1). Some details are available in https://www.timescale.com/forum/t/timescale-only-uses-a-single-core-in-compression/2517
Currently the compression policy appears to run as a single-threaded, serial process. However, at high throughput the ingest rate can significantly exceed what a single thread can handle, which causes a growing backlog of uncompressed chunks even though they are all in the queue.
Is there a way to run compression in parallel, so that all uncompressed chunks get compressed concurrently, assuming there is sufficient CPU, RAM, etc.?
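For illustration, here is a minimal sketch of the manual approach we tried; the hypertable name `metrics`, the 2-hour horizon, and the chunk name in the second statement are placeholders:

```sql
-- List uncompressed chunks older than the compression horizon
-- ('metrics' and the 2-hour horizon are placeholders).
SELECT format('%I.%I', chunk_schema, chunk_name) AS chunk
FROM timescaledb_information.chunks
WHERE hypertable_name = 'metrics'
  AND NOT is_compressed
  AND range_end < now() - INTERVAL '2 hours'
ORDER BY range_start;

-- Ideally, each of several sessions would then pick a different chunk
-- from that list (the chunk name below is a placeholder):
SELECT compress_chunk('_timescaledb_internal._hyper_1_42_chunk');
```

In practice, a second session running compress_chunk() on a different chunk blocks, as described next.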
Also, possibly related: when trying to run compression from multiple independent sessions manually, the additional sessions seem to get blocked on a ShareUpdateExclusiveLock.
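To see what the waiting session is blocked on, a generic PostgreSQL query like the following can be used (plain pg_stat_activity inspection, nothing TimescaleDB-specific):

```sql
-- Show sessions that are waiting on a lock and which backends block them
-- (generic PostgreSQL, not TimescaleDB-specific).
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       wait_event,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
```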
System Details:
Compression settings:
PG Settings:
Timescale Settings:
Deployment Settings:
Currently compression is lagging by 56 hours, since it is not able to keep up with the incoming data.
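For reference, a rough way to estimate this lag is to look at the oldest chunk that is still uncompressed; a sketch (the hypertable name `metrics` is a placeholder):

```sql
-- Rough estimate of how far compression is behind: age of the oldest
-- chunk that has not been compressed yet ('metrics' is a placeholder).
SELECT now() - min(range_start) AS compression_lag
FROM timescaledb_information.chunks
WHERE hypertable_name = 'metrics'
  AND NOT is_compressed;
```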