
Parallel Compression threads for high throughput data #3077

Open
alexsanjoseph opened this issue Mar 31, 2021 · 11 comments


alexsanjoseph commented Mar 31, 2021

Currently the compression policy seems to run as a single-threaded, serial process. At high throughput, however, the ingest rate can be significantly higher than what a single thread can handle, which leaves a growing backlog of uncompressed chunks even though they are all in the queue.

Is there a way to run compression in parallel, so that all uncompressed chunks get compressed concurrently, assuming there is sufficient CPU, RAM, etc.?

Possibly related: when I try to parallelize compression manually by running it from multiple independent sessions, the sessions seem to get blocked on a ShareUpdateExclusiveLock.
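
Roughly the kind of statement each manual session runs (a simplified sketch using show_chunks() and compress_chunk(); the exact chunk selection may differ):

-- Compress every not-yet-compressed chunk older than the policy window.
-- Running this concurrently from several sessions is where the
-- ShareUpdateExclusiveLock contention shows up.
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('rawdata', older_than => INTERVAL '50 hours') AS c;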

System Details:

  • Deployment: Single Node, Kubernetes Helm
  • Throughput - 500K records/s -> 5M metrics per second (but mostly sparse data)
  • Resources - 64 CPU, 128 GB RAM
  • Chunk sizes - 15-minute chunks, ~30-40 GB per chunk, compressed to ~500 MB (98% reduction - Kickass!)
  • Compression settings and policy - see below

Compression settings:

ALTER TABLE rawdata SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'device_uuid',
  timescaledb.compress_orderby = 'data_item_name, timestamp DESC'
);

SELECT add_compression_policy('rawdata', INTERVAL '50 hour');
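
As a side note, the state of the (single) compression policy job can be inspected with something along these lines, assuming the TimescaleDB 2.x information views:

-- One policy_compression job exists per hypertable; its run duration and
-- next scheduled start give a feel for how far it is falling behind.
SELECT j.job_id, j.schedule_interval, s.last_run_status, s.last_run_duration, s.next_start
FROM timescaledb_information.jobs j
JOIN timescaledb_information.job_stats s USING (job_id)
WHERE j.proc_name = 'policy_compression'
  AND j.hypertable_name = 'rawdata';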

PG Settings:

          max_connections: 500
          shared_buffers: 25GB
          effective_cache_size: 75GB
          maintenance_work_mem: 2GB
          checkpoint_completion_target: 0.9
          wal_buffers: 16MB
          default_statistics_target: 500
          random_page_cost: 1.1
          effective_io_concurrency: 1000
          work_mem: 873kB
          min_wal_size: 4GB
          max_wal_size: 128GB
          max_worker_processes: 60
          max_parallel_workers_per_gather: 30
          max_parallel_workers: 60
          max_parallel_maintenance_workers: 4

Timescale Settings:

timescaledbTune:
  args:
    max-bg-workers: 50
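
If I understand timescaledb-tune correctly, the max-bg-workers flag above corresponds to this GUC in postgresql.conf (assumed mapping, worth double-checking in the generated config):

          timescaledb.max_background_workers = 50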

Deployment Settings:

image:
  # Image was built from
  # https://github.com/timescale/timescaledb-docker-ha
  tag: pg12-ts2.0-latest # https://hub.docker.com/r/timescale/timescaledb/tags?page=1&ordering=last_updated
  pullPolicy: IfNotPresent

persistentVolumes:
  data:
    size: 5000G
  wal:
    size: 500G

resources:
  limits:
    cpu: 64000m
    memory: 120960Mi
  requests:
    cpu: 55000m
    memory: 100480Mi

Currently the compression is lagging by 56 hours, since it is not able to keep up with the incoming data.
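
The lag can be measured directly from the information views, for example (TimescaleDB 2.x; a rough sketch):

-- Age of the oldest chunk that is still uncompressed.
SELECT now() - min(range_end) AS compression_lag
FROM timescaledb_information.chunks
WHERE hypertable_name = 'rawdata'
  AND NOT is_compressed;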

@alexsanjoseph changed the title from "Increased Compression speeds for high throughput data" to "Parallel Compression threads for high throughput data" on Mar 31, 2021

kvc0 commented Apr 8, 2021

Came here to make this issue: this is also a consideration for existing tables with a lot of chunks. Doing the compression with one cursor per hypertable leaves the server bored while there are months of static segments waiting to be compressed.
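
The obvious thing to try for a months-long backlog is to give each session its own slice of it, roughly as sketched below (a hypothetical split, using the rawdata hypertable from the original post; per the report above, concurrent sessions may still end up waiting on the hypertable's ShareUpdateExclusiveLock):

-- Session 1: compress everything older than 30 days.
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('rawdata', older_than => INTERVAL '30 days') AS c;

-- Session 2: compress chunks between 2 and 30 days old.
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('rawdata', older_than => INTERVAL '2 days',
                 newer_than => INTERVAL '30 days') AS c;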

@gpernelle

Would love to see this implemented as well. Compressing and decompressing faster would be really helpful, especially when it's so easy to spin up a large machine for that job and then downsize again.


fasmit commented Jan 16, 2022

We're running into the same problem. In initial tests with real-world ingestion of one data domain, we set a relatively large chunk interval of 2 hours and import 150k rows/sec (~50 MB/sec), which gives chunks of ~300 GB. Compressing such a chunk takes 3.5 hours (~25 MB/s). As noted above, it's not possible to do this in parallel with two sessions either.
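
Spelling out the arithmetic behind those numbers:

          ingest:       ~50 MB/s, i.e. a ~300 GB chunk every 2 hours
          compression:  ~300 GB / 3.5 h = ~300,000 MB / 12,600 s ≈ 24 MB/s

So compression runs at roughly half the ingest rate.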

This means that compression will never be able to keep up with live imports and will necessarily fall further and further behind, rendering it effectively useless in a real-world setting. We intend to ingest much more than this...

This is not just a "nice to have" feature request: without it, timescaledb compression is essentially unusable (for us). (... and if your data is much smaller, who needs compression to begin with?)


0xgeert commented Feb 24, 2023

Why does this issue seem stale? As noted above, compression seems near useless for bigger tables as it stands now. This one feels like a priority, to be honest.


Wenqihai commented May 7, 2023

We're running into the same problem. After importing a large amount of historical data, it takes a long time to compress the many resulting chunks, because it's not possible to compress chunks in parallel. Are there any good solutions now?


jvanns commented Jul 3, 2023

Can't deny this would be an awfully useful feature to see implemented!


ysmilda commented Jul 4, 2023

For us this would also be a great feature to have!

@TheUbuntuGuy

We have hit a limit on single-thread performance on our servers. We have to rate-limit the data ingress (by a factor of several) to ensure that compression keeps up. It is a huge waste of modern high core-count CPUs.


fasmit commented Oct 6, 2023

Fwiw, since this issue is so old, we moved over to Citus where the compression works fine.

@alexsanjoseph
Author

Pretty sad that there has been no response after two years even with so many votes. Like ^, we have moved on to using Influx (which is not the best otherwise)

@asiayeah

I can confirm the issue is still here (v2.13.1).

Some details are available in https://www.timescale.com/forum/t/timescale-only-uses-a-single-core-in-compression/2517
