
Parallel Compression threads for high throughput data #3077

Open
alexsanjoseph opened this issue Mar 31, 2021 · 11 comments


alexsanjoseph commented Mar 31, 2021

Currently the compression policy seems to run as a single-threaded, serial process. At high throughput, however, the ingest rate can be significantly higher than what a single thread can handle, which leaves a growing backlog of uncompressed chunks even though they are all in the queue.

Is there a way to run compression in parallel, so that all uncompressed chunks get compressed concurrently, assuming there is sufficient CPU, RAM, etc.?

Possibly related: when I try to parallelize compression manually by running it from multiple independent sessions, the sessions seem to get blocked on a ShareUpdateExclusiveLock.
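
Roughly the kind of statement each manual session runs (a simplified sketch using show_chunks() and compress_chunk(); the exact chunk selection may differ):

-- Compress every not-yet-compressed chunk older than the policy window.
-- Running this concurrently from several sessions is where the
-- ShareUpdateExclusiveLock contention shows up.
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('rawdata', older_than => INTERVAL '50 hours') AS c;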

System Details:

  • Deployment: Single Node, Kubernetes Helm
  • Throughput - 500K records/s -> 5M metrics per second (but mostly sparse data)
  • Resources - 64 CPU, 128 GB RAM
  • Chunk sizes - 15-minute chunks, ~30-40 GB per chunk, compressed to ~500 MB (98% reduction - Kickass!)
  • Compression settings and policy - see below

Compression settings:

ALTER TABLE rawdata SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'device_uuid',
  timescaledb.compress_orderby = 'data_item_name, timestamp DESC'
);

SELECT add_compression_policy('rawdata', INTERVAL '50 hour');
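
As a side note, the state of the (single) compression policy job can be inspected with something along these lines, assuming the TimescaleDB 2.x information views:

-- One policy_compression job exists per hypertable; its run duration and
-- next scheduled start give a feel for how far it is falling behind.
SELECT j.job_id, j.schedule_interval, s.last_run_status, s.last_run_duration, s.next_start
FROM timescaledb_information.jobs j
JOIN timescaledb_information.job_stats s USING (job_id)
WHERE j.proc_name = 'policy_compression'
  AND j.hypertable_name = 'rawdata';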

PG Settings:

          max_connections: 500
          shared_buffers: 25GB
          effective_cache_size: 75GB
          maintenance_work_mem: 2GB
          checkpoint_completion_target: 0.9
          wal_buffers: 16MB
          default_statistics_target: 500
          random_page_cost: 1.1
          effective_io_concurrency: 1000
          work_mem: 873kB
          min_wal_size: 4GB
          max_wal_size: 128GB
          max_worker_processes: 60
          max_parallel_workers_per_gather: 30
          max_parallel_workers: 60
          max_parallel_maintenance_workers: 4

Timescale Settings:

timescaledbTune:
  args:
    max-bg-workers: 50
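
If I understand timescaledb-tune correctly, the max-bg-workers flag above corresponds to this GUC in postgresql.conf (assumed mapping, worth double-checking in the generated config):

          timescaledb.max_background_workers = 50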

Deployment Settings:

image:
  # Image was built from
  # https://github.com/timescale/timescaledb-docker-ha
  tag: pg12-ts2.0-latest # https://hub.docker.com/r/timescale/timescaledb/tags?page=1&ordering=last_updated
  pullPolicy: IfNotPresent

persistentVolumes:
  data:
    size: 5000G
  wal:
    size: 500G

resources:
  limits:
    cpu: 64000m
    memory: 120960Mi
  requests:
    cpu: 55000m
    memory: 100480Mi

Currently the compression is lagging by 56 hours, since it is not able to keep up with the incoming data.
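
The lag can be measured directly from the information views, for example (TimescaleDB 2.x; a rough sketch):

-- Age of the oldest chunk that is still uncompressed.
SELECT now() - min(range_end) AS compression_lag
FROM timescaledb_information.chunks
WHERE hypertable_name = 'rawdata'
  AND NOT is_compressed;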

@alexsanjoseph changed the title from "Increased Compression speeds for high throughput data" to "Parallel Compression threads for high throughput data" on Mar 31, 2021

kvc0 commented Apr 8, 2021

Came here to make this issue: this is also a consideration for existing tables with a lot of chunks. Doing the compression with one cursor per hypertable leaves the server bored while there are months of static segments waiting to be compressed.
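
The obvious thing to try for a months-long backlog is to give each session its own slice of it, roughly as sketched below (a hypothetical split, using the rawdata hypertable from the original post; per the report above, concurrent sessions may still end up waiting on the hypertable's ShareUpdateExclusiveLock):

-- Session 1: compress everything older than 30 days.
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('rawdata', older_than => INTERVAL '30 days') AS c;

-- Session 2: compress chunks between 2 and 30 days old.
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('rawdata', older_than => INTERVAL '2 days',
                 newer_than => INTERVAL '30 days') AS c;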

@gpernelle

Would love to see this implemented as well. Compressing and decompressing faster would be really helpful, especially when it's so easy to spin up a large machine for that job and then downsize again.


fasmit commented Jan 16, 2022

We're running into the same problem. In initial tests with real-world ingestion of one data domain, we set a relatively large chunk interval of 2 hours and import 150k rows/sec (~50 MB/sec), which gives chunks of ~300 GB. Compressing such a chunk takes 3.5 hours (~25 MB/s). As noted above, it's not possible to do this in parallel with two sessions either.
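
Spelling out the arithmetic behind those numbers:

          ingest:       ~50 MB/s, i.e. a ~300 GB chunk every 2 hours
          compression:  ~300 GB / 3.5 h = ~300,000 MB / 12,600 s ≈ 24 MB/s

So compression runs at roughly half the ingest rate.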

This means that compression will never be able to keep up with live imports and will necessarily fall further and further behind, rendering it effectively useless in a real-world setting. We intend to ingest much more than this...

This is not just a "nice to have" feature request: without it, timescaledb compression is essentially unusable (for us). (... and if your data is much smaller, who needs compression to begin with?)


0xgeert commented Feb 24, 2023

Why does this issue seem stale? As noted above, compression seems near useless for bigger tables as it stands now. This one feels like a priority, to be honest.


Wenqihai commented May 7, 2023

We're running into the same problem. After importing a large amount of historical data, it takes a long time to compress the many resulting chunks, because it's not possible to compress chunks in parallel. Are there any good solutions now?


jvanns commented Jul 3, 2023

Can't deny this would be an awfully useful feature to see implemented!


ysmilda commented Jul 4, 2023

For us this would also be a great feature to have!

@TheUbuntuGuy

We have hit a limit on single-thread performance on our servers. We have to rate-limit the data ingress (by a factor of several) to ensure that compression keeps up. It is a huge waste of modern high core-count CPUs.


fasmit commented Oct 6, 2023

Fwiw, since this issue is so old, we moved over to Citus where the compression works fine.

@alexsanjoseph
Author

Pretty sad that there has been no response after two years even with so many votes. Like ^, we have moved on to using Influx (which is not the best otherwise)

@asiayeah

I can confirm the issue is still here (v2.13.1).

Some details are available in https://www.timescale.com/forum/t/timescale-only-uses-a-single-core-in-compression/2517
