Performance drop up to x20 with compression on #2829

Open
sasha-id opened this issue Jan 14, 2021 · 12 comments

Comments
@sasha-id

Relevant system information:

  • OS: Ubuntu 18.04
  • PostgreSQL version (output of postgres --version): 12.4
  • TimescaleDB version (output of \dx in psql): 2.0.0
  • Installation method: apt install

Describe the bug
Direct hypertable query: performance drops by roughly 10x when compression is enabled. Query on continuous aggregates: performance drops by roughly 20x.

To Reproduce
Schema:

    create table trades
    (
        time        timestamp not null,
        stock_id    bigint    not null,
        exchange_id integer,
        trade_id    bigint,
        tape        smallint,
        price       numeric(10, 4),
        size        integer
    );
    SELECT create_hypertable('trades', 'time', chunk_time_interval => INTERVAL '1 day');

    -- Disabled for now, until this issue is fixed
    -- ALTER TABLE trades SET (
    --   timescaledb.compress,
    --   timescaledb.compress_orderby = 'time DESC',
    --   timescaledb.compress_segmentby = 'stock_id'
    -- );
    -- SELECT add_compression_policy('trades', INTERVAL '7 days');

    -- Continuous aggregate
    CREATE MATERIALIZED VIEW candles_1m
    WITH (timescaledb.continuous) AS
    SELECT
        time_bucket('1 minute', time) AS time,
        last(time, time) AS last_trade,
        stock_id,
        count(*) trades,
        first(price, time) open,
        MAX(price) high,
        MIN(price) low,
        last(price, time) AS close,
        SUM(size) volume
    FROM trades
    GROUP BY time_bucket('1 minute', time), stock_id;

    SELECT add_continuous_aggregate_policy('candles_1m',
        start_offset => INTERVAL '1 day',
        end_offset => INTERVAL '2 minutes',
        schedule_interval => INTERVAL '1 minute'
    );

Sample data: https://www.dropbox.com/s/6l4ldo9vg1e5e39/data.csv.bz2?dl=0
psql -d new_db -c "\COPY trades FROM data.csv CSV"

Query:

    explain analyze SELECT
        time_bucket('1 minute', time) AS time,
        count(*) trades,
        first(price, time) open,
        MAX(price) high,
        MIN(price) low,
        last(price, time) AS close,
        SUM(size) volume
    FROM trades
    WHERE time >= '2020-12-31 11:38' AND time < '2020-12-31 11:39'
    GROUP BY time_bucket('1 minute', time), stock_id;

Query plan without compression:

 GroupAggregate  (cost=26456.41..27103.55 rows=2669 width=160) (actual time=140.384..274.565 rows=1942 loops=1)
   Group Key: (time_bucket('00:01:00'::interval, _hyper_1_1_chunk."time")), _hyper_1_1_chunk.stock_id
   ->  Sort  (cost=26456.41..26523.12 rows=26686 width=34) (actual time=140.249..205.019 rows=26301 loops=1)
         Sort Key: (time_bucket('00:01:00'::interval, _hyper_1_1_chunk."time")), _hyper_1_1_chunk.stock_id
         Sort Method: quicksort  Memory: 2823kB
         ->  Index Scan using _hyper_1_1_chunk_trades_time_idx on _hyper_1_1_chunk  (cost=0.43..24494.48 rows=26686 width=34) (actual time=0.081..71.490 rows=26301 loops=1)
               Index Cond: (("time" >= '2020-12-31 11:38:00'::timestamp without time zone) AND ("time" < '2020-12-31 11:39:00'::timestamp without time zone))
 Planning Time: 0.571 ms
 Execution Time: 279.394 ms

Query plan with compression:

 Finalize GroupAggregate  (cost=499803.54..578968.95 rows=2669 width=160) (actual time=398.114..493.005 rows=1942 loops=1)
   Group Key: (time_bucket('00:01:00'::interval, _hyper_1_1_chunk."time")), _hyper_1_1_chunk.stock_id
   ->  Gather Merge  (cost=499803.54..578855.51 rows=2669 width=160) (actual time=397.819..482.142 rows=1943 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Partial GroupAggregate  (cost=498803.53..577555.24 rows=2669 width=160) (actual time=385.002..452.221 rows=972 loops=2)
               Group Key: (time_bucket('00:01:00'::interval, _hyper_1_1_chunk."time")), _hyper_1_1_chunk.stock_id
               ->  Sort  (cost=498803.53..507548.53 rows=3498000 width=34) (actual time=384.824..417.302 rows=13150 loops=2)
                     Sort Key: (time_bucket('00:01:00'::interval, _hyper_1_1_chunk."time")), _hyper_1_1_chunk.stock_id
                     Sort Method: quicksort  Memory: 1304kB
                     Worker 0:  Sort Method: quicksort  Memory: 1520kB
                     ->  Custom Scan (DecompressChunk) on _hyper_1_1_chunk  (cost=0.16..9288.19 rows=3498000 width=34) (actual time=215.162..351.211 rows=13150 loops=2)
                           Filter: (("time" >= '2020-12-31 11:38:00'::timestamp without time zone) AND ("time" < '2020-12-31 11:39:00'::timestamp without time zone))
                           Rows Removed by Filter: 1300744
                           ->  Parallel Seq Scan on compress_hyper_5_3_chunk  (cost=0.00..543.19 rows=3498 width=124) (actual time=214.981..220.760 rows=1958 loops=2)
                                 Filter: ((_ts_meta_max_1 >= '2020-12-31 11:38:00'::timestamp without time zone) AND (_ts_meta_min_1 < '2020-12-31 11:39:00'::timestamp without time zone))
                                 Rows Removed by Filter: 5532
 Planning Time: 3.477 ms
 JIT:
   Functions: 27
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 5.141 ms, Inlining 62.861 ms, Optimization 206.136 ms, Emission 160.349 ms, Total 434.488 ms
 Execution Time: 523.742 ms
(23 rows)
@phemmer

phemmer commented Jan 15, 2021

Not sure they're related, but there are a few open issues on the query planner switching to sequential scans instead of index scans when handling compressed chunks: #2722 #2744
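
If it helps narrow this down, a quick session-level experiment (diagnostic only, not a fix) is to check whether the plan changes when the planner is discouraged from sequential scans; enable_seqscan is a stock PostgreSQL setting:

    -- Diagnostic only: discourage seq scans for this session,
    -- re-run the report's EXPLAIN ANALYZE, and compare the chosen plan.
    SET enable_seqscan = off;
    EXPLAIN ANALYZE
    SELECT time_bucket('1 minute', time) AS time, count(*) AS trades
    FROM trades
    WHERE time >= '2020-12-31 11:38' AND time < '2020-12-31 11:39'
    GROUP BY time_bucket('1 minute', time), stock_id;
    RESET enable_seqscan;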

@sasha-id
Author

> Not sure they're related, but there are a few open issues on the query planner switching to sequential scans instead of index scans when handling compressed chunks: #2722 #2744

Yes, it must be the same issue. I provided sample data to help replicate it.

@svenklemm
Member

Can you add the command you used to set up compression?

@akuzm
Member

akuzm commented Oct 27, 2021

Unfortunately, we cannot reproduce this issue without knowing the exact command that was used to set up compression.

@NunoFilipeSantos
Contributor

@alex-klepa can you provide a way to reproduce it on 2.5.0? Otherwise, we need to close this issue.

@gayyappan
Contributor

gayyappan commented Nov 19, 2021

@NunoFilipeSantos The script has the compression parameters. @alex-klepa, thank you for the report and the data to reproduce the problem. We are investigating the performance issues.

@jjuraszek

I've hit the same wall. The workaround that works for me is to turn off JIT.
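
For reference, a minimal sketch of that workaround; jit is a standard PostgreSQL setting and can be changed per session or instance-wide:

    -- Per session, handy for checking the effect on a single query:
    SET jit = off;

    -- Instance-wide (requires superuser), then reload the configuration:
    ALTER SYSTEM SET jit = off;
    SELECT pg_reload_conf();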

@genedavis

genedavis commented Mar 5, 2023

Hi Team - is there any update on this ticket? I just encountered it in what is presumably a similar scenario:

  1. Have a large hypertable for looking up IoT-based data (tag, time, value, etc.)
  2. Create a covering index starting with tag, since that is a common initial lookup filter (see the sketch after this list)
  3. Run a query collecting data for a window of the past few hours for groups of tags
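
A minimal sketch of the kind of covering index described in step 2; the table and column names (readings, tag, time, value) are placeholders, not the actual schema:

    -- Leading column is the common lookup filter (tag), time keeps range
    -- scans ordered, and value is INCLUDEd so the typical query shape can
    -- be answered with an index-only scan.
    CREATE INDEX readings_tag_time_idx
        ON readings (tag, time DESC) INCLUDE (value);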

This was all working very well until, I believe, I added a compression rule for data older than 30 days. Now I'm getting a scan of the full hypertable chunk based on time instead of an index scan. Queries for some groups of tags have gone from 10 ms up to 800 ms.

Thanks - Gene

Edit: my workaround was to decompress all chunks and set compress to false on this table, which worked. However, I would have hoped (I'm sure it's not easy) that the query planner could have seen that the query was bounded well outside the compression time window, so I wouldn't have had to do this.
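
For anyone needing the same escape hatch, a sketch of that workaround using standard TimescaleDB calls (my_hypertable is a placeholder):

    -- Stop the policy from re-compressing chunks.
    SELECT remove_compression_policy('my_hypertable');
    -- Decompress every compressed chunk (true = skip chunks that are already uncompressed).
    SELECT decompress_chunk(c, true) FROM show_chunks('my_hypertable') c;
    -- Finally, turn compression off for the hypertable.
    ALTER TABLE my_hypertable SET (timescaledb.compress = false);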

@radresian

Hello Team,

I am planning to use Timescale Cloud at my crypto exchange company to keep historical asset prices. I did a small investigation into performance and compression and came across this issue. Everything Timescale offers looks excellent, but I am frustrated to see that this issue is still open.

Does that mean Timescale cannot offer compression and performance at the same time? Won't you do anything about this issue, or suggest anything to your users?

@phemmer

phemmer commented Mar 16, 2023

> Does that mean Timescale cannot offer compression and performance at the same time?

Writing to a compressed chunk is the exception, not the norm. Typically you should be writing to uncompressed chunks, and the chunks get compressed once they've rotated out and are not being written to. Inserting into compressed chunks should generally be reserved for situations where a data source was offline, and dumps its data once coming back online after the chunk has been compressed.
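
For context, the usual pattern (and essentially what the commented-out part of the original script sets up) is to compress only chunks older than some cutoff, so fresh data always lands in uncompressed chunks:

    ALTER TABLE trades SET (
        timescaledb.compress,
        timescaledb.compress_orderby   = 'time DESC',
        timescaledb.compress_segmentby = 'stock_id'
    );
    -- Only chunks whose data is older than 7 days get compressed;
    -- recent chunks stay uncompressed and take current writes and reads.
    SELECT add_compression_policy('trades', INTERVAL '7 days');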

@radresian

@phemmer, after your comment I investigated the sample data in the example and realized that the query is between '2020-12-31 11:38' and '2020-12-31 11:39', which is old data in a compressed chunk. I think querying the uncompressed chunk that holds new data would give much better performance.
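
A hypothetical way to check that, pointing the same aggregation at a window that should still sit in an uncompressed chunk (the dates below are placeholders chosen to fall after the compression cutoff):

    EXPLAIN ANALYZE
    SELECT time_bucket('1 minute', time) AS time, count(*) AS trades, SUM(size) AS volume
    FROM trades
    WHERE time >= '2021-01-13 11:38' AND time < '2021-01-13 11:39'
    GROUP BY time_bucket('1 minute', time), stock_id;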

@phemmer

phemmer commented Mar 17, 2023

Sorry, I completely misunderstood your comment. For some reason I thought you were talking about inserts. But now that I read it again, I don't know where I got that from. So ignore what I said, it's not applicable :-)
