Slow insert on compressed chunks #6063

Closed
sb230132 opened this issue Sep 12, 2023 · 3 comments

@sb230132 (Contributor) commented Sep 12, 2023

What type of bug is this?

Performance issue

What subsystems and features are affected?

Compression

What happened?

An INSERT query with ON CONFLICT on a compressed chunk is very slow.

TimescaleDB version affected

2.12

PostgreSQL version used

15.3

What operating system did you use?

MacOSX

What installation method did you use?

Source

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

tsdb=> explain analyze insert into testdata (timestamp, idSite, code, instance, valuefloat) values (1689336100, 139400, 'V', 0, 3.3) on conflict do nothing;
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Custom Scan (HypertableModify)  (cost=0.00..0.01 rows=1 width=152) (actual time=176.485..176.487 rows=0 loops=1)
   ->  Insert on testdata  (cost=0.00..0.01 rows=1 width=152) (actual time=176.485..176.487 rows=0 loops=1)
         Conflict Resolution: NOTHING
         Tuples Inserted: 1
         Conflicting Tuples: 0
         ->  Custom Scan (ChunkDispatch)  (cost=0.00..0.01 rows=1 width=152) (actual time=176.448..176.450 rows=1 loops=1)
               ->  Result  (cost=0.00..0.01 rows=1 width=152) (actual time=0.002..0.003 rows=1 loops=1)
 Planning Time: 0.144 ms
 Execution Time: 176.583 ms
(9 rows)

tsdb=> explain analyze insert into testdata (timestamp, idSite, code, instance, valuefloat) values (1689336100, 139400, 'V', 0, 3.3) on conflict do nothing;
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Custom Scan (HypertableModify)  (cost=0.00..0.01 rows=1 width=152) (actual time=165.338..165.339 rows=0 loops=1)
   ->  Insert on testdata  (cost=0.00..0.01 rows=1 width=152) (actual time=165.338..165.339 rows=0 loops=1)
         Conflict Resolution: NOTHING
         Tuples Inserted: 1
         Conflicting Tuples: 0
         ->  Custom Scan (ChunkDispatch)  (cost=0.00..0.01 rows=1 width=152) (actual time=165.298..165.300 rows=1 loops=1)
               ->  Result  (cost=0.00..0.01 rows=1 width=152) (actual time=0.002..0.002 rows=1 loops=1)
 Planning Time: 0.127 ms
 Execution Time: 165.434 ms
(9 rows)


How can we reproduce the bug?

```sql
drop database db;
create database db;
\c db
create extension timescaledb;
create table testdata
(
  timestamp bigint not null,
  idSite integer null,
  code text not null,
  instance smallint not null,
  venusVersion text,
  productId text,
  valueFloat float,
  valueString text,
  valueEnum smallint
);

CREATE OR REPLACE FUNCTION unix_now() returns BIGINT LANGUAGE SQL STABLE as $$ SELECT extract(epoch from now())::BIGINT $$;
select create_hypertable('testdata', 'timestamp', chunk_time_interval=>3600, create_default_indexes=>False);
select set_integer_now_func('testdata', 'unix_now');

create unique index uniqueindex on testdata (idSite, code, instance, timestamp desc);
alter table testdata set (timescaledb.compress, timescaledb.compress_segmentby = 'idSite,code,instance', timescaledb.compress_orderby='timestamp desc');
select add_compression_policy('testdata', 14400);
copy testdata(timestamp,idsite,code,instance,venusversion,productid,valuefloat,valuestring,valueenum) FROM '/Users/bharathysatish/Downloads/testdata.csv' DELIMITER ',' CSV HEADER;
call run_job (1000);
EXPLAIN (ANALYZE, BUFFERS, SETTINGS) insert into testdata (timestamp, idSite, code, instance, valuefloat) values (1689336100, 139400, 'V', 0, 3.3) on conflict do nothing;
```

[testdata.csv.zip](https://github.com/timescale/timescaledb/files/12584252/testdata.csv.zip)
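
Before running the slow INSERT, it can help to confirm that the target chunk is actually compressed and to see which indexes exist on its compressed counterpart. A minimal inspection sketch follows; the `timescaledb_information.chunks` view and the `_timescaledb_internal` schema are standard, but the generated chunk names will differ per installation.

```sql
-- Confirm which chunks of the hypertable are compressed.
SELECT chunk_schema, chunk_name, is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'testdata';

-- List indexes on the internal chunks. With the settings above, the
-- compressed chunks are expected to carry an index on the segmentby
-- columns (idSite, code, instance).
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname = '_timescaledb_internal'
ORDER BY tablename, indexname;
```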

sb230132 added the bug label Sep 12, 2023
sb230132 self-assigned this Sep 12, 2023
sb230132 added a commit that referenced this issue Sep 12, 2023
An INSERT query with ON CONFLICT on a compressed chunk does a
heap scan to check for a unique constraint violation. This patch
improves performance by doing an index scan on the compressed
chunk to fetch matching records based on the segmentby columns.
These matching records are inserted into the uncompressed chunk,
and the unique constraint violation is then checked.

Since the index on the compressed chunk covers only the segmentby
columns, we cannot do a point lookup that also considers the
orderby columns. This patch therefore decompresses matching
records based only on the segmentby columns.

Fixes #6063
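
Roughly, the fix narrows the compressed-chunk scan to the segmentby values of the incoming row. A hand-written equivalent of that lookup might look like the sketch below; the chunk name `compress_hyper_2_3_chunk` is a made-up placeholder, since compressed chunk names are generated per installation.

```sql
-- Illustrative only: the compressed-chunk name is a placeholder.
-- The patch fetches the compressed batches whose segmentby columns
-- match the incoming row ...
SELECT *
FROM _timescaledb_internal.compress_hyper_2_3_chunk
WHERE idSite = 139400
  AND code = 'V'
  AND instance = 0;
-- ... decompresses those batches into the uncompressed chunk, and only
-- then lets the unique index on (idSite, code, instance, timestamp)
-- detect the conflict. The orderby column (timestamp) cannot narrow the
-- lookup because it is stored inside the compressed batches rather than
-- as a plain indexed column.
```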
@sb230132 (Contributor, Author)

With the patch from #6061, INSERTs are much faster.

db=# begin; EXPLAIN (ANALYZE, BUFFERS, SETTINGS) insert into testdata (timestamp, idSite, code, instance, valuefloat) values (1689336100, 139400, 'V', 0, 3.3) on conflict do nothing;rollback;
BEGIN
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Custom Scan (HypertableModify)  (cost=0.00..0.01 rows=1 width=152) (actual time=1.228..1.231 rows=0 loops=1)
   Buffers: shared hit=56
   ->  Insert on testdata  (cost=0.00..0.01 rows=1 width=152) (actual time=1.228..1.231 rows=0 loops=1)
         Conflict Resolution: NOTHING
         Tuples Inserted: 1
         Conflicting Tuples: 0
         Buffers: shared hit=56
         ->  Custom Scan (ChunkDispatch)  (cost=0.00..0.01 rows=1 width=152) (actual time=1.160..1.163 rows=1 loops=1)
               Buffers: shared hit=51
               ->  Result  (cost=0.00..0.01 rows=1 width=152) (actual time=0.008..0.008 rows=1 loops=1)
 Planning:
   Buffers: shared hit=13
 Planning Time: 0.399 ms
 Execution Time: 1.547 ms
(14 rows)

ROLLBACK

sb230132 added a commit that referenced this issue Sep 12, 2023
An INSERT query with ON CONFLICT on a compressed chunk does a
heap scan to check for a unique constraint violation. This patch
improves performance by doing an index scan on the compressed
chunk to fetch matching records based on the segmentby columns.
These matching records are inserted into the uncompressed chunk,
and the unique constraint violation is then checked.

Since the index on the compressed chunk covers only the segmentby
columns, we cannot do a point lookup that also considers the
orderby columns. This patch therefore decompresses matching
records based only on the segmentby columns.

Fixes #6063
Fixes #5801
@corkscruff

Tagging @wiebeytec for awareness

@antekresic (Contributor)

This was optimized in recent releases.
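
For anyone re-testing, a quick way to check which TimescaleDB version a server is running before comparing timings (a trivial sketch against the standard `pg_extension` catalog):

```sql
-- Show the installed TimescaleDB extension version.
SELECT extversion FROM pg_extension WHERE extname = 'timescaledb';
```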
