
Fix INSERT query performance on compressed chunk #6061

Closed
wants to merge 1 commit

Conversation

@sb230132 sb230132 commented Sep 12, 2023

Fix INSERT query performs on compressed chunk

An INSERT with ON CONFLICT into a compressed chunk currently does a
heap scan of the compressed chunk to check for unique constraint
violations. This patch improves performance by doing an index scan on
the compressed chunk to fetch the matching records based on the
segmentby columns. These matching records are decompressed into the
uncompressed chunk, and the unique constraint violation is then
verified there.

Since the index on the compressed chunk contains only segmentby
columns, we cannot do a point lookup that also considers the orderby
columns. This patch therefore decompresses matching records based on
the segmentby columns only.

Fixes #6063
Fixes #5801
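
To illustrate the idea, here is a minimal, hypothetical C sketch (not the code from this patch; the function name and the decompress_batch_cb callback are invented for illustration) of probing the compressed chunk's segmentby index with equality scankeys instead of scanning the whole heap:

#include "postgres.h"
#include "access/genam.h"
#include "access/tableam.h"
#include "executor/tuptable.h"
#include "utils/snapmgr.h"

/* Callback that decompresses one matching compressed batch into the
 * uncompressed chunk (hypothetical, for this sketch only). */
typedef void (*decompress_batch_cb) (TupleTableSlot *compressed_slot, void *arg);

static void
scan_compressed_chunk_by_segmentby(Relation comp_rel, Relation comp_idx_rel,
                                   int nkeys, ScanKey keys,
                                   decompress_batch_cb cb, void *arg)
{
    TupleTableSlot *slot = table_slot_create(comp_rel, NULL);
    IndexScanDesc scan = index_beginscan(comp_rel, comp_idx_rel,
                                         GetTransactionSnapshot(), nkeys, 0);

    /* The scankeys cover only the segmentby columns; orderby columns are not
     * part of the index, so every batch of the matching segment is returned. */
    index_rescan(scan, keys, nkeys, NULL, 0);

    while (index_getnext_slot(scan, ForwardScanDirection, slot))
        cb(slot, arg);  /* decompress this batch into the uncompressed chunk */

    index_endscan(scan);
    ExecDropSingleTupleTableSlot(slot);
}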

@sb230132 sb230132 self-assigned this Sep 12, 2023
@github-actions

@akuzm, @jnidzwetzki: please review this pull request.

Powered by pull-review

@sb230132 sb230132 force-pushed the fix_slow_inserts branch 2 times, most recently from 9e72c3f to e127cc8 on September 12, 2023 08:37
@sb230132
Contributor Author

Changes in compression_conflicts_iso are expected. When I ran the same test on plain PostgreSQL without hypertables, I saw the same results.
pg_ddl_iso.sql.zip
pg_ddl_iso.out.zip

@codecov

codecov bot commented Sep 12, 2023

Codecov Report

Merging #6061 (f1dae93) into main (93519d0) will decrease coverage by 0.06%.
The diff coverage is 93.60%.

@@            Coverage Diff             @@
##             main    #6061      +/-   ##
==========================================
- Coverage   81.38%   81.33%   -0.06%     
==========================================
  Files         243      243              
  Lines       55948    56005      +57     
  Branches    12389    12393       +4     
==========================================
+ Hits        45536    45553      +17     
- Misses       8092     8105      +13     
- Partials     2320     2347      +27     
Files Changed                        Coverage            Δ
tsl/src/compression/compression.h    40.00% <ø>          (ø)
tsl/src/compression/compression.c    90.75% <93.38%>     (-0.75%) ⬇️
tsl/src/compression/api.c            90.78% <100.00%>    (-0.16%) ⬇️

... and 17 files with indirect coverage changes


@sb230132 sb230132 changed the title Fix INSERT query performs on compressed chunk Fix INSERT query performance on compressed chunk Sep 12, 2023
if (!heap_attisnull(compressed_tuple, attno, decompressor.in_desc))
{
valid = false;
break;

Contributor

Could you add a test for this line?

Contributor Author

tsl/test/sql/compression_conflicts.sql has a test which will cover these lines.

Contributor

It seems Codecov still does not see any coverage here. Which SQL statement from the test should cover these lines?

@jnidzwetzki
Contributor

@sb230132 If I understood the changes in the test output correctly, this PR is not only about performance - it also fixes wrong query results. I suggest adding this fact to the commit message.

* locking the compressed chunk here
*/
table_close(compressed_chunk_rel, NoLock);
table_close(uncompressed_chunk_rel, NoLock);

Contributor

Is it intended to keep the lock of the uncompressed_chunk_rel too? If so, could you add a comment why this is required?

Contributor Author

There is a comment just above.

Contributor

This comment only covers the compressed_chunk_rel. If the same applies to the uncompressed_chunk_rel, it should be added to the comment.
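
As a concrete (and purely illustrative) suggestion, the extended comment could read roughly like this, assuming the same lock-lifetime reasoning indeed applies to both relations:

/*
 * Keep the locks on both the compressed and the uncompressed chunk until the
 * end of the transaction; closing with NoLock only releases the relcache
 * reference, not the lock, so concurrent operations on either chunk cannot
 * interfere while the conflict handling is still in progress.
 */
table_close(compressed_chunk_rel, NoLock);
table_close(uncompressed_chunk_rel, NoLock);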

/* build scankeys for segmentby columns */
#define SEGMENTBY_KEYS (1 << 1)
/* build scankeys for orderby columns */
#define ORDERBY_KEYS (1 << 2)

Contributor

Do you need this define? It seems it is never used in the code.

Contributor Author

Removed ORDERBY_KEYS as it is not used anywhere

write_logical_replication_msg_decompression_end();

TM_FailureData tmfd;
TM_Result result pg_attribute_unused();

Contributor

This code was already present in the old code. Could you please add a comment to explain why it is safe to ignore the result of the operation?

Contributor Author

Added comment.

Contributor

It seems you also added an assert. My question was more about why it is safe to ignore the result of the operation. Maybe you could add this information to the comment.
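
For illustration only, a comment plus assert could look like the sketch below; the surrounding variable names and the exact reason given in the comment are assumptions, not taken from the patch:

TM_FailureData tmfd;
TM_Result result pg_attribute_unused();

/*
 * Delete the compressed batch tuple. The result can be ignored because the
 * chunk is locked against concurrent modification at this point, so the
 * delete cannot fail with a concurrency-related TM_* status; the assert
 * documents that expectation in assert-enabled builds.
 */
result = table_tuple_delete(comp_rel,
                            &slot->tts_tid,
                            GetCurrentCommandId(true),
                            GetTransactionSnapshot(),
                            InvalidSnapshot,
                            true /* wait */,
                            &tmfd,
                            false /* changingPart */);
Assert(result == TM_Ok);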

@@ -1017,6 +1017,7 @@ tsl_get_compressed_chunk_index_for_recompression(PG_FUNCTION_ARGS)
{
Oid uncompressed_chunk_id = PG_ARGISNULL(0) ? InvalidOid : PG_GETARG_OID(0);
Chunk *uncompressed_chunk = ts_chunk_get_by_relid(uncompressed_chunk_id, true);
Chunk *compressed_chunk = ts_chunk_get_by_id(uncompressed_chunk->fd.compressed_chunk_id, true);

Member

This lookup is expensive; is it enough to get just the relid here? There's a lighter function, ts_chunk_get_relid, for that, or maybe ts_chunk_get_formdata.

Same comment for another usage below.

Contributor Author

Fixed as you suggested. Help me understand why ts_chunk_get_by_id is a costly operation?

ts_chunk_get_by_id does the following:

  1. Build scankey on chunk id.
  2. Take AccessShareLock on CHUNK catalog table.
  3. Scan the index and fetch the required row.

What you suggested is:

  1. Call ts_chunk_get_relid to get chunk relid.
  2. Call ts_chunk_get_by_relid to get the chunk from relid.

ts_chunk_get_by_relid does the following:

  1. Build scankey on schema name and chunk name.
  2. Take AccessShareLock on CHUNK catalog table.
  3. Scan the index and fetch the required row.

Member

I mean, don't build the entire Chunk struct; you only need its relid to open the relation, right? Building the Chunk struct is costly because it does a lot of catalog lookups to find the matching dimensions, constraints and so on, and you probably don't need them here. At planning time we always cache the Chunk struct to save on these lookups, but here at execution time it's not accessible.
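
A sketch of the lighter lookup being suggested (variable names are made up; it assumes ts_chunk_get_relid(chunk_id, missing_ok) keeps its current signature):

/* Resolve only the compressed chunk's relid instead of building the full
 * Chunk struct with its dimension and constraint catalog lookups. */
Oid comp_relid = ts_chunk_get_relid(uncompressed_chunk->fd.compressed_chunk_id,
                                    /* missing_ok = */ false);
Relation comp_rel = table_open(comp_relid, AccessShareLock);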

Comment on lines 2084 to 2090
if (attoff < indnkeyatts - 1)
{
/* Initialize segmentby scankeys. */
if (key_flags & SEGMENTBY_KEYS)
{
/* get index attribute name */
Form_pg_attribute attr = TupleDescAttr(idxrel->rd_att, attoff);

Member

Can we end up with index scan with no keys here? Probably better to keep the seq scan for this case, because a full index scan w/o keys is just needlessly slower.

Contributor Author

I don't understand what you mean here.
By default there is always an index with segmentby columns on the compressed chunk.
The index OID is fetched from get_compressed_chunk_index(), which checks for the presence of segmentby columns in the index and otherwise returns InvalidOid.

Member

I mean, can key_flags & SEGMENTBY_KEYS be false? That would mean we're using an index scan, but without any index conditions.

Contributor Author

Fixed.
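
The concern and one possible fix can be sketched as follows (hypothetical names; num_scankeys and comp_rel are assumptions): when no scankeys could be built, fall back to a plain sequential scan instead of an index scan with no conditions.

if (num_scankeys == 0)
{
    /* No usable segmentby conditions: a keyless index scan only adds
     * overhead, so scan the compressed chunk sequentially instead. */
    TableScanDesc scan = table_beginscan(comp_rel, GetTransactionSnapshot(),
                                         0, NULL);
    /* ... iterate with table_scan_getnextslot() ... */
    table_endscan(scan);
}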

@svenklemm svenklemm left a comment


This seems to not handle attribute numbers correctly and uses them directly from the hypertable instead of the chunks. In addition, this seems to completely change the operator handling and always uses opfamily operators without checking for compatibility.
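
To illustrate the first point: attribute numbers on the hypertable do not necessarily match those of a chunk (for example after dropped columns), so they usually have to be mapped by attribute name, roughly like this hypothetical snippet (hypertable_relid, ht_attno and chunk_relid are placeholders):

/* Map a hypertable attribute number to the corresponding chunk attribute by
 * name, since the attribute numbers of the two relations can differ. */
char       *attname = get_attname(hypertable_relid, ht_attno, /* missing_ok = */ false);
AttrNumber  chunk_attno = get_attnum(chunk_relid, attname);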

@horzsolt horzsolt assigned antekresic and unassigned sb230132 Sep 29, 2023
@mkindahl mkindahl assigned antekresic and unassigned antekresic Sep 29, 2023
@horzsolt

horzsolt commented Oct 3, 2023

Closing this one, @antekresic is working on some other PRs which will replace this one.

@horzsolt horzsolt closed this Oct 3, 2023
@mgagliardo91

@horzsolt @antekresic would you mind posting a link to the new PRs when they are created?

@mblsf

mblsf commented Oct 9, 2023

@horzsolt @antekresic I would also like to know about these PRs.

@svenklemm
Member

@mgagliardo91 @mblsf Some fixes already went into 2.12.0 with #6081, but there are more follow-up PRs coming.

@mgagliardo91

@horzsolt @antekresic is there another PR we can monitor for the followup here?
