Fix INSERT query performance on compressed chunk #6061
Conversation
@akuzm, @jnidzwetzki: please review this pull request.
Force-pushed from 9e72c3f to e127cc8
Changes in compression_conflicts_iso are expected. When I ran the same test on PostgreSQL without hypertables, I saw the same results.
Codecov Report
@@           Coverage Diff            @@
##             main    #6061      +/-   ##
==========================================
- Coverage   81.38%   81.33%    -0.06%
==========================================
  Files         243      243
  Lines       55948    56005      +57
  Branches    12389    12393       +4
==========================================
+ Hits        45536    45553      +17
- Misses       8092     8105      +13
- Partials     2320     2347      +27
... and 17 files with indirect coverage changes
if (!heap_attisnull(compressed_tuple, attno, decompressor.in_desc))
{
	valid = false;
	break;
Could you add a test for this line?
tsl/test/sql/compression_conflicts.sql has a test which will cover these lines.
It seems Codecov still does not see any coverage here. Which SQL statement from the test should cover these lines?
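For context, a hedged sketch of the kind of loop the snippet above sits in: only the heap_attisnull() call is taken from the diff; the surrounding loop, natts, and the valid initialization are assumed context, not the PR's actual code.
/* Scan the compressed tuple's attributes and reject it as soon as a
 * non-NULL value is found where NULL was expected. */
bool valid = true;
for (AttrNumber attno = 1; attno <= natts; attno++)
{
	if (!heap_attisnull(compressed_tuple, attno, decompressor.in_desc))
	{
		valid = false;
		break;
	}
}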
@sb230132 If I understood the changes in the test output correctly, this PR is not only about performance - it also fixes wrong query results. I suggest adding this fact to the commit message.
 * locking the compressed chunk here
 */
table_close(compressed_chunk_rel, NoLock);
table_close(uncompressed_chunk_rel, NoLock);
Is it intended to keep the lock on the uncompressed_chunk_rel too? If so, could you add a comment explaining why this is required?
There is a comment just above.
This comment only covers the compressed_chunk_rel. If the same applies to the uncompressed_chunk_rel, it should be added to the comment.
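As an aside for readers, a minimal sketch of the table_close(..., NoLock) pattern under discussion. The NoLock semantics are standard PostgreSQL; the lock level and chunk_relid variable here are illustrative assumptions.
/* Closing a relation with NoLock releases the relcache reference but
 * keeps the heavyweight lock acquired at open time until the end of
 * the transaction. */
Relation rel = table_open(chunk_relid, RowExclusiveLock);
/* ... insert/delete tuples ... */
table_close(rel, NoLock); /* lock on rel is retained until commit/abort */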
tsl/src/compression/compression.h
Outdated
/* build scankeys for segmentby columns */
#define SEGMENTBY_KEYS (1 << 1)
/* build scankeys for orderby columns */
#define ORDERBY_KEYS (1 << 2)
Do you need this define? It seems it is never used in the code.
Removed ORDERBY_KEYS as it is not used anywhere.
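For readers unfamiliar with the convention, a minimal illustration of how such bit flags are typically combined and tested; the builder function is hypothetical, not the PR's API.
int key_flags = SEGMENTBY_KEYS; /* ORDERBY_KEYS was dropped as unused */

if (key_flags & SEGMENTBY_KEYS)
	nkeys = build_segmentby_scankeys(idxrel, scankeys); /* hypothetical helper */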
write_logical_replication_msg_decompression_end();

TM_FailureData tmfd;
TM_Result result pg_attribute_unused();
This code was already present in the old code. Could you please add a comment to explain why it is safe to ignore the result of the operation?
Added comment.
It seems you also added an assert. My question was more about why it is safe to ignore the result of the operation. Maybe you could add this information to the comment.
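To make the question concrete, a hedged sketch of the pattern being discussed, assuming the delete goes through PostgreSQL's table_tuple_delete(); the surrounding variables are assumptions, not the PR's code.
/* The result is captured only so a debug build can assert on it;
 * pg_attribute_unused() silences the unused-variable warning in
 * release builds where the Assert compiles away. Whether TM_Ok is
 * actually guaranteed here depends on the locks held on the chunk,
 * which is the review question above. */
TM_FailureData tmfd;
TM_Result result pg_attribute_unused();
result = table_tuple_delete(compressed_chunk_rel,
							&compressed_tuple->t_self,
							GetCurrentCommandId(true),
							GetTransactionSnapshot(),
							InvalidSnapshot,
							true /* wait for concurrent updaters */,
							&tmfd,
							false /* changingPart */);
Assert(result == TM_Ok);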
tsl/src/compression/api.c
Outdated
@@ -1017,6 +1017,7 @@ tsl_get_compressed_chunk_index_for_recompression(PG_FUNCTION_ARGS)
{
	Oid uncompressed_chunk_id = PG_ARGISNULL(0) ? InvalidOid : PG_GETARG_OID(0);
	Chunk *uncompressed_chunk = ts_chunk_get_by_relid(uncompressed_chunk_id, true);
	Chunk *compressed_chunk = ts_chunk_get_by_id(uncompressed_chunk->fd.compressed_chunk_id, true);
This lookup is expensive, is it enough to get the relid here? There's a less heavy function ts_chunk_get_relid for that, or maybe ts_chunk_get_formdata.
Same comment for another usage below.
Fixed as you suggested. Help me understand why ts_chunk_get_by_id is a costly operation?
ts_chunk_get_by_id does the following:
- Build a scankey on the chunk id.
- Take AccessShareLock on the CHUNK catalog table.
- Scan the index and fetch the required row.
What you suggested is:
- Call ts_chunk_get_relid to get the chunk relid.
- Call ts_chunk_get_by_relid to get the chunk from the relid.
ts_chunk_get_by_relid does the following:
- Build a scankey on the schema and chunk name.
- Take AccessShareLock on the CHUNK catalog table.
- Scan the index and fetch the required row.
I mean, don't build the entire Chunk struct, you only need its relid to open the relation, right? Building the Chunk struct is costly because it does a lot of catalog lookups to find the matching dimensions, constraints and so on, and you probably don't need them here. At planning time, we always cache the Chunk struct to save on these lookups, but here at execution time it's not accessible.
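A hedged sketch of the suggested lighter lookup: ts_chunk_get_relid is named in the review above, but its exact signature and the lock level shown are assumptions.
/* Fetch only the relid of the compressed chunk instead of building the
 * full Chunk struct with all its dimension and constraint lookups. */
Oid compressed_relid =
	ts_chunk_get_relid(uncompressed_chunk->fd.compressed_chunk_id,
					   /* missing_ok = */ false);
Relation compressed_chunk_rel = table_open(compressed_relid, RowExclusiveLock);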
Force-pushed from e127cc8 to c42f061
tsl/src/compression/compression.c
Outdated
if (attoff < indnkeyatts - 1)
{
	/* Initialize segmentby scankeys. */
	if (key_flags & SEGMENTBY_KEYS)
	{
		/* get index attribute name */
		Form_pg_attribute attr = TupleDescAttr(idxrel->rd_att, attoff);
Can we end up with an index scan with no keys here? Probably better to keep the seq scan for this case, because a full index scan w/o keys is just needlessly slower.
I don't understand what you mean here?
By default there is always an index with segmentby columns on the compressed chunk.
The index Oid is fetched from get_compressed_chunk_index(), which always checks for the presence of segmentby columns in the index; otherwise this function returns InvalidOid.
I mean, can key_flags & SEGMENTBY_KEYS be false? That would mean we're using an index scan, but without any index conditions.
Fixed.
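The fix being described amounts to a guard like the following hedged sketch; the function and variable names are illustrative, not the PR's code.
/* Keep the sequential scan unless a usable segmentby index exists and
 * at least one scankey could be built: a full index scan without keys
 * is just a slower seq scan. */
static bool
should_use_index_scan(Oid index_oid, int num_segmentby_keys)
{
	return OidIsValid(index_oid) && num_segmentby_keys > 0;
}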
The INSERT query with ON CONFLICT on a compressed chunk does a heapscan to verify unique constraint violations. This patch improves performance by doing an indexscan on the compressed chunk to fetch matching records based on segmentby columns. These matching records are inserted into the uncompressed chunk, then the unique constraint violation is verified. Since the index on the compressed chunk has only segmentby columns, we cannot do a point lookup that also considers orderby columns. This patch thus results in decompressing matching records based only on segmentby columns. Fixes #6063 Fixes #5801
Force-pushed from c42f061 to f1dae93
This seems to not handle attribute numbers correctly and uses them directly from the hypertable instead of the chunks. In addition, this seems to completely change the operator handling and always uses opfamily operators without checking for compatibility.
Closing this one, @antekresic is working on some other PRs which will replace this one.
@horzsolt @antekresic would you mind posting a link to the new PRs when they are created?
@horzsolt @antekresic I would also like to know about these PRs
@mgagliardo91 @mblsf some fixes already went into 2.12.0 with #6081 but there are more followup PRs coming
@horzsolt @antekresic is there another PR we can monitor for the followup here?
Fix INSERT query performance on compressed chunk
The INSERT query with ON CONFLICT on a compressed chunk does a
heapscan to verify unique constraint violations. This patch
improves performance by doing an indexscan on the compressed
chunk to fetch matching records based on segmentby columns.
These matching records are inserted into the uncompressed chunk,
then the unique constraint violation is verified.
Since the index on the compressed chunk has only segmentby columns,
we cannot do a point lookup that also considers orderby columns.
This patch thus results in decompressing matching
records based only on segmentby columns.
Fixes #6063
Fixes #5801
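For readers who want the shape of the change in code, a rough end-to-end sketch of the flow the commit message describes; every function and variable name below is illustrative, not the PR's actual implementation, and the index/scankey setup is assumed to have happened in the caller.
/* Hedged sketch: find compressed batches that could conflict with the
 * incoming tuple and decompress them into the uncompressed chunk,
 * where the regular ON CONFLICT machinery then checks the unique
 * constraint. */
static void
decompress_conflicting_batches(Relation compressed_rel, Relation index_rel,
							   ScanKey scankeys, int nkeys)
{
	/* Scankeys were built from the incoming tuple's segmentby values
	 * only; orderby columns are not in the index, so the matching
	 * batches are a superset of the actual conflicts. */
	IndexScanDesc scan = index_beginscan(compressed_rel, index_rel,
										 GetTransactionSnapshot(), nkeys, 0);
	index_rescan(scan, scankeys, nkeys, NULL, 0);

	TupleTableSlot *slot = table_slot_create(compressed_rel, NULL);
	while (index_getnext_slot(scan, ForwardScanDirection, slot))
	{
		/* Decompress the batch into the uncompressed chunk and remove
		 * it from the compressed chunk (hypothetical helper). */
		decompress_batch_into_uncompressed_chunk(slot);
	}

	ExecDropSingleTupleTableSlot(slot);
	index_endscan(scan);
}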