Fix a deadlock when decompressing chunks and performing SELECTs #4676
Conversation
Force-pushed from a7e6451 to e0db89d
Codecov Report

```
@@            Coverage Diff             @@
##             main    #4676      +/-   ##
==========================================
- Coverage   90.92%   90.89%   -0.03%
==========================================
  Files         224      224
  Lines       42406    42407       +1
==========================================
- Hits        38556    38545      -11
- Misses       3850     3862      +12
```
```c
/* Prevent readers from using the compressed chunk that is going to be deleted */
LockRelationOid(uncompressed_chunk->table_id, AccessExclusiveLock);
```
Huh, so it was actually a typo -- I never noticed the discrepancy with the comment.
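For reference, the corrected call (visible in a later revision of this PR, quoted further down) passes the compressed chunk's OID, so the code matches the comment:

```c
/* Prevent readers from using the compressed chunk that is going to be deleted */
LockRelationOid(compressed_chunk->table_id, AccessExclusiveLock);
```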
tsl/src/compression/api.c (outdated)
```c
 * Prevents readers from using the compressed chunk that is going to be
 * deleted. Calling performMultipleDeletions in chunk_index_tuple_delete
 * also requests an AccessExclusiveLock. However, this call makes the
 * lock on the chunk explicit.
```
This change not only makes the lock explicit, but also moves it earlier, before ts_compression_chunk_size_delete. Do we have a test case that breaks if we remove the explicit lock altogether?
At first glance, nothing should break, because without the lock the concurrent SELECTs just stop seeing the compressed chunk during planning, earlier than it is actually dropped.
I am not able to create a failing test for this case. However, I am in favor of making the lock explicit, because PostgreSQL also requests an AccessExclusiveLock before performMultipleDeletions is invoked. So, I want to be compliant with PostgreSQL's logic.
But this is not only about explicitness; this is changing the relative order of events -- e.g. whether we modify the chunks table holding the exclusive lock on the compressed chunk or not. If you want just to make it analogous to Postgres, the place to take the deletion lock would be just before the ts_chunk_drop call, or maybe inside the ts_chunk_drop_internal call, or just relying on performDelete doing the same thing.
If we explicitly want to lock it before modifying the catalog, I think we need some justification for this. I mean, if we have another problem in this place, someone will look at this lock again and think: why is it here and not there?
Given that the catalog modification and dropping the chunk happen inside a transaction, they are both going to become visible atomically anyway, no matter where we put this lock.
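To make the candidate placements concrete, here is a hedged sketch of the sequence under discussion. The ordering is reconstructed from this thread and the stack trace below; the helpers ending in _sketch are hypothetical stand-ins for the real calls (ts_compression_chunk_size_delete, the chunk catalog cleanup, and ts_chunk_drop), not the actual tsl/src/compression/api.c code.

```c
#include "postgres.h"
#include "storage/lmgr.h"     /* LockRelationOid */
#include "storage/lockdefs.h" /* AccessExclusiveLock */

/* Hypothetical stand-ins for the calls named in this thread */
static void delete_chunk_size_record_sketch(void) {}  /* ts_compression_chunk_size_delete */
static void delete_chunk_catalog_record_sketch(void) {}
static void drop_compressed_relation_sketch(void) {}  /* ts_chunk_drop -> performMultipleDeletions */

static void
decompress_chunk_ordering_sketch(Oid compressed_relid)
{
	/* (A) PR as written: explicit lock before any catalog modification */
	LockRelationOid(compressed_relid, AccessExclusiveLock);

	delete_chunk_size_record_sketch();
	delete_chunk_catalog_record_sketch();

	/* (B) Alternative raised above: take the lock only here, just before the
	 * drop, which keeps the relative order of events unchanged; or (C) take
	 * no explicit lock and rely on performMultipleDeletions acquiring
	 * AccessExclusiveLock itself. Since everything runs in one transaction,
	 * other sessions see the catalog changes and the drop atomically at
	 * commit, regardless of the placement. */
	drop_compressed_relation_sketch();
}
```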
@AKZUM Moving the lock just before ts_chunk_drop is a good point to ensure the order of events is not changed.
Are you generally in favor of performing this lock, or do you argue that it is not needed because we can rely on performDelete?
> the order of events is not changed

Of which events exactly?
I think the locking performed by performDelete is enough for correctness. The reason I'm asking is that I don't understand the rationale for 1) taking this lock explicitly (well, maybe documentation and following the general Postgres line of thought) and 2) most importantly, the rationale for doing it at this particular line and not in ts_chunk_drop_internal like point (1) would suggest. If we are trying to prevent some unwanted sequence of events, what is this sequence exactly? I think it's important to have such a description in the future when we are going to modify this code, so that we are able to do it correctly.
> Of which events exactly?

I mean, we probably should have a comment describing what the order is, and why it is important. At first glance, there's no meaningful ordering of cleaning up the chunk record, the chunk size record, and dropping the compressed relation -- to other transactions they happen atomically at the end of the decompressing transaction.
Force-pushed from 41e38ed to e99854c
```c
 * (as done in PostgreSQL when tables are dropped,
 * see RemoveRelations).
 */
LockRelationOid(compressed_chunk->table_id, AccessExclusiveLock);
```
Could you elaborate why this is needed? Doesn't ts_chunk_drop (or a function called by it) acquire the AccessExclusiveLock before dropping the table?
My understanding is that no AccessExclusiveLock on the compressed chunk is requested in our existing code before it is deleted.
I want to be compliant with the way PostgreSQL performs locks before tables are deleted and performMultipleDeletions is called. Maybe I am overcautious at this point and we can trust the internals of performMultipleDeletions; then the lock would be superfluous.
Attached are the locks that are active before performMultipleDeletions is called in ts_chunk_index_delete_by_chunk_id and the AccessExclusiveLock is implicitly taken on the chunk (_timescaledb_internal.compress_hyper_2_7296_chunk is uncompressed in this example).

```sql
SELECT relation::regclass AS name, locktype, database, relation, pid, mode, granted, fastpath, waitstart
FROM pg_locks
WHERE relation::regclass::text LIKE '%chunk'
ORDER BY relation, locktype, mode, granted;
```
```
                       name                        | locktype | database | relation |   pid   |       mode       | granted | fastpath | waitstart
---------------------------------------------------+----------+----------+----------+---------+------------------+---------+----------+-----------
 _timescaledb_catalog.chunk                        | relation |    42698 |    42810 | 1113329 | RowExclusiveLock | t       | t        |
 _timescaledb_internal._hyper_1_1_chunk            | relation |    42698 |    43372 | 1113329 | ExclusiveLock    | t       | f        |
 _timescaledb_internal._hyper_1_1_chunk            | relation |    42698 |    43372 | 1113329 | ShareLock        | t       | f        |
 _timescaledb_internal.compress_hyper_2_7296_chunk | relation |    42698 |    83658 | 1113329 | ExclusiveLock    | t       | f        |
(4 rows)
```
```
timescaledb-2.9.0-dev.so!chunk_index_tuple_delete(TupleInfo * ti, void * data) (/home/jan/timescaledb/src/chunk_index.c:669)
timescaledb-2.9.0-dev.so!ts_scanner_scan(ScannerCtx * ctx) (/home/jan/timescaledb/src/scanner.c:451)
timescaledb-2.9.0-dev.so!chunk_index_scan(int indexid, ScanKeyData * scankey, int nkeys, tuple_found_func tuple_found, tuple_filter_func tuple_filter, void * data, LOCKMODE lockmode) (/home/jan/timescaledb/src/chunk_index.c:502)
timescaledb-2.9.0-dev.so!ts_chunk_index_delete_by_chunk_id(int32 chunk_id, _Bool drop_index) (/home/jan/timescaledb/src/chunk_index.c:779)
timescaledb-2.9.0-dev.so!chunk_tuple_delete(TupleInfo * ti, DropBehavior behavior, _Bool preserve_chunk_catalog_row) (/home/jan/timescaledb/src/chunk.c:2879)
timescaledb-2.9.0-dev.so!chunk_delete(ScanIterator * iterator, DropBehavior behavior, _Bool preserve_chunk_catalog_row) (/home/jan/timescaledb/src/chunk.c:2954)
timescaledb-2.9.0-dev.so!ts_chunk_delete_by_name_internal(const char * schema, const char * table, DropBehavior behavior, _Bool preserve_chunk_catalog_row) (/home/jan/timescaledb/src/chunk.c:2981)
timescaledb-2.9.0-dev.so!ts_chunk_delete_by_relid(Oid relid, DropBehavior behavior, _Bool preserve_chunk_catalog_row) (/home/jan/timescaledb/src/chunk.c:3002)
timescaledb-2.9.0-dev.so!ts_chunk_drop_internal(const Chunk * chunk, DropBehavior behavior, int32 log_level, _Bool preserve_catalog_row) (/home/jan/timescaledb/src/chunk.c:3669)
timescaledb-2.9.0-dev.so!ts_chunk_drop(const Chunk * chunk, DropBehavior behavior, int32 log_level) (/home/jan/timescaledb/src/chunk.c:3678)
timescaledb-tsl-2.9.0-dev.so!decompress_chunk_impl(Oid uncompressed_hypertable_relid, Oid uncompressed_chunk_relid, _Bool if_compressed) (/home/jan/timescaledb/tsl/src/compression/api.c:413)
timescaledb-tsl-2.9.0-dev.so!tsl_decompress_chunk(FunctionCallInfo fcinfo) (/home/jan/timescaledb/tsl/src/compression/api.c:645)
timescaledb-2.9.0-dev.so!ts_decompress_chunk(FunctionCallInfo fcinfo) (/home/jan/timescaledb/src/cross_module_fn.c:86)
ExecInterpExpr(ExprState * state, ExprContext * econtext, _Bool * isnull) (/home/jan/postgresql-sandbox/src/REL_14_2/src/backend/executor/execExprInterp.c:749)
ExecInterpExprStillValid(ExprState * state, ExprContext * econtext, _Bool * isNull) (/home/jan/postgresql-sandbox/src/REL_14_2/src/backend/executor/execExprInterp.c:1824)
ExecEvalExprSwitchContext(ExprState * state, ExprContext * econtext, _Bool * isNull) (/home/jan/postgresql-sandbox/src/REL_14_2/src/include/executor/executor.h:339)
ExecProject(ProjectionInfo * projInfo) (/home/jan/postgresql-sandbox/src/REL_14_2/src/include/executor/executor.h:373)
ExecScan(ScanState * node, ExecScanAccessMtd accessMtd, ExecScanRecheckMtd recheckMtd) (/home/jan/postgresql-sandbox/src/REL_14_2/src/backend/executor/execScan.c:238)
ExecFunctionScan(PlanState * pstate) (/home/jan/postgresql-sandbox/src/REL_14_2/src/backend/executor/nodeFunctionscan.c:270)
ExecProcNodeFirst(PlanState * node) (/home/jan/postgresql-sandbox/src/REL_14_2/src/backend/executor/execProcnode.c:463)
```
[mkindahl posted a drawing illustrating the locking situation; the image did not survive extraction.]
@mkindahl Thank you for creating the drawing. It illustrates the problem well. Before I write a longer reply, I would like to make sure that I understand your concern correctly.
- Are you concerned that the AccessExclusiveLock should be taken later in the code so that readers can access the chunk as long as possible, or
- are you concerned about this situation in general (i.e., a reader that waits on an AccessShareLock for a chunk and the chunk no longer exists after the lock is granted)?
I'm concerned about the latter. In this situation, the reader should not even reach this point in the code.
If a decompression is ongoing, the compressed chunk will be decompressed and removed, and the reader should be re-routed to the uncompressed chunk where the data will reside once the decompression is done. Then it can start reading from the chunk once the lock is granted. With this locking pattern, readers will experience errors if they are unlucky and race with the decompression job.
@mkindahl In the most common code path (when using a hypertable with at least one index), the readers are already blocked by the AccessExclusiveLock on the index caused by the reindex_relation call after decompressing the chunk. However, it's a good point to also consider hypertables without any index and grab the lock after the compressed chunk is removed from the catalog, to route such reads properly. I changed the PR accordingly.
Are you generally in favor of requesting an explicit AccessExclusiveLock on the chunk before we delete it? I am not sure if this is really necessary. I introduced it because PostgreSQL explicitly requests an AccessExclusiveLock before a table is dropped and performMultipleDeletions is invoked, and I want to be consistent with the way PostgreSQL implements similar functionality. However, the preconditions for calling the performMultipleDeletions function don't seem to be explicitly defined. I am not sure what the correct/best solution is here. Maybe you have some advice for me.
> Are you generally in favor of requesting an explicit AccessExclusiveLock on the chunk before we delete it? I am not sure if this is really necessary. I introduced it because PostgreSQL explicitly requests an AccessExclusiveLock before a table is dropped and performMultipleDeletions is invoked, and I want to be consistent with the way PostgreSQL implements similar functionality. However, the preconditions for calling the performMultipleDeletions function don't seem to be explicitly defined. I am not sure what the correct/best solution is here. Maybe you have some advice for me.

If that is a precondition for calling performMultipleDeletions, then we should do it, and in general you need to lock the table that you're deleting, but this is not the same as the problem described above.
My comment above was more a result of the fact that you are trying to solve a deadlock and in the comment say that you're locking here to prevent readers from reading the chunk. If you change this and readers do not normally reach this path, you need to update the comment so that it is accurate.
@mkindahl Indeed, the comment was outdated and misleading. I have updated it in the current version of the PR.
The deadlock was introduced in a608d7d by requesting a lock for the uncompressed chunk instead of the compressed one (uncompressed_chunk->table_id instead of compressed_chunk->table_id). This is changed in this PR and solves the deadlock.
The PostgreSQL documentation does not mention whether such an AccessExclusiveLock is a prerequisite for calling performMultipleDeletions. But PostgreSQL explicitly requests such a lock before calling the function. I am therefore also in favor of explicitly requesting this lock.
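Since the thread leans on this precedent, here is a condensed, hedged paraphrase of the PostgreSQL pattern being cited (RemoveRelations in src/backend/commands/tablecmds.c): the AccessExclusiveLock is taken while the relation name is resolved, before performMultipleDeletions runs. Permission callbacks and error handling are elided; this is a sketch, not the actual PostgreSQL source.

```c
#include "postgres.h"
#include "catalog/dependency.h"
#include "catalog/namespace.h"
#include "catalog/pg_class_d.h"
#include "nodes/parsenodes.h"
#include "storage/lockdefs.h"

/* Condensed from RemoveRelations(); the real code passes a callback that
 * re-checks permissions while waiting for the lock (elided here as NULL). */
static void
drop_relation_postgres_style(RangeVar *rel, DropBehavior behavior)
{
	ObjectAddresses *objects = new_object_addresses();
	ObjectAddress obj;

	/* Name lookup and AccessExclusiveLock happen together, up front */
	Oid relOid = RangeVarGetRelidExtended(rel, AccessExclusiveLock,
										  0 /* flags */, NULL, NULL);

	obj.classId = RelationRelationId;
	obj.objectId = relOid;
	obj.objectSubId = 0;
	add_exact_object_address(&obj, objects);

	/* The lock is already held when the deletion machinery runs */
	performMultipleDeletions(objects, behavior, 0);
	free_object_addresses(objects);
}
```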
Force-pushed from a27c629 to 3490acb
Force-pushed from 3490acb to cd5f25d
I think this fix looks good, but our locking here is too complicated for our own good.
```sql
-- All generated data is part of one chunk. Only one chunk is used because 'compress_chunk' is
-- used in this isolation test. In contrast to 'policy_compression_execute' all decompression
-- operations are executed in one transaction. So, processing more than one chunk with 'compress_chunk'
-- could lead to deadlocks that are not occur real-world scenarios (due to locks hold on a completely
```
Suggested change:
```suggestion
-- could lead to deadlocks that do not occur real-world scenarios (due to locks hold on a completely
```
```sql
INSERT INTO sensor_data
SELECT
   time + (INTERVAL '1 minute' * random()) AS time,
   sensor_id,
   random() AS cpu,
   random()* 100 AS temperature
FROM
   generate_series('2022-01-01', '2022-01-15', INTERVAL '1 minute') AS g1(time),
   generate_series(1, 50, 1) AS g2(sensor_id)
ORDER BY time;
```
Nit: a little hard to read.

Suggested change:
```suggestion
INSERT INTO sensor_data
SELECT "time" + (INTERVAL '1 minute' * random()) AS "time",
       sensor_id,
       random() AS cpu,
       random()* 100 AS temperature
FROM generate_series('2022-01-01', '2022-01-15', INTERVAL '1 minute') AS g1("time"),
     generate_series(1, 50, 1) AS g2(sensor_id)
ORDER BY "time";
```
Thanks for the review. I addressed your comments in the current version of the PR.
This patch fixes a deadlock between chunk decompression and SELECT queries executed in parallel. The change in a608d7d requests an AccessExclusiveLock for the decompressed chunk instead of the compressed chunk, resulting in deadlocks. In addition, an isolation test has been added to test that SELECT queries on a chunk that is currently decompressed can be executed. Fixes timescale#4605
Force-pushed from cd5f25d to 53aed0d
This release is a patch release. We recommend that you upgrade at the next available opportunity.

**Bugfixes**
* #4454 Keep locks after reading job status
* #4658 Fix error when querying a compressed hypertable with compress_segmentby on an enum column
* #4671 Fix a possible error while flushing the COPY data
* #4675 Fix bad TupleTableSlot drop
* #4676 Fix a deadlock when decompressing chunks and performing SELECTs
* #4685 Fix chunk exclusion for space partitions in SELECT FOR UPDATE queries
* #4694 Change parameter names of cagg_migrate procedure
* #4698 Do not use row-by-row fetcher for parameterized plans
* #4711 Remove support for procedures as custom checks
* #4712 Fix assertion failure in constify_now
* #4713 Fix Continuous Aggregate migration policies
* #4720 Fix chunk exclusion for prepared statements and dst changes
* #4726 Fix gapfill function signature
* #4737 Fix join on time column of compressed chunk
* #4738 Fix error when waiting for remote COPY to finish
* #4739 Fix continuous aggregate migrate check constraint
* #4760 Fix segfault when INNER JOINing hypertables
* #4767 Fix permission issues on index creation for CAggs

**Thanks**
* @boxhock and @cocowalla for reporting a segfault when JOINing hypertables
* @carobme for reporting constraint error during continuous aggregate migration
* @choisnetm, @dustinsorensen, @jayadevanm and @joeyberkovitz for reporting a problem with JOINs on compressed hypertables
* @daniel-k for reporting a background worker crash
* @justinpryzby for reporting an error when compressing very wide tables
* @maxtwardowski for reporting problems with chunk exclusion and space partitions
* @yuezhihan for reporting GROUP BY error when having compress_segmentby on an enum column
This patch fixes a deadlock between chunk decompression and SELECT queries executed in parallel. The change in a608d7d requests an AccessExclusiveLock for the decompressed chunk instead of the compressed chunk, resulting in deadlocks.
In addition, an isolation test has been added to test that SELECT queries on a chunk that is currently decompressed can be executed.
Fixes #4605
Fixes #2565