Potential Deadlock Between drop_chunks() and the compress_chunks BGW #2575
Comments
It would be good to get a reproducible example for this deadlock. |
Potential duplicate of #2509 |
Honestly, not sure this is a dupe of #2509 because the deadlock here is on a |
Posting a link to a PR merged into 1.7.3 that might be relevant to this issue: #2150 |
I had a similar thing happening over and over again. Posted it on Slack as well, but then found this issue. I set up TimescaleDB via the Helm charts, running TimescaleDB 1.7.4 with PG12. It runs fine, but multiple times a day I see connections stacking up on the master, all running queries like:
SELECT table_name AS "Name",table_bytes AS "table",index_bytes AS "index",toast_bytes AS "toast",total_bytes AS "total" FROM _timescaledb_catalog.hypertable, hypertable_relation_size(schema_name::text||'.'||table_name::text) ORDER BY total_bytes DESC;
SELECT sum(index_bytes) FROM _timescaledb_catalog.hypertable, hypertable_relation_size(schema_name::text||'.'||table_name::text);
These keep growing until there are no connections left. From the logs:
2020-10-28 13:21:57 UTC [6921]: [5f997072.1b09-3] postgres@postgres,app=[unknown] [00000] LOG: process 6921 still waiting for AccessShareLock on relation 575409 of database 13408 after 1000.082 ms
2020-10-28 13:21:57 UTC [6921]: [5f997072.1b09-4] postgres@postgres,app=[unknown] [00000] DETAIL: Process holding the lock: 6563. Wait queue: 6761, 6764, 6765, 6766, 6767, 6768, 6769, 6771, 6772, 6774, 6778, 6779, 6777, 6780, 6782, 6785, 6784, 6790, 6788, 6789, 6795, 6794, 6797, 6796, 6798, 6847, 6851, 6846, 6852, 6850, 6857, 6856, 6860, 6862, 6861, 6899, 6897, 6902, 6903, 6901, 6906, 6909, 6912, 6913, 6910, 6918, 6917, 6922, 6921, 6920.
2020-10-28 13:21:57 UTC [6921]: [5f997072.1b09-5] postgres@postgres,app=[unknown] [00000] CONTEXT: PL/pgSQL function hypertable_relation_size(regclass) line 12 at RETURN QUERY
2020-10-28 13:21:57 UTC [6921]: [5f997072.1b09-6] postgres@postgres,app=[unknown] [00000] STATEMENT: SELECT sum(toast_bytes) FROM _timescaledb_catalog.hypertable, hypertable_relation_size(schema_name::text||'.'||table_name::text); |
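When connections stack up behind a single lock holder like this, it helps to see exactly which backend everyone is queued behind. This is a minimal diagnostic sketch (not from the thread), assuming PostgreSQL 9.6 or later so that pg_blocking_pids() is available:

-- Map each waiting backend to the backend(s) blocking it and show both queries.
SELECT waiting.pid   AS waiting_pid,
       waiting.query AS waiting_query,
       blocker.pid   AS blocking_pid,
       blocker.query AS blocking_query
FROM pg_stat_activity AS waiting
JOIN pg_stat_activity AS blocker
  ON blocker.pid = ANY (pg_blocking_pids(waiting.pid))
WHERE cardinality(pg_blocking_pids(waiting.pid)) > 0;

Running this while the hypertable_relation_size() calls are piling up should point at the session holding the conflicting lock (only an AccessExclusiveLock conflicts with the AccessShareLock these queries are waiting for).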
@BenjenJones Are you using compression or drop_chunks in your case? Do you query the hypertable catalog or information views on hypertables? |
I am using compression and also drop_chunks, and I am using a Grafana dashboard for monitoring (which queries the hypertable catalog information). Edit: crap, you might be right ... one of the queries I am seeing in the logs is the same one a Grafana widget uses. Closing the dashboard and will check whether it happens again. |
I wasn't able to reproduce this in tests on the 2.0 code base. Isolation tests were expanded for this case in #2688. In particular, I added tests where I tried to reproduce the deadlock, but I wasn't able to. It might mean that prior fixes to locking in
Note, however, that there are many situations where |
@JLockerman can we please have a way to reproduce this? |
I've got some information on how to repro it from @JLockerman and |
To run the Promscale test:
|
The Promscale test adds compression policies during test setup and calls |
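For context, a setup along these lines is what puts chunk compression and chunk dropping in play on the same hypertable. This is a minimal sketch, assuming the 1.7.x API (add_compress_chunks_policy; in 2.x the equivalent is add_compression_policy) and a hypothetical metrics table, not the actual Promscale test setup:

-- Hypothetical hypertable; all names here are illustrative.
CREATE TABLE metrics(time timestamptz NOT NULL, device_id int, value float);
SELECT create_hypertable('metrics', 'time', chunk_time_interval => INTERVAL '1 hour');

-- Enable compression and let the background worker compress chunks older than one day.
ALTER TABLE metrics SET (timescaledb.compress, timescaledb.compress_segmentby = 'device_id');
SELECT add_compress_chunks_policy('metrics', INTERVAL '1 day');

-- Meanwhile, retention (manual here) drops older chunks of the same hypertable.
SELECT drop_chunks(INTERVAL '7 days', 'metrics');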
According to the source code in current master:
Compress chunk obtains locks, excluding compression-related tables:
I am setting up an isolation test to see if introducing wait points between the locks obtained in drop_chunks and compress_chunk will lead to a deadlock. |
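A rough sketch of the interleaving such a test has to exercise; the two-session ordering and the metrics hypertable are assumptions for illustration, not the actual isolation spec:

-- Session A (mimicking the compress_chunks background worker):
BEGIN;
-- Locks the chunk being compressed plus compression-related catalog entries.
SELECT compress_chunk(c, true) FROM show_chunks('metrics', older_than => INTERVAL '1 day') c;

-- Session B (retention), started before session A commits:
BEGIN;
-- Locks the hypertable, its chunks, and catalog rows such as dimension_slice,
-- potentially in the opposite order to session A, which is the classic deadlock recipe.
SELECT drop_chunks(INTERVAL '7 days', 'metrics');

-- One session then commits while the other either completes or is cancelled by the
-- deadlock detector, depending on the actual lock ordering.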
Comparing with the 1.7.x branch, I found a few differences:
In 1.7.x the compress chunk obtains
Drop chunks
So it doesn't seem that the deadlock would be resolved in 2.x |
I updated the instructions to repro the deadlock in the above comment. I also pushed the Promscale version, which I use to repro on 1.7.5, into my branch. An isolation test was set up in this branch to run different permutations of locking between compress and drop, corresponding to the comment above; however, it didn't result in any deadlocks. So it seems that I haven't identified all the relevant locks obtained by these code paths. |
Relevant system information:
Using the Promscale tests with the timescaledev/promscale-extension:latest-pg12 docker image, we occasionally get
which appears to be a tuple-level deadlock on dimension_slice.