
Compression deadlock detected while reading hyper table #2565

Closed
amincheloh opened this issue Oct 18, 2020 · 10 comments · Fixed by #4676

Comments

@amincheloh

Relevant system information:

  • OS: Linux
  • PostgreSQL version (output of postgres --version): 11.9
  • TimescaleDB version (output of \dx in psql): 1.7.4
  • Installation method: Docker

Describe the bug
Compression deadlock detected while reading hyper table

To Reproduce

  1. Continuously run read-intensive queries against the hypertable
  2. Compress an old chunk of the hypertable
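The two steps above can be sketched as two concurrent psql sessions. This is a minimal reproduction sketch, not a verified script; the table and chunk names are the ones from this report, so adjust them to your own setup:

```sql
-- Session 1: simulate continuous read load on the hypertable
-- (run this in a loop from the application or a shell script)
SELECT "deviceId"
FROM n2n_device_monitoring_ping
WHERE "occurredAt" >= now() - interval '4 minute'
  AND rtt IS NOT NULL
GROUP BY "deviceId";

-- Session 2: compress an old chunk while session 1 is running
SELECT compress_chunk('_timescaledb_internal._hyper_1_2049_chunk');
```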

Expected behavior
Able to compress a chunk while the hypertable is being read

Actual behavior

2020-10-18 23:41:35.936 UTC [4225] ERROR:  deadlock detected
2020-10-18 23:41:35.936 UTC [4225] DETAIL:  Process 4225 waits for AccessShareLock on relation 116930631 of database 16384; blocked by process 4723.
       Process 4723 waits for AccessExclusiveLock on relation 18024 of database 16384; blocked by process 4225.
       Process 4225: SELECT "deviceId" FROM "n2n_device_monitoring_ping" WHERE ("deviceId" IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, $29, $30)) AND ("occurredAt" >= (now() - interval '4 minute')) AND (NOT ("rtt" IS NULL)) GROUP BY "deviceId" HAVING count("deviceId") >= $31
       Process 4723: SELECT compress_chunk('_timescaledb_internal._hyper_1_2049_chunk')

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.

@erimatnor
Contributor

Potential duplicate of #2575

@benwilson512

benwilson512 commented Aug 12, 2021

Experiencing this on 2.4.0 and Postgres 13. Our compression jobs deadlock when queries are being performed on the table. We were not running into this on 2.3.x.

@mkindahl
Contributor

Potential duplicate of #2575

I do not think this is a duplicate. #2575 is about a concurrency issue between drop_chunks and compression, but this is between a SELECT and a compression job.

@mkindahl mkindahl self-assigned this Oct 28, 2021
@mkindahl
Contributor

mkindahl commented Nov 3, 2021

Looking at the lock waits above, we get the following locking graph.
[locking graph image]
My guess is that relation 18024 is the index for the chunk and 116930631 is the actual chunk. The lock order for queries is to first take a lock on the index and then on the chunk, but compress_chunk first takes a lock on the chunk (when calling truncate_table) and then on the index (when calling reindex_relation).
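One way to check a hypothesis like this while the two sessions are still blocked (before the deadlock detector fires) is to resolve the OIDs from the log and inspect pg_locks. This is a hedged sketch: the OIDs below are the ones from this report and only resolve in the same database they came from:

```sql
-- Resolve the relation OIDs mentioned in the deadlock message
SELECT 18024::regclass, 116930631::regclass;

-- See which backends hold and which wait for locks on those relations
SELECT l.pid, l.relation::regclass, l.mode, l.granted, a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation IN (18024, 116930631);
```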

@amincheloh
Author

@mkindahl 18024 is the n2n_device_device table, which is linked to n2n_device_monitoring_ping via the deviceId column; and yes, 116930631 is the actual chunk.

@mkindahl
Contributor

mkindahl commented Nov 4, 2021

@amincheloh Can you do an EXPLAIN on the SELECT and add it here?

@mkindahl
Contributor

mkindahl commented Nov 4, 2021

@amincheloh Also, the output of \d+ n2n_device_monitoring_ping could be useful; I am looking for details on how it is linked to n2n_device_device.

@amincheloh
Author

n2n=# \d+ n2n_device_monitoring_ping
                                        Table "public.n2n_device_monitoring_ping"
     Column     |              Type              | Collation | Nullable | Default | Storage | Stats target | Description
----------------+--------------------------------+-----------+----------+---------+---------+--------------+-------------
 deviceId       | integer                        |           | not null |         | plain   |              |
 occurredAt     | timestamp(0) without time zone |           | not null |         | plain   |              |
 packetReceived | real                           |           |          |         | plain   |              |
 rtt            | real                           |           |          |         | plain   |              |
Indexes:
    "n2n_device_monitoring_ping_deviceid_occurredat_idx" btree ("deviceId", "occurredAt")
    "n2n_device_monitoring_ping_occurredAt_idx" btree ("occurredAt" DESC)
Foreign-key constraints:
    "n2n_device_monitor_ping_deviceId" FOREIGN KEY ("deviceId") REFERENCES n2n_device_device(id)
Triggers:
    ts_insert_blocker BEFORE INSERT ON n2n_device_monitoring_ping FOR EACH ROW EXECUTE PROCEDURE _timescaledb_internal.insert_blocker()
Child tables: _timescaledb_internal._hyper_1_1001_chunk,
              _timescaledb_internal._hyper_1_1003_chunk,
              _timescaledb_internal._hyper_1_100_chunk,
              _timescaledb_internal._hyper_1_102_chunk,
              _timescaledb_internal._hyper_1_104_chunk,
              _timescaledb_internal._hyper_1_1064_chunk,
              .
              .
              .
              and so on
n2n=# EXPLAIN SELECT "deviceId" FROM "n2n_device_monitoring_ping" WHERE ("deviceId" IN (1,2,3,4,5,6,7,8,9,10)) AND ("occurredAt" >= (now() - interval '4 minute')) AND (NOT ("rtt" IS NULL)) GROUP BY "deviceId" HAVING count("deviceId") >= 60;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=179840.76..179850.61 rows=263 width=4)
   Group Key: n2n_device_monitoring_ping."deviceId"
   Filter: (count(n2n_device_monitoring_ping."deviceId") >= 60)
   ->  Custom Scan (ChunkAppend) on n2n_device_monitoring_ping  (cost=0.17..28530.71 rows=30262011 width=4)
         Chunks excluded during startup: 787
         ->  Index Scan using _hyper_1_54362_chunk_n2n_device_monitoring_ping_deviceid_occurr on _hyper_1_54362_chunk  (cost=0.56..17.82 rows=1 width=4)
               Index Cond: (("deviceId" = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[])) AND ("occurredAt" >= (now() - '00:04:00'::interval)))
               Filter: (rtt IS NOT NULL)
(8 rows)

@mkindahl
Contributor

mkindahl commented Nov 4, 2021

Odd. I would have expected _hyper_1_54362_chunk_n2n_device_monitoring_ping_deviceid_occurr to show up in the locking graph, not n2n_device_device.

@mkindahl
Contributor

There is a deadlock in recompress_chunk in issue #3846, but it seems to be a different kind of deadlock. This issue requires some more investigation.
