Intermittent deadlock on Meter and MeterReading imports #1928
@nllong I'm having a hard time prioritizing this. The fact that there are a lot of unknowns, as well as the infrequency of this (it usually works on the "second" attempt after a failure), is making me think of an "alternative" solution. What do you think of us adding some front-end error handling in the case that this error happens that basically says "Something went wrong - try again. If you already tried again, let your system admin know."? It doesn't feel great, and I haven't seen this pattern in SEED before, so definitely feel free to shoot that down.
@adrian-lara -- I think having error handling on the front end would be very helpful to start with.
I just experienced the same deadlock error while refactoring the upload of BuildingSync to use the celery tasks on the backend. I believe the database was fresh (i.e., just created) before I ran this. See the error and the database logs below. Though the code is not yet in the repo, it was happening during a transaction where BuildingSync.process() was being called on XML files. Like the example given by Adrian, from what I can infer from the db logs, the exception is happening during the bulk creation of meter readings (see here).

Hypothesis

This issue seems isolated to MeterReadings in transactions with concurrent processes. The seed_meterreading table seems to be the only one using the TimescaleDB extension. I found an issue on the timescaledb repo where a user reported what appears to be the same issue we're having. A contributor replied with:

Thus it seems possible that this creation of the first chunk is what's causing the deadlock between worker processes, and as a result this should only be an issue when the database is brand new, or after calling …

Fix

A potential fix would be to ensure the chunks are created before doing any concurrent inserts. This would probably just be a migration that creates and then deletes a record from the meterreadings table. If chunks are created after the first insert, I'm not sure what the proper fix would be.

If seen again

Unfortunately I reset the database (dropped and recreated) before realizing table reference numbers would change. If you run into this issue again, I suggest the following steps:
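The migration idea above could be sketched roughly as follows. This is a hypothetical helper, not code from the repo: the column names and the `prime_meterreading_chunk` name are assumptions, and a real migration would also need a valid `meter_id` (or temporarily relaxed foreign keys). The intent is only to force TimescaleDB to create the hypertable's first chunk before any concurrent inserts race to create it:

```python
def prime_meterreading_chunk(cursor):
    """Insert and immediately delete a throwaway row so that TimescaleDB
    creates the seed_meterreading hypertable's first chunk up front,
    before any concurrent bulk inserts race to create it.

    Hypothetical sketch: the column names are assumptions about the
    seed_meterreading schema, and a real migration would need a valid
    meter_id (or relaxed constraints). Intended to be called from a
    Django RunPython migration, e.g. via schema_editor.connection.cursor().
    """
    cursor.execute(
        "INSERT INTO seed_meterreading (meter_id, start_time, end_time, reading) "
        "VALUES (%s, %s, %s, %s)",
        (-1, "1970-01-01", "1970-01-01", 0),
    )
    cursor.execute(
        "DELETE FROM seed_meterreading WHERE meter_id = %s",
        (-1,),
    )
```

Whether this helps depends on whether TimescaleDB creates the chunk eagerly on the first insert (and keeps it after the delete), which matches the contributor's description on the timescaledb issue.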
Error messages

Celery error message

Database logs
Thanks @macintoshpie for investigating this more. I think you are on to something about having the background tasks wait until all are queued up. Just curious, what version of SEED are you on?
I haven't seen this one in a while. I think the upgrades to Postgres and TimescaleDB since the latest comment probably ended up resolving this.
During PM Meter/MeterReading imports, an error occurs somewhat infrequently. Any time the error has occurred, re-importing the file has always led to a successful import. I haven't heard of or seen this happening for GreenButton Meter/MeterReading imports, but it could be possible, as the logic is similar though not identical.
Unfortunately, the underlying reason behind the error is not fully understood, as the error has been difficult to reproduce at will. It is understood that Postgres deadlocks are causing the error. Specifically, an Update Exclusive lock on the MeterReading model and a Row Exclusive lock on the Meter model seem to be behind these deadlocks.
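As an analogy for the two-lock situation described above (a toy illustration, not the actual Postgres mechanics), a deadlock needs two workers that each hold one resource while waiting on the other. A minimal Python sketch using threading locks as stand-ins for the Meter and MeterReading locks:

```python
import threading

# Toy stand-ins for the two locks named above: an Update Exclusive lock
# on the MeterReading table and a Row Exclusive lock on the Meter table.
meter_lock = threading.Lock()
meter_reading_lock = threading.Lock()

results = []

def worker(first, second):
    with first:
        # Window in which the other worker can grab its own first lock.
        got_second = second.acquire(timeout=0.2)  # time out instead of hanging
        results.append(got_second)
        if got_second:
            second.release()

# The two workers take the locks in opposite orders -- the classic
# deadlock shape. Postgres resolves this by aborting one transaction
# with "deadlock detected"; here the acquire() simply times out.
t1 = threading.Thread(target=worker, args=(meter_lock, meter_reading_lock))
t2 = threading.Thread(target=worker, args=(meter_reading_lock, meter_lock))
t1.start(); t2.start()
t1.join(); t2.join()
```

If the two Celery workers really did acquire these locks in opposite orders on overlapping rows, that would produce exactly the intermittent "deadlock detected" error seen here.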
The PM Meter import process can be found here: https://github.com/SEED-platform/seed/blob/develop/seed/data_importer/tasks.py#L806
There is one transaction, with two main queries:
It's necessary to consider that this method is sent to Celery and run as a background task. For any given import, each of these background tasks should be importing/working with separate Meter records (and, subsequently, separate MeterReading records).
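Given that a failed import has always succeeded on re-import, one alternative to (or complement of) the front-end error handling suggested above would be a bounded automatic retry around the task body when Postgres aborts it with a deadlock. A minimal sketch, where `DeadlockDetected` is a hypothetical stand-in for the `django.db.OperationalError` that a Postgres deadlock actually surfaces as:

```python
import time


class DeadlockDetected(Exception):
    """Hypothetical stand-in for the django.db.OperationalError that
    Postgres raises with 'deadlock detected'."""


def run_with_deadlock_retry(task, retries=2, backoff=0.1):
    """Run `task`, retrying a bounded number of times when it aborts
    with a deadlock, mirroring the observation that the import usually
    succeeds on the second attempt."""
    for attempt in range(retries + 1):
        try:
            return task()
        except DeadlockDetected:
            if attempt == retries:
                raise  # out of retries; surface the error to the caller
            time.sleep(backoff * (attempt + 1))  # brief pause before retrying
```

Since each background task works with separate Meter records, a retry should be safe to repeat; Celery's own task retry machinery could serve the same purpose.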
Here are the error messages from two instances of this occurring:
Steps to Reproduce
This seems to happen randomly. I haven't been able to narrow down why the error occurs.
During my investigation, I made attempts to reproduce this deadlock situation directly within Postgres but was unsuccessful. Specifically, I initiated a transaction, created a meter and some readings, initiated another transaction (in another Postgres session), added meters and/or readings in this new transaction, and committed both transactions in different orders. Throughout these trials, I also observed the locks being created during each step. The only error I could produce was one from creating readings in the second transaction for a meter in the first transaction, before that first transaction's meter was committed.
Instance Information
This is known to have occurred on dev1 and during local development.