[Bug]: job crash detected, see server logs #7085
Comments
@pgloader can you please copy/paste the result of the following query? SELECT * FROM timescaledb_information.job_history ORDER BY start_time;
Hi, I think I could have the same issue with background jobs. In my case, it happens with continuous aggregates:
After the segfault, the whole Postgres instance restarts and enters recovery mode for a while. I'm using TimescaleDB 2.15.3. This is the result of the query you asked for.
@cosimomeli it looks like your background job for the cagg refresh is leading to a segfault, so the job_errors output is correct. It would be great if you opened another issue related to this segmentation fault for further investigation.
SELECT * FROM timescaledb_information.job_history ORDER BY start_time;
 875 | 1005 | t | _timescaledb_functions | policy_compression | 1481534 | 2024-10-20 09:15:00.004445-04 | 2024-10-20 09:15:00.016282-04 | {"hypertable_id": 1, "compress_after": "14 days"} |  |
These are the most recent.
Could you take a look at your logs for these failures, some time around the timestamps above? Or do the logs not contain anything this time either? Thanks.
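As a minimal sketch (assuming the job_history view exposes the succeeded and err_message columns seen elsewhere in this thread), the failed runs and the time windows to look for in the server log can be listed with:

-- List only the failed job runs with their start/finish times.
SELECT job_id, proc_name, start_time, finish_time, err_message
FROM timescaledb_information.job_history
WHERE succeeded IS FALSE
ORDER BY start_time;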
2024-10-20 01:30:00.010 EDT @ LOG: continuous aggregate refresh (individual invalidation) on "cag_30m_metric_data_300" in window [ 2024-10-20 00:30:00-04, 2024-10-20 01:00:00-04 ]
Is it possible for you to cut your logs from that period? Also, I've recently made some refactoring on the code that captures and records job executions and exceptions, and it would be nice if you could try it out by updating the extension to 2.17.1.
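For reference, a sketch of the standard extension upgrade path (TimescaleDB requires this to be the first command in a fresh session; the target version is the one mentioned above):

-- Upgrade the extension in the current database, then verify the installed version.
ALTER EXTENSION timescaledb UPDATE TO '2.17.1';
SELECT extname, extversion FROM pg_extension WHERE extname = 'timescaledb';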
Please see the attached
A quick grep into your logs showed the following:
Looks like another process cancelled the execution of the job, so the error history is correct.
Most likely it conflicted with the pg_dump backup.
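One way to confirm that a dump was running at the time of a failed execution (assuming pg_dump reports its default application_name) is to look at pg_stat_activity during the backup window:

-- Sessions opened by pg_dump; a lock conflict with the dump can lead to a policy job being cancelled.
SELECT pid, application_name, state, query_start
FROM pg_stat_activity
WHERE application_name = 'pg_dump';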
If you have successful executions after the failure then you're safe, since the next execution will process all invalidation logs created even if they were before the window range executed by the policy. The downside is that the refresh will take more time because it has more buckets to aggregate.
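A sketch of how to check that, assuming the job_history columns shown earlier in this thread: take the most recent run of each continuous aggregate refresh policy and confirm it succeeded after the failure.

-- Most recent run per refresh policy job; succeeded = t means the pending invalidations were caught up.
SELECT DISTINCT ON (job_id) job_id, succeeded, start_time, finish_time
FROM timescaledb_information.job_history
WHERE proc_name = 'policy_refresh_continuous_aggregate'
ORDER BY job_id, start_time DESC;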
Hi guys, we recently had a problem very similar to the one reported (unfortunately we are not yet on the updated version of TimescaleDB). What we saw was the following: the job failed after a problem in Postgres (too many clients), then the database went into recovery mode, and when it came back the jobs didn't run again. To work around it, simply removing and recreating the job made the operation possible again. Have you ever seen something similar?
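For context, a minimal sketch of that remove-and-recreate workaround for a continuous aggregate refresh policy; the view name my_cagg and the offsets below are placeholders, not values from this report:

-- Drop the existing refresh policy, then add it back with the desired schedule.
SELECT remove_continuous_aggregate_policy('my_cagg');
SELECT add_continuous_aggregate_policy('my_cagg',
  start_offset      => INTERVAL '7 days',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '30 minutes');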
We have had reports of a similar situation, where disabling and then enabling a job makes it not run again. It might be useful to check that there is a scheduler running for that database, as well as the job information, in particular the next start time and whether the job is enabled.
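A sketch of those checks; the scheduler's application_name and the exact job_stats columns are assumptions based on recent TimescaleDB versions and may differ slightly:

-- Is the background worker scheduler running for this database?
SELECT datname, pid, application_name
FROM pg_stat_activity
WHERE application_name LIKE 'TimescaleDB Background Worker%';

-- Next start time and scheduled/enabled state for the affected job (1003 is a job id from the report above).
SELECT j.job_id, j.proc_name, j.scheduled, s.next_start, s.last_run_status
FROM timescaledb_information.jobs j
JOIN timescaledb_information.job_stats s USING (job_id)
WHERE j.job_id = 1003;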
@cosimomeli You had a segmentation fault for the job. You don't happen to have a stack trace that you can add here as well? It might help us pinpoint the issue.
Hi, I had no stack trace to share, but we saw the issue was related to LLVM, and we solved it in an unexpected way: moving the instance from an ARM node to an x86 one.
It's difficult to move forward with this one without knowing where the crash is.
What type of bug is this?
Crash
What subsystems and features are affected?
Background worker
What happened?
"job crash detected, see server logs", but there was no information in the PostgreSQL statement log
TimescaleDB: 2.15.2
PostgreSQL: 16.3
log_min_error_statement: log
Besides, the message in the job history would disappear.
This was 5 minutes ago:
=# select * from job_errors;
job_id | proc_schema | proc_name | pid | start_time | finish_time | sqlerrcode | err_message
--------+------------------------+-------------------------------------+---------+-------------------------------+-------------------------------+------------+-------------------------------------
1003 | _timescaledb_functions | policy_refresh_continuous_aggregate | 1116242 | 2024-06-28 09:06:39.752447-04 | 2024-06-28 09:06:39.752542-04 | | job crash detected, see server logs
1002 | _timescaledb_functions | policy_refresh_continuous_aggregate | 1116242 | 2024-06-28 10:46:50.699781-04 | 2024-06-28 10:46:50.699857-04 | | job crash detected, see server logs
1025 | _timescaledb_functions | policy_refresh_continuous_aggregate | 2128427 | 2024-07-01 06:30:00.006238-04 | 2024-07-01 06:30:00.006365-04 | | job crash detected, see server logs
1023 | _timescaledb_functions | policy_refresh_continuous_aggregate | 2128427 | 2024-07-01 07:00:00.00073-04 | 2024-07-01 07:00:00.000763-04 | | job crash detected, see server logs
(4 rows)
Now
select * from job_errors;
job_id | proc_schema | proc_name | pid | start_time | finish_time | sqlerrcode | err_message
--------+------------------------+-------------------------------------+---------+-------------------------------+-------------------------------+------------+-------------------------------------
1003 | _timescaledb_functions | policy_refresh_continuous_aggregate | 1116242 | 2024-06-28 09:06:39.752447-04 | 2024-06-28 09:06:39.752542-04 | | job crash detected, see server logs
1002 | _timescaledb_functions | policy_refresh_continuous_aggregate | 1116242 | 2024-06-28 10:46:50.699781-04 | 2024-06-28 10:46:50.699857-04 | | job crash detected, see server logs
1025 | _timescaledb_functions | policy_refresh_continuous_aggregate | 2128427 | 2024-07-01 06:30:00.006238-04 | 2024-07-01 06:30:00.006365-04 | | job crash detected, see server logs
1023 | _timescaledb_functions | policy_refresh_continuous_aggregate | 2128427 | 2024-07-01 09:00:00.008187-04 | 2024-07-01 09:00:00.008324-04 | | job crash detected, see server logs
(4 rows)
The entry with start time 2024-07-01 07:00:00.00073-04 was gone
TimescaleDB version affected
2.15.2
PostgreSQL version used
16.3
What operating system did you use?
RHEL8.6
What installation method did you use?
Source
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
How can we reproduce the bug?