Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ofelia jobs seems to be stuck #244

Open
frankalicious opened this issue Oct 26, 2023 · 0 comments
Open

ofelia jobs seems to be stuck #244

frankalicious opened this issue Oct 26, 2023 · 0 comments

Comments

@frankalicious
Copy link

I try to debug a problem where my worker job is stuck for several minutes before it is run.
This only happens sporadic. Has someone an idea how to fix that?

To narrow down the problem I added 2 test job to ofelia.

I created a docker-compose.yml based on the docker compose example in the README.
In opposite to the one in the README it starts at every minute when the seconds are 0

version: "3"
services:
  ofelia:
    image: mcuadros/ofelia:latest
    depends_on:
      - nginx
    command: daemon --docker
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    labels:
      ofelia.job-local.my-test-job.schedule: "0 * * * * *"
      ofelia.job-local.my-test-job.command: "date"

  nginx:
    image: nginx
    labels:
      ofelia.enabled: "true"
      ofelia.job-exec.datecron.schedule: "0 * * * * *"
      ofelia.job-exec.datecron.command: "uname -a"

When I run this as only ofelia job on a raspberry pi I can't see the problem. All jobs a finished in a couple of milliseconds.

$ docker logs ofelia-test-ofelia-1 | head
2023-10-24T09:13:19.353Z  scheduler.go:34 ▶ NOTICE New job registered "datecron" - "uname -a" - "0 * * * * *"
2023-10-24T09:13:19.353Z  scheduler.go:34 ▶ NOTICE New job registered "my-test-job" - "date" - "0 * * * * *"
2023-10-24T09:13:19.354Z  scheduler.go:54 ▶ DEBUG Starting scheduler with 2 jobs
2023-10-24T09:14:00.013Z  common.go:125 ▶ NOTICE [Job "datecron" (b35f0e16c9ba)] Started - uname -a
2023-10-24T09:14:00.014Z  common.go:125 ▶ NOTICE [Job "my-test-job" (ee8915deafcc)] Started - date
2023-10-24T09:14:00.026Z  common.go:125 ▶ NOTICE [Job "my-test-job" (ee8915deafcc)] StdOut: Tue Oct 24 09:14:00 UTC 2023
2023-10-24T09:14:00.026Z  common.go:125 ▶ NOTICE [Job "my-test-job" (ee8915deafcc)] Finished in "11.981455ms", failed: false, skipped: false, error: none
2023-10-24T09:14:00.177Z  common.go:125 ▶ NOTICE [Job "datecron" (b35f0e16c9ba)] StdOut: Linux 2a8971447761 6.1.57-v8+ #1688 SMP PREEMPT Thu Oct 12 15:14:26 BST 2023 aarch64 GNU/Linux
2023-10-24T09:14:00.177Z  common.go:125 ▶ NOTICE [Job "datecron" (b35f0e16c9ba)] Finished in "164.080199ms", failed: false, skipped: false, error: none
2023-10-24T09:15:00.004Z  common.go:125 ▶ NOTICE [Job "datecron" (13a5cf1e8a53)] Started - uname -a

We have 5430 finished jobs

$ docker logs ofelia-test-ofelia-1 | grep Finished | awk '{ print $11 }' | wc -l
5430

All of them are in the range of ms

$ docker logs ofelia-test-ofelia-1 | grep Finished | awk '{ print $11 }' | head
"11.981455ms",
"164.080199ms",
"5.26668ms",
"113.323193ms",
"5.984081ms",
"172.045716ms",
"5.973062ms",
"135.727973ms",
"11.564462ms",
"119.189199ms",

$ docker logs ofelia-test-ofelia-1 | grep Finished | awk '{ print $11 }' | grep -v ms
$

Now to the target system. I have 3 jobs running on a v-server. The 2 test jobs from before and 1 real work job.
The work job runs a python script every full 15 minutes (e.g. 8:00, 8:15, 8:30, 8:45, 9:00, ...).

$ docker logs ofelia_test-ofelia-1 | head
2023-10-23T08:09:58.02Z  scheduler.go:34 ▶ NOTICE New job registered "smart_cistern_backend" - "/bin/sh -c \"python -m cProfile -o logs/cProfile/$(date --iso-8601=seconds|sed 's/:/./g')_profiling_all.prof src/main.py\"" - "0 */15 * * * *"
2023-10-23T08:09:58.02Z  scheduler.go:34 ▶ NOTICE New job registered "datecron" - "uname -a" - "0 * * * * *"
2023-10-23T08:09:58.02Z  scheduler.go:34 ▶ NOTICE New job registered "my-test-job" - "date" - "0 * * * * *"
2023-10-23T08:09:58.02Z  scheduler.go:54 ▶ DEBUG Starting scheduler with 3 jobs
2023-10-23T08:10:00.003Z  common.go:125 ▶ NOTICE [Job "datecron" (9b16d2778165)] Started - uname -a
2023-10-23T08:10:00.006Z  common.go:125 ▶ NOTICE [Job "my-test-job" (c0e9493ba184)] Started - date
2023-10-23T08:10:00.024Z  common.go:125 ▶ NOTICE [Job "my-test-job" (c0e9493ba184)] StdOut: Mon Oct 23 08:10:00 UTC 2023
2023-10-23T08:10:00.024Z  common.go:125 ▶ NOTICE [Job "my-test-job" (c0e9493ba184)] Finished in "17.84595ms", failed: false, skipped: false, error: none
2023-10-23T08:10:00.089Z  common.go:125 ▶ NOTICE [Job "datecron" (9b16d2778165)] StdOut: Linux 5e658039a6cf 4.15.0 #1 SMP Thu Dec 15 20:31:06 MSK 2022 x86_64 GNU/Linux
2023-10-23T08:10:00.089Z  common.go:125 ▶ NOTICE [Job "datecron" (9b16d2778165)] Finished in "86.147247ms", failed: false, skipped: false, error: none

my-test-job still looks good but datecron sometimes needs a couple of minutes to succeed?
Same problem with my worker job but I think it's easier to just show the 2 test jobs.

2023-10-24T23:15:00.003Z  common.go:125 ▶ NOTICE [Job "datecron" (702cc7eb973a)] Started - uname -a
2023-10-24T23:15:00.003Z  common.go:125 ▶ NOTICE [Job "my-test-job" (b16c0f38aa18)] Started - date
2023-10-24T23:15:00.005Z  common.go:125 ▶ NOTICE [Job "my-test-job" (b16c0f38aa18)] StdOut: Tue Oct 24 23:15:00 UTC 2023
2023-10-24T23:15:00.005Z  common.go:125 ▶ NOTICE [Job "my-test-job" (b16c0f38aa18)] Finished in "1.322497ms", failed: false, skipped: false, error: none
2023-10-24T23:15:00.005Z  common.go:125 ▶ NOTICE [Job "smart_cistern_backend" (133cc0a8505c)] Started - /bin/sh -c "python -m cProfile -o logs/cProfile/$(date --iso-8601=seconds|sed 's/:/./g')_profiling_all.prof
 src/main.py"
2023-10-24T23:16:00.021Z  common.go:125 ▶ NOTICE [Job "datecron" (e328b02f154f)] Started - uname -a
2023-10-24T23:16:00.03Z  common.go:125 ▶ NOTICE [Job "my-test-job" (8c8a772a69f6)] Started - date
2023-10-24T23:16:00.033Z  common.go:125 ▶ NOTICE [Job "my-test-job" (8c8a772a69f6)] StdOut: Tue Oct 24 23:16:00 UTC 2023
2023-10-24T23:16:00.033Z  common.go:125 ▶ NOTICE [Job "my-test-job" (8c8a772a69f6)] Finished in "2.433805ms", failed: false, skipped: false, error: none
2023-10-24T23:17:00.003Z  common.go:125 ▶ NOTICE [Job "datecron" (cbb000d2ffbc)] Started - uname -a
2023-10-24T23:17:00.004Z  common.go:125 ▶ NOTICE [Job "my-test-job" (d5d674e42c39)] Started - date
2023-10-24T23:17:00.005Z  common.go:125 ▶ NOTICE [Job "my-test-job" (d5d674e42c39)] StdOut: Tue Oct 24 23:17:00 UTC 2023
2023-10-24T23:17:00.005Z  common.go:125 ▶ NOTICE [Job "my-test-job" (d5d674e42c39)] Finished in "1.005371ms", failed: false, skipped: false, error: none
2023-10-24T23:17:17.915Z  common.go:125 ▶ NOTICE [Job "datecron" (4a241d2b3ab6)] StdOut: Linux 5e658039a6cf 4.15.0 #1 SMP Thu Dec 15 20:31:06 MSK 2022 x86_64 GNU/Linux
2023-10-24T23:17:17.915Z  common.go:125 ▶ NOTICE [Job "datecron" (4a241d2b3ab6)] Finished in "9m17.911256973s", failed: false, skipped: false, error: none
2023-10-24T23:17:18.373Z  common.go:125 ▶ NOTICE [Job "datecron" (888c7db1f6c6)] StdOut: Linux 5e658039a6cf 4.15.0 #1 SMP Thu Dec 15 20:31:06 MSK 2022 x86_64 GNU/Linux
2023-10-24T23:17:18.373Z  common.go:125 ▶ NOTICE [Job "datecron" (888c7db1f6c6)] Finished in "8m18.36881784s", failed: false, skipped: false, error: none
2023-10-24T23:17:18.373Z  common.go:125 ▶ NOTICE [Job "datecron" (88fa518982de)] StdOut: Linux 5e658039a6cf 4.15.0 #1 SMP Thu Dec 15 20:31:06 MSK 2022 x86_64 GNU/Linux
2023-10-24T23:17:18.373Z  common.go:125 ▶ NOTICE [Job "datecron" (88fa518982de)] Finished in "4m18.361351038s", failed: false, skipped: false, error: none
2023-10-24T23:17:18.373Z  common.go:125 ▶ NOTICE [Job "datecron" (6460bd79aaec)] StdOut: Linux 5e658039a6cf 4.15.0 #1 SMP Thu Dec 15 20:31:06 MSK 2022 x86_64 GNU/Linux
2023-10-24T23:17:18.373Z  common.go:125 ▶ NOTICE [Job "datecron" (6460bd79aaec)] Finished in "7m18.368275647s", failed: false, skipped: false, error: none
2023-10-24T23:17:18.374Z  common.go:125 ▶ NOTICE [Job "datecron" (ac5fb5b8ef93)] StdOut: Linux 5e658039a6cf 4.15.0 #1 SMP Thu Dec 15 20:31:06 MSK 2022 x86_64 GNU/Linux
2023-10-24T23:17:18.374Z  common.go:125 ▶ NOTICE [Job "datecron" (ac5fb5b8ef93)] Finished in "5m18.363476051s", failed: false, skipped: false, error: none
2023-10-24T23:17:18.374Z  common.go:125 ▶ NOTICE [Job "datecron" (e26fa12bf686)] StdOut: Linux 5e658039a6cf 4.15.0 #1 SMP Thu Dec 15 20:31:06 MSK 2022 x86_64 GNU/Linux
2023-10-24T23:17:18.374Z  common.go:125 ▶ NOTICE [Job "datecron" (e26fa12bf686)] Finished in "6m18.362506677s", failed: false, skipped: false, error: none
2023-10-24T23:17:18.374Z  common.go:125 ▶ NOTICE [Job "datecron" (e328b02f154f)] StdOut: Linux 5e658039a6cf 4.15.0 #1 SMP Thu Dec 15 20:31:06 MSK 2022 x86_64 GNU/Linux
2023-10-24T23:17:18.374Z  common.go:125 ▶ NOTICE [Job "datecron" (e328b02f154f)] Finished in "1m18.353145657s", failed: false, skipped: false, error: none
2023-10-24T23:17:18.374Z  common.go:125 ▶ NOTICE [Job "datecron" (702cc7eb973a)] StdOut: Linux 5e658039a6cf 4.15.0 #1 SMP Thu Dec 15 20:31:06 MSK 2022 x86_64 GNU/Linux
2023-10-24T23:17:18.374Z  common.go:125 ▶ NOTICE [Job "datecron" (702cc7eb973a)] Finished in "2m18.370844487s", failed: false, skipped: false, error: none
2023-10-24T23:17:18.374Z  common.go:125 ▶ NOTICE [Job "datecron" (d120d13b0299)] StdOut: Linux 5e658039a6cf 4.15.0 #1 SMP Thu Dec 15 20:31:06 MSK 2022 x86_64 GNU/Linux
2023-10-24T23:17:18.374Z  common.go:125 ▶ NOTICE [Job "datecron" (d120d13b0299)] Finished in "3m18.360959649s", failed: false, skipped: false, error: none
2023-10-24T23:17:18.375Z  common.go:125 ▶ NOTICE [Job "datecron" (cbb000d2ffbc)] StdOut: Linux 5e658039a6cf 4.15.0 #1 SMP Thu Dec 15 20:31:06 MSK 2022 x86_64 GNU/Linux
2023-10-24T23:17:18.375Z  common.go:125 ▶ NOTICE [Job "datecron" (cbb000d2ffbc)] Finished in "18.37143932s", failed: false, skipped: false, error: none
$ docker logs ofelia_test-ofelia-1 | grep datecron | grep Finished | awk '{ print $11 }' | grep -v ms | grep -v µs | grep m | sort | tail
"6m7.816589961s",
"6m9.582320826s",
"7m18.135762657s",
"7m18.368275647s",
"7m4.584998566s",
"7m49.063024497s",
"7m7.084763573s",
"8m18.36881784s",
"8m4.585050287s",
"9m17.911256973s",
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant