When a client is too busy with garbage collection (GC) to start new allocations, the scheduler does not detect or account for that and schedules new jobs there anyway, even if other clients are available and idle.
Reproduction steps
Run multiple clients.
Have one client that is too busy with GC to start new allocations.
Start a new job (for example, a periodic batch job that has already run on that client before).
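To make the GC pressure in step 2 easier to reproduce, the client's allocation GC limits can be lowered in its configuration. `gc_interval` and `gc_max_allocs` are existing Nomad client options; the values below are illustrative only.

```hcl
client {
  enabled       = true
  gc_interval   = "1m" # how often the client runs allocation GC
  gc_max_allocs = 50   # matches the "over the limit (50)" log line below
}
```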
Expected Result
The scheduler should avoid the client while it is busy with GC and refusing new tasks, picking a different client instead. Some kind of automatic deterring factor for that client while GC is in progress.
Actual Result
The scheduler schedules the job on the already-overwhelmed client anyway.
Perhaps the scheduler already accounts for this by looking at the nomad.client.allocations.pending metric? If so, this issue can probably be closed, because the behavior would then be caused by #24777 instead.
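As a rough way to check whether that metric reflects the backlog on the affected node, one could poll the client's HTTP metrics endpoint and extract the gauge. The sketch below assumes the go-metrics JSON shape (a top-level "Gauges" array of objects with "Name", "Value", and "Labels") that Nomad's /v1/metrics endpoint emits; the sample payload is fabricated for illustration.

```python
import json

def pending_allocs(metrics_json: str) -> list[tuple[dict, float]]:
    """Extract nomad.client.allocations.pending gauges from a metrics payload.

    Assumes the go-metrics JSON shape that Nomad's /v1/metrics
    endpoint returns: {"Gauges": [{"Name": ..., "Value": ..., "Labels": ...}]}.
    """
    data = json.loads(metrics_json)
    return [
        (gauge.get("Labels", {}), gauge["Value"])
        for gauge in data.get("Gauges", [])
        if gauge["Name"] == "nomad.client.allocations.pending"
    ]

# Fabricated sample payload for illustration only.
sample = json.dumps({
    "Gauges": [
        {"Name": "nomad.client.allocations.pending",
         "Value": 12.0,
         "Labels": {"node_id": "ac8fd9bd"}},
        {"Name": "nomad.client.allocations.running",
         "Value": 50.0,
         "Labels": {"node_id": "ac8fd9bd"}},
    ]
})

print(pending_allocs(sample))
```

In a live cluster one would fetch the payload from the client's API address (for example with an HTTP GET to /v1/metrics) rather than from a literal string.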
Job file (if appropriate)
Not applicable.
Nomad Server logs (if appropriate)
Nomad Client logs (if appropriate)
The client only logs this:
{"@level":"info","@message":"marking allocation for GC","@module":"client.gc","@timestamp":"2025-01-06T10:21:36.250494Z","alloc_id":"ac8fd9bd-39f9-133f-c1ae-eb45c1ecc275"}
{"@level":"info","@message":"garbage collecting allocation","@module":"client.gc","@timestamp":"2025-01-06T10:21:36.252995Z","alloc_id":"feb5dc4c-a549-7b82-a18e-733acd2a7013","reason":"number of allocations (68) is over the limit (50)"}
After GC completes (perhaps 20 minutes or so later), the client starts the allocation and logs entries like:
{"@level":"info","@message":"Task event","@module":"client.alloc_runner.task_runner","@timestamp":"2025-01-06T10:23:26.213687Z","alloc_id":"db84c9fb-e9e3-df5e-bc34-42e11f57a32e","failed":false,"msg":"Task received by client","task":"sync","type":"Received"}
Client GC is asynchronous and shouldn't interfere with placing workloads. At the time of garbage collection, node resources should be free and thus the scheduler places the workload there. Is the node busy with something other than GC?
Nomad version
Nomad v1.9.4
BuildDate 2024-12-18T15:16:22Z
Revision 5e49fcdb7be26941b6c7ad3ed6661bd37e70a9d8+CHANGES
Operating system and Environment details
Ubuntu 22.04.5 LTS on amd64