-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: clarify reschedule, migrate, and replacement terminology #24929
base: main
Are you sure you want to change the base?
Conversation
Our vocabulary around scheduler behaviors outside of the `reschedule` and `migrate` blocks leaves room for confusion around whether the reschedule tracker should be propagated between allocations. There are effectively five different behaviors we need to cover: * restart: when the tasks of an allocation fail and we try to restart the tasks in place. * reschedule: when the `restart` block runs out of attempts (or the allocation fails before tasks even start), and we need to move the allocation to another node to try again. * migrate: when the user has asked to drain a node and we need to move the allocations. These are not failures, so we don't want to propagate the reschedule tracker. * replacement: when a node is lost, we don't count that against the `reschedule` tracker for the allocations on the node (it's not the allocation's "fault", after all). We don't want to run the `migrate` machinery here here either, as we can't contact the down node. To the scheduler, this is effectively the same as if we bumped the `group.count` * replacement for `disconnect.replace = true`: this is a replacement, but the replacement is intended to be temporary, so we propagate the reschedule tracker. Add a section to the `reschedule`, `migrate`, and `disconnect` blocks explaining when each item applies. Update the use of the word "reschedule" in several places where "replacement" is correct, and vice-versa. Fixes: #24918
39f153d
to
e3afb1f
Compare
@@ -132,7 +132,7 @@ Usage: nomad job restart [options] <job> | |||
groups are restarted. | |||
|
|||
When rescheduling, the current allocations are stopped triggering the Nomad | |||
scheduler to create replacement allocations that may be placed in different | |||
scheduler to create new allocations that may be placed in different |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this whole command text keeps using the term "reschedule" despite internally using the migrate infrastructure when -reschedule
is specified:
Lines 318 to 326 in c1dc9ed
transitionReq := &structs.AllocUpdateDesiredTransitionRequest{ | |
Evals: []*structs.Evaluation{eval}, | |
Allocs: map[string]*structs.DesiredTransition{ | |
args.AllocID: { | |
Migrate: pointer.Of(true), | |
NoShutdownDelay: pointer.Of(args.NoShutdownDelay), | |
}, | |
}, | |
} |
This is important for users to understand as it means it is safe to job restart -reschedule
and not risk causing downtime if your migrate{}
block is properly configured.
Sadly we had no rigor around our definitions of "reschedule" when this command was written: it uses it in the most straightforward definition of "is scheduled again" and not in reference to the reschedule{}
block or associated scheduling logic. For example the RescheduleTracker
is empty for jobs "rescheduled" with nomad job restart -reschedule
.
So now what?
Good question. To enforce purity we'd have to break compatibility as the command flag itself is "-reschedule". Not only is that unacceptable, but what would we call it? -migrate
would be "correct" but endlessly confusing to users.
So I think we're going to have to live with 2 definitions of "reschedule" in the Nomad code base:
- reschedule - to schedule again
reschedule{}
- the parameters and associated scheduling logic that occurs whenrestart{}
s are exhausted.
So ... are you actually suggesting any changes?
...no. I think the original "replacement" was fine. The new "new" is obviously also accurate. None of this command text adheres to our nomenclature though, but I'm not sure it's worth updating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not ideal, but maybe we could at least explain that the command flag name is inaccurate? And/or we could make -reschedule
an alias for -migrate
/-replace
(?) and mark -reschedule
for deprecation?
Our vocabulary around scheduler behaviors outside of the
reschedule
andmigrate
blocks leaves room for confusion around whether the reschedule tracker should be propagated between allocations. There are effectively five different behaviors we need to cover:restart: when the tasks of an allocation fail and we try to restart the tasks in place.
reschedule: when the
restart
block runs out of attempts (or the allocation fails before tasks even start), and we need to move the allocation to another node to try again.migrate: when the user has asked to drain a node and we need to move the allocations. These are not failures, so we don't want to propagate the reschedule tracker.
replacement: when a node is lost, we don't count that against the
reschedule
tracker for the allocations on the node (it's not the allocation's "fault", after all). We don't want to run themigrate
machinery here here either, as we can't contact the down node. To the scheduler, this is effectively the same as if we bumped thegroup.count
replacement for
disconnect.replace = true
: this is a replacement, but the replacement is intended to be temporary, so we propagate the reschedule tracker.Add a section to the
reschedule
,migrate
, anddisconnect
blocks explaining when each item applies. Update the use of the word "reschedule" in several places where "replacement" is correct, and vice-versa.Fixes: #24918
major preview links: