Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: clarify reschedule, migrate, and replacement terminology #24929

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

tgross
Copy link
Member

@tgross tgross commented Jan 23, 2025

Our vocabulary around scheduler behaviors outside of the reschedule and migrate blocks leaves room for confusion around whether the reschedule tracker should be propagated between allocations. There are effectively five different behaviors we need to cover:

  • restart: when the tasks of an allocation fail and we try to restart the tasks in place.

  • reschedule: when the restart block runs out of attempts (or the allocation fails before tasks even start), and we need to move the allocation to another node to try again.

  • migrate: when the user has asked to drain a node and we need to move the allocations. These are not failures, so we don't want to propagate the reschedule tracker.

  • replacement: when a node is lost, we don't count that against the reschedule tracker for the allocations on the node (it's not the allocation's "fault", after all). We don't want to run the migrate machinery here here either, as we can't contact the down node. To the scheduler, this is effectively the same as if we bumped the group.count

  • replacement for disconnect.replace = true: this is a replacement, but the replacement is intended to be temporary, so we propagate the reschedule tracker.

Add a section to the reschedule, migrate, and disconnect blocks explaining when each item applies. Update the use of the word "reschedule" in several places where "replacement" is correct, and vice-versa.

Fixes: #24918


major preview links:

@tgross tgross added theme/docs Documentation issues and enhancements theme/restart/reschedule backport/website This will backport PR changes to `stable-website` && the latest release-branch backport/1.7.x backport to 1.7.x release line backport/ent/1.7.x+ent Changes are backported to 1.7.x+ent backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent backport/1.8.x backport to 1.8.x release line backport/1.9.x backport to 1.9.x release line labels Jan 23, 2025
Our vocabulary around scheduler behaviors outside of the `reschedule` and
`migrate` blocks leaves room for confusion around whether the reschedule tracker
should be propagated between allocations. There are effectively five different
behaviors we need to cover:

* restart: when the tasks of an allocation fail and we try to restart the tasks
  in place.

* reschedule: when the `restart` block runs out of attempts (or the allocation
  fails before tasks even start), and we need to move
  the allocation to another node to try again.

* migrate: when the user has asked to drain a node and we need to move the
  allocations. These are not failures, so we don't want to propagate the
  reschedule tracker.

* replacement: when a node is lost, we don't count that against the `reschedule`
  tracker for the allocations on the node (it's not the allocation's "fault",
  after all). We don't want to run the `migrate` machinery here here either, as we
  can't contact the down node. To the scheduler, this is effectively the same as
  if we bumped the `group.count`

* replacement for `disconnect.replace = true`: this is a replacement, but the
  replacement is intended to be temporary, so we propagate the reschedule tracker.

Add a section to the `reschedule`, `migrate`, and `disconnect` blocks explaining
when each item applies. Update the use of the word "reschedule" in several
places where "replacement" is correct, and vice-versa.

Fixes: #24918
@@ -132,7 +132,7 @@ Usage: nomad job restart [options] <job>
groups are restarted.

When rescheduling, the current allocations are stopped triggering the Nomad
scheduler to create replacement allocations that may be placed in different
scheduler to create new allocations that may be placed in different
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this whole command text keeps using the term "reschedule" despite internally using the migrate infrastructure when -reschedule is specified:

transitionReq := &structs.AllocUpdateDesiredTransitionRequest{
Evals: []*structs.Evaluation{eval},
Allocs: map[string]*structs.DesiredTransition{
args.AllocID: {
Migrate: pointer.Of(true),
NoShutdownDelay: pointer.Of(args.NoShutdownDelay),
},
},
}

This is important for users to understand as it means it is safe to job restart -reschedule and not risk causing downtime if your migrate{} block is properly configured.

Sadly we had no rigor around our definitions of "reschedule" when this command was written: it uses it in the most straightforward definition of "is scheduled again" and not in reference to the reschedule{} block or associated scheduling logic. For example the RescheduleTracker is empty for jobs "rescheduled" with nomad job restart -reschedule.

So now what?

Good question. To enforce purity we'd have to break compatibility as the command flag itself is "-reschedule". Not only is that unacceptable, but what would we call it? -migrate would be "correct" but endlessly confusing to users.

So I think we're going to have to live with 2 definitions of "reschedule" in the Nomad code base:

  1. reschedule - to schedule again
  2. reschedule{} - the parameters and associated scheduling logic that occurs when restart{}s are exhausted.

So ... are you actually suggesting any changes?

...no. I think the original "replacement" was fine. The new "new" is obviously also accurate. None of this command text adheres to our nomenclature though, but I'm not sure it's worth updating.

Copy link
Member Author

@tgross tgross Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not ideal, but maybe we could at least explain that the command flag name is inaccurate? And/or we could make -reschedule an alias for -migrate/-replace(?) and mark -reschedule for deprecation?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/ent/1.7.x+ent Changes are backported to 1.7.x+ent backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent backport/website This will backport PR changes to `stable-website` && the latest release-branch backport/1.7.x backport to 1.7.x release line backport/1.8.x backport to 1.8.x release line backport/1.9.x backport to 1.9.x release line theme/docs Documentation issues and enhancements theme/restart/reschedule
Projects
None yet
Development

Successfully merging this pull request may close these issues.

lost allocation drops reschedule tracker
2 participants