Fix cancel handling in pipedv1 scheduler #5597

Merged
khanhtc1202 merged 4 commits into master from wait-plugin-cancel-handling
Mar 4, 2025

Conversation

Warashi
Contributor

@Warashi Warashi commented Feb 21, 2025

What this PR does:

  • Fix error handling when the context is canceled, which happens when the user cancels the deployment from the Web UI.
  • Remove StageStatusCancelled from the SDK, because cancellation is handled in piped. Plugin implementations don't have to handle it.

Why we need it:

  • to fix cancel behavior
  • to clarify what we need to implement in plugins

Which issue(s) this PR fixes:

Part of #4980 #5530

Does this PR introduce a user-facing change?: No

  • How are users affected by this change:
  • Is this breaking change:
  • How to migrate (if breaking change):

because the piped handles the case as cancelled by the user without using the plugin's result.

Signed-off-by: Shinnosuke Sawada-Dazai <[email protected]>
@Warashi Warashi marked this pull request as ready for review February 21, 2025 04:42

codecov bot commented Feb 21, 2025

Codecov Report

Attention: Patch coverage is 83.33333% with 1 line in your changes missing coverage. Please review.

Project coverage is 26.24%. Comparing base (acde5ae) to head (4d96f03).
Report is 26 commits behind head on master.

Files with missing lines | Patch % | Lines
pkg/app/pipedv1/controller/scheduler.go | 75.00% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5597      +/-   ##
==========================================
- Coverage   26.28%   26.24%   -0.04%     
==========================================
  Files         470      473       +3     
  Lines       50353    50450      +97     
==========================================
+ Hits        13234    13242       +8     
- Misses      36059    36146      +87     
- Partials     1060     1062       +2     

Comment on lines 458 to 459
StageStatusSuccess StageStatus = 2
StageStatusFailure StageStatus = 3
Member

[Q] I don't remember why we made this enum start from 2. Could you remind me? 👀

Contributor Author

Because these lines were copied from the enum below. It's a bit confusing, so I want to make them start from 1.

// StageStatus represents the current status of a stage of a deployment.
type StageStatus int32

const (
	StageStatus_STAGE_NOT_STARTED_YET StageStatus = 0
	StageStatus_STAGE_RUNNING StageStatus = 1
	StageStatus_STAGE_SUCCESS StageStatus = 2
	StageStatus_STAGE_FAILURE StageStatus = 3
	StageStatus_STAGE_CANCELLED StageStatus = 4
	StageStatus_STAGE_SKIPPED StageStatus = 5
	StageStatus_STAGE_EXITED StageStatus = 6
)

Contributor Author

I refactored it in this commit:
a755ab8
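
For illustration only, one way an SDK-side enum can start from 1 while still mapping onto the model values quoted earlier in this thread. This is a sketch under assumptions, not necessarily what the commit does: `SDKStageStatus`, `ModelStageStatus`, and `toModel` are hypothetical names, and only the numeric values 2 and 3 come from the enum quoted above.

```go
package main

import "fmt"

// ModelStageStatus mirrors two values of the model enum quoted in this thread.
type ModelStageStatus int32

const (
	ModelStageSuccess ModelStageStatus = 2
	ModelStageFailure ModelStageStatus = 3
)

// SDKStageStatus is a hypothetical SDK-side enum. It starts from 1, so the
// zero value never matches a valid status.
type SDKStageStatus int

const (
	SDKStageSuccess SDKStageStatus = iota + 1
	SDKStageFailure
)

// toModel is a hypothetical conversion that keeps the two numbering schemes
// independent of each other.
func toModel(s SDKStageStatus) (ModelStageStatus, error) {
	switch s {
	case SDKStageSuccess:
		return ModelStageSuccess, nil
	case SDKStageFailure:
		return ModelStageFailure, nil
	default:
		return 0, fmt.Errorf("unknown stage status: %d", s)
	}
}

func main() {
	m, err := toModel(SDKStageFailure)
	fmt.Println(m, err)

	// An unset (zero) status is rejected instead of being treated as valid.
	_, err = toModel(0)
	fmt.Println(err != nil)
}
```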

@Warashi Warashi requested a review from khanhtc1202 February 25, 2025 00:13
@@ -78,7 +78,10 @@ func wait(ctx context.Context, duration time.Duration, initialStart time.Time, s

	case <-ctx.Done(): // on cancelled
		slp.Info("Wait cancelled")
		return sdk.StageStatusCancelled
Member

[IMO] I think StageStatusCancelled should remain, even though it's not used in piped.
Otherwise, plugin developers will be confused about which status to return.

If we want to remove StageStatusCancelled, we should remove the case <-ctx.Done(): section too. (That would be ideal, if possible.)

Contributor Author

I see your concern.
On the other hand, my concern is that plugin developers may think they have to handle context cancellation by returning StageStatusCancelled. This is incorrect; the plugin should stop its operation when the context is canceled, without worrying about what it returns.
The WAIT plugin is a special case because it must watch for context cancellation explicitly in order to stop. Almost all other plugins can do this simply by passing the context to their internal functions, because deployment operations can treat context cancellation as a failure.

Member

plugin developers may think they have to handle context cancellation as StageStatusCancelled.

I agree!

What about WaitApproval and ScriptRun stages?

e.LogPersister.Infof("Waiting for approval from at least %d user(s)...", num)
for {
	select {
	case <-ticker.C:
		if e.checkApproval(ctx, num) {
			return model.StageStatus_STAGE_SUCCESS
		}
	case s := <-sig.Ch():
		switch s {
		case executor.StopSignalCancel:
			return model.StageStatus_STAGE_CANCELLED
		case executor.StopSignalTerminate:
			return originalStatus
		default:
			return model.StageStatus_STAGE_FAILURE
		}
	case <-timer.C:
		e.LogPersister.Errorf("Timed out %v", timeout)
		return model.StageStatus_STAGE_FAILURE
	}
}

for {
	select {
	case result := <-c:
		return result
	case <-timer.C:
		e.LogPersister.Errorf("Canceled because of timeout")
		return model.StageStatus_STAGE_FAILURE
	case s := <-sig.Ch():
		switch s {
		case executor.StopSignalCancel:
			e.LogPersister.Info("Canceled by user")
			return model.StageStatus_STAGE_CANCELLED
		case executor.StopSignalTerminate:
			e.LogPersister.Info("Terminated by system")
			return originalStatus
		default:
			e.LogPersister.Error("Unexpected")
			return model.StageStatus_STAGE_FAILURE
		}
	}
}

Contributor Author

First, the timeout should be handled on the piped side so the plugin doesn't have to.

The SCRIPT_RUN stage should use os/exec.CommandContext, which handles context cancellation by interrupting the executed command. So we can implement it without watching ctx.Done().

The WAIT_APPROVAL stage is difficult to implement without watching ctx.Done() because, apart from polling the approval state, it performs no operation that takes a context.

Member

OK, I got it.

Let's add a note to the plugin dev guide about when to handle ctx.Done().
Even if StageStatusCancelled is removed, plugin developers should be aware of cancellation so that the stage is guaranteed to exit.

Co-authored-by: Tetsuya KIKUCHI <[email protected]>
Signed-off-by: Shinnosuke Sawada-Dazai <[email protected]>
Member

@t-kikuc t-kikuc left a comment

Thank you!

Member

@khanhtc1202 khanhtc1202 left a comment

LGTM 👍

@khanhtc1202 khanhtc1202 merged commit e1abe10 into master Mar 4, 2025
17 of 18 checks passed
@khanhtc1202 khanhtc1202 deleted the wait-plugin-cancel-handling branch March 4, 2025 08:12