-
Notifications
You must be signed in to change notification settings - Fork 525
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dogswatch: fix Agent handler and policy check #573
Conversation
This bug highlights the need to atomically emit, filter, and coordinate the events even from a single source, I don't yet have a greater solution in this way and I'd like to explore options that replace the strategy here in favor of async jobs scheduled with Kubernetes' native resources. |
5f3f8fd
to
e89f9c8
Compare
There's a fair amount of change here - mostly revolving around the tests and the supporting bug fixes protecting the codepaths against duplicate messaging which in turn causes greater issues with race conditions due to externalized data (annotations). I'm just about ready to call this good - there's one case where the controller allows simultaneous updates incorrectly and I'm trying to chase down why. This issue should be resolved prior to pushing an image as this could/would cause clusters to down themselves in short order. |
1d9d615
to
fd107d0
Compare
This also adds additional logging to diagnose issues during run. Signed-off-by: Jacob Vallejo <[email protected]>
Signed-off-by: Jacob Vallejo <[email protected]>
Signed-off-by: Jacob Vallejo <[email protected]>
The preflight should startup before the workers' event loops begin to catch wind of node action. Signed-off-by: Jacob Vallejo <[email protected]>
The early check could conflict with the first set of events sent to the node and cause it to re-emit events that it shouldn't. Signed-off-by: Jacob Vallejo <[email protected]>
The logic was written against an "is" state, not the "is next" state. Check is retuned to consider this.
The number of active nodes is dependent on their perceived and posted Intent; the policy check and its context needs to consider the activity based on these perceived Intents. Tests were also added for this extracted predicate.
Queue protections against growing event submission will cause some messages to be dropped or contextually denied due to policy at time of evaluation. To allow for this, the policy checks must be at the time that its beginning to take flight in a "single threaded" manner.
d38884e
to
8b633c9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial few comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐶 ⌚
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
📦
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested on my cluster and LGTM!
8619cf3
to
d5fccf9
Compare
Force-pushed to fixup + squash commits - no delta here! |
Issue #, if available:
indirectly #505, continues #239
Description of changes:
This fixes some misbehaving posting of intents that cause the Agent to otherwise loop over its own emitted events. This change splits the checks out and executes them on
sitrep
requests (during stabilization) as well as stabilize the handling of duplicate messages to the same end.README is updated to include the steps needed to run
dogswatch
in a cluster to test and run - to test the container image will need to be updated to an ECR repository that's had the development image pushed to it.By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.