dogswatch: fix Agent handler and policy check #573

jahkeup · 2019-12-02T23:24:34Z

Issue #, if available:

indirectly #505, continues #239

Description of changes:

This fixes misbehaving posting of Intents that caused the Agent to loop over its own emitted events. The change splits the checks out and executes them on sitrep requests (during stabilization), and it stabilizes the handling of duplicate messages to the same end.
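To illustrate the loop described above, here is a minimal sketch of one way an Agent could recognize the echo of an Intent it posted itself. The `Intent` fields and the `postTracker` type are hypothetical names for this example and are not taken from the dogswatch source; the sketch is also not safe for concurrent use.

```go
package main

import "fmt"

// Intent is a hypothetical stand-in for the state a node posts about itself.
type Intent struct {
	Node   string
	Wanted string // action the node wants to take next
	State  string // progress of the current action
}

// postTracker remembers the last Intent this process posted for each node so
// that the echoed event can be recognized and skipped.
type postTracker struct {
	last map[string]Intent
}

func newPostTracker() *postTracker {
	return &postTracker{last: make(map[string]Intent)}
}

// Posted records an Intent right before it is written out.
func (p *postTracker) Posted(in Intent) { p.last[in.Node] = in }

// ShouldHandle reports whether an incoming event carries new information.
// An event equal to the Intent posted last is treated as our own echo.
func (p *postTracker) ShouldHandle(in Intent) bool {
	prev, ok := p.last[in.Node]
	return !ok || prev != in
}

func main() {
	t := newPostTracker()
	mine := Intent{Node: "node-a", Wanted: "perform-update", State: "busy"}
	t.Posted(mine)
	fmt.Println(t.ShouldHandle(mine)) // false: this is our own echo

	mine.State = "ready"
	fmt.Println(t.ShouldHandle(mine)) // true: the state changed
}
```

Comparing whole values keeps the filter simple, but it only works while every field of the struct is comparable.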

The README is updated to include the steps needed to run dogswatch in a cluster. To test, the container image reference will need to be updated to point at an ECR repository that has had the development image pushed to it.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

jahkeup · 2019-12-02T23:24:41Z

This bug highlights the need to atomically emit, filter, and coordinate events, even those from a single source. I don't yet have a better solution for that, and I'd like to explore options that replace the strategy here with async jobs scheduled using Kubernetes' native resources.

jahkeup · 2019-12-06T01:08:05Z

There's a fair amount of change here, mostly revolving around the tests and the supporting bug fixes that protect the codepaths against duplicate messaging, which in turn causes greater issues with race conditions due to externalized data (annotations).

I'm just about ready to call this good - there's one case where the controller allows simultaneous updates incorrectly and I'm trying to chase down why. This issue should be resolved prior to pushing an image as this could/would cause clusters to down themselves in short order.

Queue protections against growing event submission will cause some messages to be dropped or contextually denied due to policy at the time of evaluation. To allow for this, the policy check must run at the time a message is beginning to take flight, in a "single threaded" manner.
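A rough sketch of that idea, under assumptions: a bounded queue drops submissions when it is full, and a single goroutine evaluates the policy immediately before handling each message, so every check sees the state left by the previous message. The `Event` and `queue` types and their methods are hypothetical and are not the controller's actual API.

```go
package main

import "fmt"

// Event is a hypothetical message about a node.
type Event struct{ Node string }

// queue is a bounded buffer whose Offer never blocks.
type queue struct{ ch chan Event }

func newQueue(size int) *queue { return &queue{ch: make(chan Event, size)} }

// Offer enqueues ev, or drops it and returns false when the queue is full.
func (q *queue) Offer(ev Event) bool {
	select {
	case q.ch <- ev:
		return true
	default:
		return false
	}
}

// Drain handles everything queued so far on the calling goroutine. The
// policy runs right before each event is handled and returns how many
// events it denied.
func (q *queue) Drain(permit func(Event) bool, handle func(Event)) (denied int) {
	for {
		select {
		case ev := <-q.ch:
			if !permit(ev) {
				denied++
				continue
			}
			handle(ev)
		default:
			return denied
		}
	}
}

func main() {
	q := newQueue(2)
	for _, n := range []string{"node-a", "node-b", "node-c"} {
		if !q.Offer(Event{Node: n}) {
			fmt.Println("dropped", n)
		}
	}

	active := 0
	denied := q.Drain(
		func(Event) bool { return active < 1 }, // allow one active node
		func(Event) { active++ },
	)
	fmt.Println("denied:", denied)
}
```

Because the check and the handling happen on the same goroutine, a denial or drop is an expected outcome that the caller can log rather than an error.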

zmrow

Initial few comments

extras/dogswatch/README.md

extras/dogswatch/pkg/controller/manager.go

etungsten

🐶 ⌚

zmrow

📦

patraw

Tested on my cluster and LGTM!

jahkeup · 2019-12-09T22:16:01Z

Force-pushed to fixup + squash commits - no delta here!

jahkeup force-pushed the dogswatch-sitrep branch from 5f3f8fd to e89f9c8 on December 4, 2019 01:11

jahkeup changed the title ~~dogswatch: Split update and resource annotation check in Agent~~ dogswatch: fix Agent handler and policy check Dec 5, 2019

jahkeup force-pushed the dogswatch-sitrep branch 2 times, most recently from 1d9d615 to fd107d0 on December 7, 2019 00:27

jahkeup added 24 commits December 6, 2019 16:38

dogswatch: check update during sitrep

74961fe

This also adds additional logging to diagnose issues during run. Signed-off-by: Jacob Vallejo <[email protected]>

dogswatch: add test for stabilization case

acadbbc

Signed-off-by: Jacob Vallejo <[email protected]>

dogswatch: add some docstrings

cb9ba3a

Signed-off-by: Jacob Vallejo <[email protected]>

dogswatch: mitigate containerd before update

ff8e2e7

dogswatch: add logging in subcomponents

e15a2ee

dogswatch: reorder preflight checks

7998960

The preflight should startup before the workers' event loops begin to catch wind of node action. Signed-off-by: Jacob Vallejo <[email protected]>

dogswatch: split poll interval for initial time

7718e34

The early check could conflict with the first set of events sent to the node and cause it to re-emit events that it shouldn't. Signed-off-by: Jacob Vallejo <[email protected]>

dogswatch: extend logging and use common fields

2bcbe04

dogswatch: thread ldflags for debug container

7a9cc2c

dogswatch: log policy in debug build

782852d

dogswatch: fix policy handling for to-be intent

933b07d

The logic was written against an "is" state, not the "is next" state. Check is retuned to consider this.
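The difference between checking the "is" state and the "is next" state can be sketched as follows. The `permit` function, its parameters, and the limit of one active node are hypothetical; the point is only that the count is projected to what it would be after the transition before it is compared to the limit.

```go
package main

import "fmt"

// permit reports whether a node may change activity without the cluster
// exceeding maxActive. activeNow includes the node itself when it is
// already active.
func permit(activeNow int, isActive, willBeActive bool, maxActive int) bool {
	projected := activeNow
	if isActive {
		projected-- // the node's current contribution goes away
	}
	if willBeActive {
		projected++ // and is replaced by its contribution after the change
	}
	return projected <= maxActive
}

func main() {
	// The one active node posts its next step: still one active afterwards.
	fmt.Println(permit(1, true, true, 1)) // true

	// A second, idle node wants to start while another is active.
	fmt.Println(permit(1, false, true, 1)) // false
}
```

A check written against the current state alone would count the already-active node twice in the first case and deny it its own next step.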

dogswatch: refactor active intent accumulator

fa96d51

The number of active nodes is dependent on their perceived and posted Intent; the policy check and its context needs to consider the activity based on these perceived Intents. Tests were also added for this extracted predicate.
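As a sketch of what an extracted predicate and its accumulator might look like: the `Intent` fields, the constant values, and the definition of "active" used here are all assumptions made for illustration, not the definitions in the dogswatch packages.

```go
package main

import "fmt"

// Intent is a hypothetical view of what a node has posted about itself.
type Intent struct {
	Wanted string // action the node is pursuing
	State  string // progress of that action
}

const (
	wantStabilize = "stabilize" // assumed name for the settling action
	stateReady    = "ready"     // assumed name for the settled state
)

// isActive is the predicate: in this sketch a node is idle only when it
// wants to stabilize and has finished doing so.
func isActive(in Intent) bool {
	return in.Wanted != wantStabilize || in.State != stateReady
}

// countActive accumulates the number of active nodes from their Intents.
func countActive(intents map[string]Intent) int {
	n := 0
	for _, in := range intents {
		if isActive(in) {
			n++
		}
	}
	return n
}

func main() {
	intents := map[string]Intent{
		"node-a": {Wanted: wantStabilize, State: stateReady},
		"node-b": {Wanted: wantStabilize, State: "busy"},
		"node-c": {Wanted: "perform-update", State: stateReady},
	}
	fmt.Println(countActive(intents)) // 2
}
```

Keeping the predicate separate from the loop is what makes it straightforward to test on its own.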

dogswatch: clean up logged debug messages

9015849

dogswatch: consider terminal states separately

559e623

dogswatch: add policy test cases

b843c76

dogswatch: delay startup on debuggable builds

b44bd8e

dogswatch: don't exit event loop on policy denials

6b7e50b

dogswatch: protect against nil

657a73d

dogswatch: add caching for dedupe

d8d03f8

dogswatch: tidy controller pkg

cfeb60c

dogswatch: handle busy updates as not stuck

a6fb48d

dogswatch: use readonly modules in build container

a50ad8d

dogswatch: cache only if queued to process

54f16f6

jahkeup added 4 commits December 6, 2019 16:41

dogswatch: debuggable build only on "true"

a2f20c4

dogswatch: extend equivalency check

fbeed7a

dogswatch: track posted Intents to filter

44b18d4

dogswatch: add flag to skip mitigations

d50367f

jahkeup force-pushed the dogswatch-sitrep branch 3 times, most recently from d38884e to 8b633c9 on December 7, 2019 02:43

jahkeup requested review from patraw and etungsten December 7, 2019 02:44

jahkeup marked this pull request as ready for review December 7, 2019 02:44

zmrow reviewed Dec 9, 2019

etungsten approved these changes Dec 9, 2019

extras/dogswatch/pkg/controller/manager.go (outdated, resolved)

jahkeup requested review from etungsten and zmrow December 9, 2019 20:15

etungsten approved these changes Dec 9, 2019

bottlerocket-os deleted a comment from jahkeup Dec 9, 2019

zmrow approved these changes Dec 9, 2019

patraw approved these changes Dec 9, 2019

jahkeup added 5 commits December 9, 2019 14:13

dogswatch: update README with suggested steps

7962f80

dogswatch: add target to build debuggable image

25a7cbe

dogswatch: log debug queue info

bc04e2d

dogswatch: make queue handling more shallow

8e154be

dogswatch: match status command in readme

d5fccf9

jahkeup force-pushed the dogswatch-sitrep branch from 8619cf3 to d5fccf9 on December 9, 2019 22:15

jahkeup merged commit 391dfa0 into develop Dec 9, 2019

jahkeup deleted the dogswatch-sitrep branch December 9, 2019 22:21
