-
Notifications
You must be signed in to change notification settings - Fork 146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scheduler improvements and API tweaks #286
Conversation
jdef
commented
Apr 16, 2017
•
edited
Loading
edited
- httpsched transitions to disconnected upon errors generated by the Decoder
- demo scheduler properly handles Mesos master failover
- httpcli: deprecate ProtocolError and Err variables in favor of APIError
- minor doc tweaks
frameworkIDStore = NewInMemoryIDStore() | ||
controlContext = &controller.ContextAdapter{ | ||
DoneFunc: func() bool { return state.done }, | ||
// not called concurrently w/ subscription events, don't worry about using the atomic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bad comment
@@ -131,13 +131,19 @@ func (ca *ContextAdapter) Error(err error) { | |||
} | |||
} | |||
|
|||
type SchedulerError string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a pretty bad name. we can do better
frameworkID := e.GetSubscribed().GetFrameworkID().GetValue() | ||
if state.frameworkID == "" || state.frameworkID != frameworkID { | ||
if state.frameworkID != "" && state.frameworkID != frameworkID && state.config.checkpoint { | ||
state.done = true // TODO(jdef) not goroutine safe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if these are really panic
worthy conditions instead
api/v1/lib/scheduler/calls/caller.go
Outdated
@@ -85,6 +101,19 @@ func FrameworkCaller(frameworkID string) Decorator { | |||
} | |||
} | |||
|
|||
// DynamicFrameworkCaller generates and returns a Decorator that applies the given frameworkID to all calls (except SUBSCRIBE). | |||
func DynamicFrameworkCaller(frameworkID func() string) Decorator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still dislike this name. maybe SubscribedCaller
instead?
api/v1/lib/scheduler/calls/caller.go
Outdated
@@ -85,6 +101,19 @@ func FrameworkCaller(frameworkID string) Decorator { | |||
} | |||
} | |||
|
|||
// DynamicFrameworkCaller generates and returns a Decorator that applies the given frameworkID to all calls (except SUBSCRIBE). | |||
func DynamicFrameworkCaller(frameworkID func() string) Decorator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still dislike this name. maybe SubscribedCaller
instead?
state.frameworkID = e.GetSubscribed().GetFrameworkID().GetValue() | ||
frameworkID := e.GetSubscribed().GetFrameworkID().GetValue() | ||
if state.frameworkID == "" || state.frameworkID != frameworkID { | ||
if state.frameworkID != "" && state.frameworkID != frameworkID && state.config.checkpoint { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not required to be nested, pull outside of containing if
@@ -84,6 +88,7 @@ func NewConfig() Config { | |||
url: env("MESOS_MASTER_HTTP", "http://:5050/api/v1/scheduler"), | |||
codec: codec{Codec: &encoding.ProtobufCodec}, | |||
timeout: envDuration("MESOS_CONNECT_TIMEOUT", "20s"), | |||
failoverTimeout: envDuration("FRAMEWORK_FAILOVER_TIMEOUT", "1000h"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SCHEDULER_FAILOVER_TIMEOUT
@@ -78,7 +76,9 @@ func (_ *controllerImpl) Run(config Config) (lastErr error) { | |||
for !config.Context.Done() { | |||
frameworkID := config.Context.FrameworkID() | |||
if config.Framework.GetFailoverTimeout() > 0 && frameworkID != "" { | |||
subscribe.Subscribe.FrameworkInfo.ID = &mesos.FrameworkID{Value: frameworkID} | |||
frameworkProto := &mesos.FrameworkID{Value: frameworkID} | |||
subscribe.Subscribe.FrameworkInfo.ID = frameworkProto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice to consolidate these lines and say something like subscribe.WithFrameworkID(frameworkID)
api/v1/lib/httpcli/http.go
Outdated
}) | ||
) | ||
|
||
// ProtocolError is a generic error type returned for expected status codes | ||
// received from Mesos. | ||
// Deprecated: no longer used in favor of APIError. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be better to just break the API now and remove this, otherwise downstream consumers will wonder why they're not seeing ProtocolError's anymore. Better to break the build and force change.
Maybe tackle this in a follow-up PR? This one is getting a bit long already.
upon the scheduler receiving an ERROR event, it should probably re-try subscription. unless Mesos terminates the current subscription connection after sending an ERROR (which is not in the contract) then we're still in trouble because the |
This PR is massive enough and I'm mostly happy (enough) with it. Will keep it open for a few days for comments and merge if there are no complaints. |
ideally there would be tests shipping with these changes |
…ecting disconnect
…zed clusters where all agents have GPU resources
…rFunc processes responses instead of status codes
fixes #277 |
99f3370
to
7fd41ca
Compare
filed ticket to track tech debt for better unit tests in |