Use actual node resource utilization by consuming kubernetes metrics #1555
Conversation
func (mc *MetricsCollector) Run(ctx context.Context) {
	wait.NonSlidingUntil(func() {
		mc.Collect(ctx)
	}, 5*time.Second, ctx.Done())
should the timing be configurable?
I did not want to burden the initial implementation with additional configuration. The API can be extended at any point if a 5 second interval is not sufficient. We could wait for feedback? Additionally, it might help to wait for e.g. 50 seconds to collect more samples before the descheduling cycle starts, to soften the captured cpu/memory utilization in case some nodes are wildly changing their utilization. The next iteration could extend the API with:
metricsCollector:
  collectionInterval: 10s # collect a new sample every 10s
  initialDelay: 50s       # wait for 50s before evicting to soften the captured cpu/memory utilization
  initialSamples: 5       # collect at least 5 samples before evicting to soften the captured cpu/memory utilization
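A minimal sketch of how such options could be wired into the collector; the MetricsCollectorConfig type, its fields, and the package path are hypothetical and only mirror the proposed YAML above:

```go
package metricscollector // package path assumed for illustration

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// MetricsCollectorConfig is hypothetical; it only mirrors the proposed
// collectionInterval/initialDelay/initialSamples fields above.
type MetricsCollectorConfig struct {
	CollectionInterval time.Duration // how often a new sample is collected
	InitialDelay       time.Duration // how long to wait before the first descheduling cycle
	InitialSamples     int           // minimum number of samples before eviction starts
}

// runWithConfig replaces the hard-coded 5*time.Second interval with the
// configurable value while keeping the same NonSlidingUntil loop.
func (mc *MetricsCollector) runWithConfig(ctx context.Context, cfg MetricsCollectorConfig) {
	wait.NonSlidingUntil(func() {
		mc.Collect(ctx)
	}, cfg.CollectionInterval, ctx.Done())
}
```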
	}
}

func (mc *MetricsCollector) Run(ctx context.Context) {
Wonder if we should kick the goroutine off in here so calling it is simpler. We could block here until we get metrics for the initial load (hasSynced) if we wanted. Could be nice to fail on boot if their metrics server isn't available?
I like the idea of making the developer facing API simpler. In general, I tend to invoke a goroutine explicitly so I am aware there's a goroutine, rather than hiding it, which I would like to avoid in this case.
Could be nice to fail on boot if their metrics server isn't available?
We could replace PollUntilWithContext with wait.PollWithContext and set the timeout to e.g. 1 minute, to give the metrics server a fighting chance to come up? This could be another configurable option in the next iteration.
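A sketch of that fail-on-boot behaviour; wait.PollWithContext is an existing helper in k8s.io/apimachinery, while the method name and mc.HasSynced() (assumed to expose the hasSynced field shown further below) are illustrative only:

```go
package metricscollector // package path assumed for illustration

import (
	"context"
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForFirstSample polls until the collector has seen at least one round of
// node metrics and gives up after one minute, so the descheduler can fail on
// boot when the metrics server never becomes available. Sketch, not part of the PR.
func (mc *MetricsCollector) waitForFirstSample(ctx context.Context) error {
	if err := wait.PollWithContext(ctx, 5*time.Second, time.Minute, func(ctx context.Context) (bool, error) {
		return mc.HasSynced(), nil
	}); err != nil {
		return fmt.Errorf("metrics server did not provide node metrics within 1 minute: %v", err)
	}
	return nil
}
```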
Replacing PollUntilWithContext with PollWithContext.
/lgtm
func weightedAverage(prevValue, value int64) int64 {
	return int64(math.Floor(beta*float64(prevValue) + (1-beta)*float64(value)))
}
I am fine with the Exponentially Weighted Average convergence. But maybe we could detect that we have stopped converging and just use the new value?
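One way to express that suggestion, as a sketch; the beta value, the 1% cut-off, and the helper name are assumptions, not values from the PR:

```go
package metricscollector // package path assumed for illustration

import "math"

// beta weights the previous sample in the exponentially weighted average;
// the concrete value here is an assumption for illustration.
const beta = 0.9

// convergenceThreshold: once the smoothed value is within 1% of the new
// sample, further smoothing adds nothing.
const convergenceThreshold = 0.01

// weightedAverageWithCutoff behaves like weightedAverage, but once the
// previous value has converged onto the new sample it adopts the new value
// directly instead of continuing to smooth it.
func weightedAverageWithCutoff(prevValue, value int64) int64 {
	if value != 0 && math.Abs(float64(prevValue-value))/float64(value) < convergenceThreshold {
		return value
	}
	return int64(math.Floor(beta*float64(prevValue) + (1-beta)*float64(value)))
}
```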
	hasSynced bool
}

func NewMetricsCollector(clientset kubernetes.Interface, metricsClientset metricsclient.Interface, nodeSelector string) *MetricsCollector {
No, I just meant to have the correct intersection between the resourceNames passed to the actualUsageClient and the real capability.
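A sketch of what that intersection could look like; the helper name, the package path, and the nodeUsage map type are assumptions:

```go
package nodeutilization // package path assumed for illustration

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// supportedResourceNames keeps only the requested resource names that the
// collected node metrics actually report. Sketch, not part of the PR.
func supportedResourceNames(requested []v1.ResourceName, nodeUsage map[v1.ResourceName]*resource.Quantity) []v1.ResourceName {
	intersection := []v1.ResourceName{}
	for _, name := range requested {
		if _, exists := nodeUsage[name]; exists {
			intersection = append(intersection, name)
		}
	}
	return intersection
}
```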
E2E tests are timing out. Extending the timeout from 30m to 40m through kubernetes/test-infra#33818.
/retest-required
	for _, resourceName := range client.resourceNames {
		if _, exists := nodeUsage[resourceName]; !exists {
			return fmt.Errorf("unable to find %q resource for collected %q node metric", resourceName, node.Name)
+1, so now we do not need to check the resourceNames in the client.metricsCollector. It can return anything it wants and we just return an error here if it does not comply.
	}, 5*time.Second, ctx.Done())
}

func weightedAverage(prevValue, value int64) int64 {
It would be good to have some documentation above this function once we resolve the #1555 (comment) thread.
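For instance, a doc comment along these lines above the existing function (the wording is only a suggestion, not the PR's documentation):

```go
// weightedAverage smooths consecutive utilization samples with an
// exponentially weighted average: the previous value is weighted by beta and
// the new sample by (1-beta), so a single outlier sample does not swing the
// reported node utilization.
func weightedAverage(prevValue, value int64) int64 {
	return int64(math.Floor(beta*float64(prevValue) + (1-beta)*float64(value)))
}
```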
Thanks!
@atiratree: changing LGTM is restricted to collaborators.
Squashing commits.
[APPROVALNOTIFIER] This PR is APPROVED. Approval requirements bypassed by manually added approval.
Any idea on when this might be released @ingvagabund? I see it's not in any of the release notes yet.
Currently targeted for v0.32.0.
The e2e tests fail frequently, can someone take a look? Thanks.
Looks like this is not a cause but a consequence.
…lized nodes Dynamically set/remove a soft taint (effect: PreferNoSchedule) to/from nodes that the descheduler identified as overutilized according to the nodeutilization plugin. After kubernetes-sigs#1555 the nodeutilization plugin can consume utilization data from actual kubernetes metrics and not only static reservations (requests). On the other side, the default kube-scheduler is only going to ensure that a Pod's specific resource requests can be satisfied. The soft taint will simply give a hint to the scheduler to try avoiding nodes that look overutilized in the descheduler's eyes. Since it's just a soft taint (do it only if possible) there is no need to restrict it to a certain number of nodes in the cluster or be strict on error handling. Being just an optional feature, at this stage the PR is not amending the descheduler-cluster-role ClusterRole to be able to patch nodes. Granting that optional permission will eventually be up to the cluster admin when enabling this optional experimental sub-feature. Signed-off-by: Simone Tiraboschi <[email protected]>
Extend LowNodeUtilization with getting node and pod usage from kubernetes metrics.
Policy configuration:
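The original configuration example did not survive extraction; a sketch of what enabling the metrics-based mode could look like (the metricsUtilization/metricsServer field names and threshold values are assumptions and should be checked against the released descheduler docs):

```yaml
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: default
    pluginConfig:
      - name: "LowNodeUtilization"
        args:
          metricsUtilization:
            metricsServer: true   # consume actual node usage from kubernetes metrics
          thresholds:
            cpu: 20
            memory: 20
          targetThresholds:
            cpu: 70
            memory: 70
    plugins:
      balance:
        enabled:
          - "LowNodeUtilization"
```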
Fixes: #225