
Use actual node resource utilization by consuming kubernetes metrics #1555

Conversation


@ingvagabund ingvagabund commented Nov 15, 2024

Extend LowNodeUtilization to get node and pod usage from Kubernetes metrics.

Policy configuration:

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
metricsCollector:
  enabled: true
profiles:
- name: ProfileName
  pluginConfig:
  - name: "LowNodeUtilization"
    args:
      thresholds:
        "cpu" : 20
        "memory": 20
        "pods": 20
      targetThresholds:
        "cpu" : 50
        "memory": 50
        "pods": 50
      metricsUtilization:
        metricsServer: true
  plugins:
    balance:
      enabled:
        - "LowNodeUtilization"

Fixes: #225

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 15, 2024
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Nov 15, 2024
@ingvagabund ingvagabund force-pushed the actual-utilization-kubernetes-metrics branch 3 times, most recently from 0eac57f to b90abad Compare November 15, 2024 21:42
@ingvagabund ingvagabund changed the title WIP: Actual utilization kubernetes metrics Actual utilization kubernetes metrics Nov 15, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 15, 2024
@ingvagabund ingvagabund force-pushed the actual-utilization-kubernetes-metrics branch 7 times, most recently from ad0e809 to 5495682 Compare November 15, 2024 22:10
@ingvagabund ingvagabund changed the title Actual utilization kubernetes metrics Use actual node resource utilization by consuming kubernetes metrics Nov 15, 2024
@ingvagabund ingvagabund force-pushed the actual-utilization-kubernetes-metrics branch from 5495682 to 78ae117 Compare November 16, 2024 10:09
func (mc *MetricsCollector) Run(ctx context.Context) {
	wait.NonSlidingUntil(func() {
		mc.Collect(ctx)
	}, 5*time.Second, ctx.Done())

should the timing be configurable?

@ingvagabund ingvagabund (Contributor, Author) Nov 17, 2024

I did not want to burden the initial implementation with additional configuration. The API can be extended at any point if the 5-second interval is not sufficient. We could wait for feedback? Additionally, it might help to wait e.g. 50 seconds to collect more samples before the descheduling cycle starts, to soften the captured cpu/memory utilization in case some nodes are changing their utilization wildly. The next iteration could extend the API with:

metricsCollector:
  collectionInterval: 10s # collect a new sample every 10s
  initialDelay: 50s # wait for 50s before evicting to soften the captured cpu/memory utilization
  initialSamples: 5 # collect at least 5 samples before evicting to soften the captured cpu/memory utilization

}
}

func (mc *MetricsCollector) Run(ctx context.Context) {

I wonder if we should kick the goroutine off in here so calling it is simpler.
We could block here until we get metrics for the initial load (hasSynced) if we wanted.
Could be nice to fail on boot if their metrics server isn't available?

@ingvagabund ingvagabund (Contributor, Author) Nov 17, 2024

I like the idea of making the developer-facing API simpler. In general, I tend to invoke a goroutine explicitly so I am aware there's a goroutine, rather than hiding it, which I would like to avoid in this case.

Could be nice to fail on boot if their metrics server isn't available?

We could replace PollUntilWithContext with wait.PollWithContext and set the timeout to e.g. 1 minute, to give the metrics server a fighting chance to come up. This could be another configurable option in the next iteration.

@ingvagabund (Contributor, Author)

Replacing PollUntilWithContext with PollWithContext.
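
A hedged sketch of what blocking on the initial collection with a timeout could look like; the one-minute timeout and the exported HasSynced accessor are assumptions, not this PR's actual API:

// Assumed sketch: poll until the collector reports an initial sync,
// failing startup if metrics-server does not respond within one minute.
func waitForInitialMetrics(ctx context.Context, mc *MetricsCollector) error {
	return wait.PollWithContext(ctx, 5*time.Second, time.Minute, func(ctx context.Context) (bool, error) {
		return mc.HasSynced(), nil // HasSynced() is a hypothetical accessor for the hasSynced field
	})
}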

@jklaw90 (Contributor) commented Nov 17, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 17, 2024
@ingvagabund ingvagabund force-pushed the actual-utilization-kubernetes-metrics branch from 78ae117 to 8c585fc Compare November 17, 2024 10:58
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 17, 2024
@ingvagabund ingvagabund force-pushed the actual-utilization-kubernetes-metrics branch 4 times, most recently from a609360 to 39bc3f0 Compare November 17, 2024 11:28
@ingvagabund ingvagabund force-pushed the actual-utilization-kubernetes-metrics branch from 00a6962 to 3194bfc Compare November 19, 2024 15:12
pkg/descheduler/descheduler.go (outdated review thread, resolved)
pkg/descheduler/metricscollector/metricscollector.go (outdated review thread, resolved)
Comment on lines 67 to 78
// weightedAverage smooths consecutive samples with an exponentially weighted
// moving average: the previous value is weighted by beta, the new sample by (1-beta).
func weightedAverage(prevValue, value int64) int64 {
	return int64(math.Floor(beta*float64(prevValue) + (1-beta)*float64(value)))
}


I am fine with the Exponentially Weighted Average convergence. But maybe we could detect that we have stopped converging and just use the new value?
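
A hedged sketch of that suggestion; the 5% convergence bound is arbitrary and only illustrates the idea of dropping the smoothing once the average has caught up with the samples:

// Illustrative only: once the smoothed value is already close to the new
// sample, treat the average as converged and take the new value directly.
func smoothedValue(prevValue, value int64) int64 {
	const epsilon = 0.05 // assumed: a 5% relative difference counts as converged
	if prevValue > 0 && math.Abs(float64(value-prevValue))/float64(prevValue) < epsilon {
		return value
	}
	return int64(math.Floor(beta*float64(prevValue) + (1-beta)*float64(value)))
}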

pkg/framework/plugins/nodeutilization/usageclients.go (outdated review thread, resolved)
pkg/descheduler/metricscollector/metricscollector.go (outdated review thread, resolved)
pkg/descheduler/metricscollector/metricscollector.go (outdated review thread, resolved)
hasSynced bool
}

func NewMetricsCollector(clientset kubernetes.Interface, metricsClientset metricsclient.Interface, nodeSelector string) *MetricsCollector {


No, I just meant to have the correct intersection between the resourceNames passed to the actualUsageClient and the real capability.

@ingvagabund (Contributor, Author)

E2E tests are timing out. Extending the timeout from 30m to 40m through kubernetes/test-infra#33818.

@ingvagabund (Contributor, Author)

/retest-required


for _, resourceName := range client.resourceNames {
	if _, exists := nodeUsage[resourceName]; !exists {
		return fmt.Errorf("unable to find %q resource for collected %q node metric", resourceName, node.Name)


+1, so now we do not need to check the resourceNames in the client.metricsCollector. It can return anything it wants and we just return an error here if it does not comply.

}, 5*time.Second, ctx.Done())
}

func weightedAverage(prevValue, value int64) int64 {


It would be good to have some documentation above this function once we resolve the #1555 (comment) thread.

@ingvagabund ingvagabund force-pushed the actual-utilization-kubernetes-metrics branch 2 times, most recently from 6305d68 to 86d76cd Compare November 20, 2024 13:22
@atiratree

Thanks!
/lgtm

@k8s-ci-robot (Contributor)

@atiratree: changing LGTM is restricted to collaborators

In response to this:

Thanks!
/lgtm


@ingvagabund (Contributor, Author)

Squashing commits.

@ingvagabund ingvagabund force-pushed the actual-utilization-kubernetes-metrics branch from 86d76cd to 6567f01 Compare November 20, 2024 13:31
@ingvagabund ingvagabund added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Nov 20, 2024
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.


@k8s-ci-robot k8s-ci-robot merged commit a962cca into kubernetes-sigs:master Nov 20, 2024
9 checks passed
@ingvagabund ingvagabund deleted the actual-utilization-kubernetes-metrics branch November 20, 2024 14:15
@langchain-infra

Any idea when this might be released, @ingvagabund? I see it's not in any of the release notes yet.

@ingvagabund (Contributor, Author)

Currently targeted for v0.32.0

@a7i (Contributor) commented Dec 14, 2024

@ingvagabund (Contributor, Author) commented Dec 17, 2024

Process did not finish before 40m0s timeout

Looks like this is not a cause but a consequence.

tiraboschi added a commit to tiraboschi/descheduler that referenced this pull request Feb 6, 2025
…lized nodes

Dynamically set/remove a soft taint (effect: PreferNoSchedule)
to/from nodes that the descheduler identified as overutilized
according to the nodeutilization plugin.
After kubernetes-sigs#1555 the nodeutilization plugin can consume
utilization data from actual kubernetes metrics and
not only static reservations (requests).
On the other side, the default kube-scheduler is only
going to ensure that a Pod's specific resource requests
can be satisfied.
The soft taint will simply give a hint to the scheduler
to try avoiding nodes that look overutilized in
the descheduler's eyes.
Since it's just a soft taint (do it just if possible)
there is no need to restrict it to only a certain number
of nodes in the cluster or be strict on error handling.
Being just an optional feature,
at this stage the PR is not amending the descheduler-cluster-role
ClusterRole to be able to patch nodes.
Granting that optional permission will eventually be up to the
cluster admin when enabling this optional experimental sub-feature.

Signed-off-by: Simone Tiraboschi <[email protected]>
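
A minimal client-go sketch of what applying such a PreferNoSchedule soft taint to an overutilized node could look like; the taint key is a hypothetical placeholder and not part of this PR:

// Hypothetical sketch: add a soft taint so the scheduler prefers other nodes
// without hard-blocking scheduling on the overutilized one.
func addSoftTaint(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	taint := corev1.Taint{
		Key:    "descheduler.example.io/overutilized", // placeholder key
		Effect: corev1.TaintEffectPreferNoSchedule,
	}
	for _, t := range node.Spec.Taints {
		if t.Key == taint.Key && t.Effect == taint.Effect {
			return nil // already tainted
		}
	}
	node.Spec.Taints = append(node.Spec.Taints, taint)
	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}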
tiraboschi added a commit to tiraboschi/descheduler that referenced this pull request Feb 6, 2025
tiraboschi added a commit to tiraboschi/descheduler that referenced this pull request Feb 6, 2025
tiraboschi added a commit to tiraboschi/descheduler that referenced this pull request Feb 11, 2025
tiraboschi added a commit to tiraboschi/descheduler that referenced this pull request Feb 11, 2025
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
lgtm: "Looks good to me", indicates that a PR is ready to be merged.
size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use actual node resource utilization in the strategy "LowNodeUtilization"
6 participants