[Issue]: Potentially duplicated labels present in object metadata #30

Open
yuxiang-he opened this issue Dec 18, 2024 · 3 comments
yuxiang-he commented Dec 18, 2024

Problem Description

In some of the Kubernetes objects rendered from the Helm chart templates, certain label keys appear more than once in the metadata, e.g. app.kubernetes.io/name. Validating the generated manifest with Kubeconform produces errors like:

    error unmarshalling resource: error converting YAML to JSON: yaml: unmarshal errors:
      line 22: key "app.kubernetes.io/name" already set in map

The generated manifest does contain duplicates: e.g. app.kubernetes.io/name appears twice in both the selector's matchLabels and the pod template's labels of this Deployment:

# Source: gpu-operator-charts/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: testonly-placeholder-name-gpu-operator-charts-controller-manager
  labels:
    app.kubernetes.io/component: amd-gpu
    app.kubernetes.io/part-of: amd-gpu
    control-plane: controller-manager
    helm.sh/chart: gpu-operator-charts-v1.0.0
    app.kubernetes.io/name: gpu-operator-charts
    app.kubernetes.io/instance: testonly-placeholder-name
    app.kubernetes.io/version: "v1.0.0"
    app.kubernetes.io/managed-by: Helm
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component:  amd-gpu
      app.kubernetes.io/name:  amd-gpu
      app.kubernetes.io/part-of:  amd-gpu
      control-plane:  controller-manager
      app.kubernetes.io/name:  gpu-operator-charts
      app.kubernetes.io/instance: testonly-placeholder-name
  template:
    metadata:
      labels:
        app.kubernetes.io/component:  amd-gpu
        app.kubernetes.io/name:  amd-gpu
        app.kubernetes.io/part-of:  amd-gpu
        control-plane:  controller-manager
        app.kubernetes.io/name:  gpu-operator-charts
        app.kubernetes.io/instance: testonly-placeholder-name

This is generated from https://github.com/ROCm/gpu-operator/blob/v1.0.0/helm-charts/templates/deployment.yaml. Inspecting the code, I can see that the template has a hard-coded value for app.kubernetes.io/name (https://github.com/ROCm/gpu-operator/blob/v1.0.0/helm-charts/templates/deployment.yaml#L23) and a second one generated by {{- include "helm-charts-k8s.selectorLabels" . | nindent 8 }} (https://github.com/ROCm/gpu-operator/blob/v1.0.0/helm-charts/templates/_helpers.tpl#L48-L51).

There may be other duplicated keys as well; I have not checked every template in the Helm chart.
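To look for other duplicates across the rendered manifest, something like the following Python sketch could help. It is not a YAML parser: it assumes simple block-style mappings (as in the manifest above), tracks nesting by indentation only, and would misreport `key: value` text inside multi-line block scalars. The function name `find_duplicate_keys` is hypothetical.

```python
import re

# A mapping key followed by ":" and whitespace or end of line.
# Only plain or fully quoted keys are recognised (an assumption of this sketch).
KEY_RE = re.compile(r"""^("[^"]*"|'[^']*'|[A-Za-z0-9_./-]+):(?:\s|$)""")


def find_duplicate_keys(manifest):
    """Return (key, first_line, duplicate_line) for keys repeated in one mapping."""
    duplicates = []
    stack = []  # one (indent, {key: line}) entry per currently open mapping
    for lineno, raw in enumerate(manifest.splitlines(), start=1):
        line = raw.rstrip()
        content = line.lstrip()
        if not content or content.startswith("#"):
            continue
        if content == "---":  # document separator: start over
            stack.clear()
            continue
        new_item = content.startswith("- ")  # a list item opens a fresh mapping
        if new_item:
            content = content[2:].lstrip()
        indent = len(line) - len(content)
        match = KEY_RE.match(content)
        if not match:
            continue
        key = match.group(1).strip("\"'")
        # Close mappings that are nested deeper than this line.
        while stack and (stack[-1][0] > indent or (new_item and stack[-1][0] == indent)):
            stack.pop()
        if not stack or stack[-1][0] < indent:
            stack.append((indent, {}))
        seen = stack[-1][1]
        if key in seen:
            duplicates.append((key, seen[key], lineno))
        else:
            seen[key] = lineno
    return duplicates
```

Running it over the output of `helm template` should list each repeated key together with the line of its first and second occurrence.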

Operating System

Ubuntu 22.04

CPU

Unsure (did this in our dev environment EC2 instance), likely not relevant in this issue

GPU

Not relevant

ROCm Version

ROCm 6.3.0

ROCm Component

No response

Steps to Reproduce

Generate the full YAML manifest of the Helm chart release, and validate the manifest using Kubeconform https://github.com/yannh/kubeconform. Kubeconform will return errors indicating duplicate label keys.

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

@yuxiang-he yuxiang-he changed the title [Issue]: Potentially duplicated label values present in object metadata [Issue]: Potentially duplicated labels present in object metadata Dec 18, 2024
farshadghodsian (Contributor) commented

Thank you for reporting this. We have opened a bug ticket on our side to look into this.

@farshadghodsian farshadghodsian self-assigned this Dec 18, 2024
yuxiang-he (Author) commented

Thanks for taking a look, @farshadghodsian. Besides the duplicated labels, I think the objects need more consistent labelling. For example, the generated pod spec referenced above has:

      app.kubernetes.io/component:  amd-gpu
      app.kubernetes.io/name:  amd-gpu
      app.kubernetes.io/part-of:  amd-gpu
      control-plane:  controller-manager
      app.kubernetes.io/name:  gpu-operator-charts
      app.kubernetes.io/instance: testonly-placeholder-name

The app.kubernetes.io/* labels and their values should follow the recommendations in the guide here: https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/
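As a rough illustration of the kind of check meant here, the Python sketch below takes the label entries of one rendered block as (key, value) pairs, with duplicates preserved, and reports keys that carry more than one value plus the recommended app.kubernetes.io/* keys that are absent. The function name `audit_labels` is hypothetical, and the "missing" list is informational only: not every recommended label belongs in every block (a selector, for example, usually would not include the version).

```python
from collections import defaultdict

# Keys listed in the Kubernetes "Recommended Labels" guide.
RECOMMENDED_KEYS = (
    "app.kubernetes.io/name",
    "app.kubernetes.io/instance",
    "app.kubernetes.io/version",
    "app.kubernetes.io/component",
    "app.kubernetes.io/part-of",
    "app.kubernetes.io/managed-by",
)


def audit_labels(pairs):
    """Return (conflicts, missing) for a sequence of (key, value) label entries."""
    values = defaultdict(list)
    for key, value in pairs:
        if value not in values[key]:
            values[key].append(value)
    # Keys that were assigned more than one distinct value.
    conflicts = {key: vals for key, vals in values.items() if len(vals) > 1}
    # Recommended keys that do not appear at all.
    missing = [key for key in RECOMMENDED_KEYS if key not in values]
    return conflicts, missing


# The pod template labels quoted above, in rendered order.
pod_labels = [
    ("app.kubernetes.io/component", "amd-gpu"),
    ("app.kubernetes.io/name", "amd-gpu"),
    ("app.kubernetes.io/part-of", "amd-gpu"),
    ("control-plane", "controller-manager"),
    ("app.kubernetes.io/name", "gpu-operator-charts"),
    ("app.kubernetes.io/instance", "testonly-placeholder-name"),
]
conflicts, missing = audit_labels(pod_labels)
```

For the labels above, `conflicts` holds app.kubernetes.io/name with both of its values, which is the inconsistency described in this comment.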


frzifus commented Dec 27, 2024

hey, I face the same issue:

---
apiVersion: v1
kind: Namespace
metadata:
  name: amd-gpu-plugin
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: amd-gpu-operator
  namespace: amd-gpu-plugin
spec:
  url: https://rocm.github.io/gpu-operator
  interval: 1h
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: amd-gpu-operator
  namespace: amd-gpu-plugin
spec:
  interval: 5m
  chart:
    spec:
      chart: gpu-operator-charts
      sourceRef:
        kind: HelmRepository
        name: amd-gpu-operator
        namespace: amd-gpu-plugin
      version: "*"
      # version: "1.0.0"
  values:
    kmm:
      enabled: false
    node-feature-discovery:
      enabled: false

Helm release status

$ kubectl get helmreleases.helm.toolkit.fluxcd.io amd-gpu-operator 
NAME               AGE    READY   STATUS
amd-gpu-operator   8m6s   False   Helm install failed for release amd-gpu-plugin/amd-gpu-operator with chart [email protected]: error while running post render on files: map[string]interface {}(nil): yaml: unmarshal errors:...

Logs

Last Helm logs:

2024-12-27T22:11:11.095720805Z: creating 1 resource(s)
2024-12-27T22:11:11.114025895Z: CustomResourceDefinition nodefeaturerules.nfd.k8s-sigs.io is already present. Skipping.
2024-12-27T22:11:11.114032559Z: creating 1 resource(s)
2024-12-27T22:11:11.159173819Z: CustomResourceDefinition modules.kmm.sigs.x-k8s.io is already present. Skipping.
2024-12-27T22:11:11.159182944Z: creating 1 resource(s)
2024-12-27T22:11:11.185852368Z: CustomResourceDefinition nodemodulesconfigs.kmm.sigs.x-k8s.io is already present. Skipping.
2024-12-27T22:11:11.185861738Z: creating 1 resource(s)
2024-12-27T22:11:11.229470018Z: CustomResourceDefinition modules.kmm.sigs.x-k8s.io is already present. Skipping.
2024-12-27T22:11:11.229478905Z: creating 1 resource(s)
2024-12-27T22:11:11.246415831Z: CustomResourceDefinition nodemodulesconfigs.kmm.sigs.x-k8s.io is already present. Skipping.
  Warning  InstallFailed  75s  helm-controller  Helm install failed for release amd-gpu-plugin/amd-gpu-operator with chart [email protected]: error while running post render on files: map[string]interface {}(nil): yaml: unmarshal errors:
  line 25: mapping key "app.kubernetes.io/name" already defined at line 22
  line 34: mapping key "app.kubernetes.io/name" already defined at line 31
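For anyone collecting these failures from helm-controller events, a small Python sketch like the one below can pull the duplicated keys out of the message. It only matches the wording shown in the log above (`mapping key "..." already defined at line N`); the Kubeconform message quoted earlier in this issue is worded differently and would need its own pattern. The function name `parse_duplicate_key_errors` is hypothetical.

```python
import re

# Matches the post-render error lines shown in the helm-controller event above.
DUP_RE = re.compile(
    r'line (\d+): mapping key "([^"]+)" already defined at line (\d+)'
)


def parse_duplicate_key_errors(log):
    """Return (key, first_line, duplicate_line) tuples found in the log text."""
    return [
        (m.group(2), int(m.group(3)), int(m.group(1)))
        for m in DUP_RE.finditer(log)
    ]


event = (
    'yaml: unmarshal errors:\n'
    '  line 25: mapping key "app.kubernetes.io/name" already defined at line 22\n'
    '  line 34: mapping key "app.kubernetes.io/name" already defined at line 31\n'
)
errors = parse_duplicate_key_errors(event)
```

Both entries name the same key, three lines apart each time, which is consistent with the selector and pod template label blocks each rendering app.kubernetes.io/name twice.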
