gpu_operator_helm_chart_v1.1.0

@sajmera-pensando sajmera-pensando released this 04 Feb 01:57
b870c8d

GPU Operator v1.1.0 Release Notes

The GPU Operator v1.1.0 release adds support for Red Hat OpenShift versions 4.16 and 4.17. The AMD GPU Operator has gone through a rigorous validation process and is now certified for use on OpenShift. It can now be deployed via the Red Hat Catalog.

The latest AMD GPU Operator OLM Bundle for OpenShift is tagged v1.1.1 because the operator image has been updated to include a minor driver fix.

Release Highlights

  • The AMD GPU Operator has now been certified for use with Red Hat OpenShift v4.16 and v4.17
  • Updated documentation with installation and configuration steps for Red Hat OpenShift

Platform Support

New Platform Support

  • Red Hat OpenShift 4.16-4.17
    • Supported features:
      • Driver management
      • Workload scheduling
      • Metrics monitoring
    • Requirements: Red Hat OpenShift version 4.16 or 4.17
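
The version requirement above can be checked before installing. The following is a minimal sketch, not part of the operator: `is_supported_ocp_version` is a hypothetical helper name, and the script assumes the `oc` client is installed and logged in to the target cluster. It reads the desired version from the `ClusterVersion` resource and tests whether it falls in the 4.16 or 4.17 series.

```shell
#!/usr/bin/env bash
# Sketch: check that a cluster version string falls in the 4.16 or 4.17 series.
# is_supported_ocp_version is a hypothetical helper, not part of the operator.
is_supported_ocp_version() {
  case "$1" in
    4.16|4.16.*|4.17|4.17.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Query the live cluster only when the oc client is available.
if command -v oc >/dev/null 2>&1; then
  ver="$(oc get clusterversion version -o jsonpath='{.status.desired.version}' 2>/dev/null)" || ver=""
  if [ -z "$ver" ]; then
    echo "Could not read the cluster version"
  elif is_supported_ocp_version "$ver"; then
    echo "OpenShift $ver is in the supported range"
  else
    echo "OpenShift $ver is outside the 4.16-4.17 range"
  fi
fi
```

The check only compares the major and minor version, so any z-stream release in the 4.16 or 4.17 series passes.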

Known Limitations

  1. Due to an issue with KMM 2.2, deletion of the DeviceConfig Custom Resource gets stuck on Red Hat OpenShift
    • Impact: The DeviceConfig Custom Resource cannot be deleted if the node reboots during uninstall.
    • Affected Configurations: This issue only affects Red Hat OpenShift.
    • Workaround: This issue will be fixed in the next release of KMM. Until then, use a KMM version other than 2.2, or manually remove the status from the NMC (NodeModulesConfig):
    1. List all the NMC resources and pick the correct one (there is one NMC per node, named after the node it relates to).

      oc get nmc -A
    2. Edit the NMC.

      oc edit nmc <nmc name>
    3. Remove all data related to your module from the NMC status and save. This should allow the module to be deleted.
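
Before editing, it can help to review what the NMC status currently holds for the affected node. The helper below is a read-only sketch under stated assumptions: `show_nmc_status` is a hypothetical name, and it relies only on `oc get` with a JSONPath output, so it does not modify the resource.

```shell
#!/usr/bin/env bash
# Sketch: print the status section of the NMC for one node, for review before editing.
# show_nmc_status is a hypothetical helper name, not part of KMM or the operator.
show_nmc_status() {
  # $1 is the node name; the NMC has the same name as the node it relates to.
  oc get nmc "$1" -o jsonpath='{.status}'
}
```

Call it as `show_nmc_status <node name>`, then compare the output with the entries you remove during the edit step.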