H4HIP: CRD updating #379
Conversation
It's pretty common these days for CRDs to be in templates so they can be upgraded too. We should probably keep this behavior and align it with the /crds directory too. Maybe:
I've not worked through the specification yet, but I wanted to add some thoughts on the material up front.
hips/hip-XXXX.md (Outdated)
1. CRDs are a cluster-wide resource.
Changes to cluster-wide resources can (more easily) break applications beyond the scope of the chart
It's not just beyond the scope of the instance of a chart but beyond the scope of the user who is installing the chart. You can have two users of a cluster who do not have knowledge of each other or their work. This is where breakage can happen.
Let me work "users" into the wording here. I loosely mean that "cluster-wide" must, de facto, be presumed to mean multi-user.
For 1., it is thought that Helm should not treat CRDs specially here.
Helm will readily operate on many other cluster-wide resources today: cluster roles, priority classes, namespaces, etc.,
the modification/removal of which could easily cause breakage outside of the chart's release.
Helm is a package manager rather than a general purpose management tool for Kubernetes resources. It's a subtle but important difference. Here are some ways to think about it...
- In the profiles we cover applications specifically. Cluster operations are specifically out of scope.
- The definition (from wikipedia) for a package manager:
A package manager or package-management system is a collection of software tools that automates the process of installing, upgrading, configuring, and removing computer programs for a computer in a consistent manner.
Namespaces were not able to be created by Helm when 3.0.0 came out. The philosophy is that applications should be installed into a namespace, but it's not up to Helm to manage namespaces; those should be created first, possibly as part of configuration management. Namespace creation was added to provide backwards compatibility with Helm v2 (which had it) because we got so many issues filed about it.
We have not considered Helm the right tool to manage all resources since it's targeted at applications and Kubernetes resources go beyond that.
I think I disagree here. If a chart contains a namespace resource, Helm will happily template that namespace resource and install it into a cluster. Furthermore, when the user uninstalls the chart, Helm will delete the namespace, and the deletion will cascade to all resources in that namespace.
While, ideally, Helm might position itself as a "package manager", with the intent that users only create resources in the given target namespace, it doesn't do anything to prevent creating cross-namespace resources, or Namespace resources themselves.
Similarly for ClusterRole and ClusterRoleBinding RBAC rules: Helm will happily install/uninstall these resources, which could easily cause cross-application breakage if something came to depend on those rules in the interim.
The overall point is, Helm doesn't restrict other cross-namespace resources, but is very conservative with CRDs for being cross-namespace. Perhaps Helm should not delete any cluster-wide resource? (that is a different HIP :) ).
My aim is to relax Helm's approach to CRDs, so that users (chart operators) still have sufficient protection (i.e. a rollback will recover), but less than today's "CRDs are cross-namespace, so Helm can't manage them" approach (since Helm will manage other cross-namespace resources).
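To make this concrete, here is a hedged sketch (the resource name and rules below are invented for illustration): a chart can ship a cluster-scoped ClusterRole under templates/, and Helm will template, install, and later delete it like any other manifest, regardless of who else in the cluster has come to depend on it.

```yaml
# templates/clusterrole.yaml (hypothetical example)
# A cluster-scoped resource: Helm installs it with the release and deletes it
# on uninstall, even if other releases or users rely on it in the meantime.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: {{ .Release.Name }}-reader
rules:
  - apiGroups: [""]
    resources: ["pods", "services"]
    verbs: ["get", "list", "watch"]
```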
In general, Helm as a package manager should not try to preempt unintended functional changes from a chart.
Validating functional changes is well beyond the scope of a package manager.
Helm should treat CRDs the same as other (cluster-wide) resources, where a Helm chart upgrade that causes unintended functional effects should be reverted (rolled back) by the user (chart operator).
And as long as that rollback path exists, this is a suitable path for users to mitigate breaking functional changes.
Two thoughts here....
- Chart authors create charts. Often, application operators are entirely different people, and they install or upgrade charts. Application operators often do not have expertise in k8s or the applications they are running. When an application operator has a problem, especially a severe one like data loss, they file issues in the Helm issue queue. We have experienced this in the past, which is one of the reasons Helm has been conservative. Responding to those issues and managing them is time consuming.
- Those who can install/update CRDs are sometimes not the same people installing/upgrading the chart. In the past there has been a separation. We should better understand the current state of this. Being able to extract the CRDs and send them to someone with access is useful. Being able to install/upgrade a chart when you don't have global resource access is helpful, or has been. We need to understand how this landscape has changed.
I remember meeting with Helm users who had tight controls on their clusters. They had many who could use their namespaces and few who could manage things at the cluster level. This shaped Helm v3 designs. For example, it's the reason Helm uses secrets instead of a CRD. Using a CRD would limit who could use it.
especially a severe one like data loss
I want to call this out directly: This HIP specifically ensures Helm won't cause data loss with CRDs/CRs
For 1., there is a range. If a chart author produces a buggy chart (unrelated to CRDs) and an operator installs it, they may come to Helm and ask "why didn't my application work correctly?" This is unfortunate, as it is a chart problem, not a Helm problem. But their recourse is simply to helm rollback. And there is not much (anything) Helm can do to prevent this situation (aside from good UX to help the user discover/diagnose the problem).
The goal is to treat CRDs similarly: if a chart author produces a new chart version with a new CRD version which is incompatible with the existing served/stored versions, we can simply consider the chart to be buggy, and the operator can roll back. The user may come to Helm, and like other issues, they would need to be directed to the chart author (in this case).
hips/hip-XXXX.md (Outdated)
For 2., data loss should be treated more carefully,
as data loss can be irrevocable or require significant effort to restore.
And especially, an application/chart operator should not expect a chart upgrade to cause data loss.
Helm can prevent data loss by ensuring CRDs are "append only" (with special exceptions in the specification below). In particular, appending allows a rollback to (effectively) restore existing cluster state.
(It should also be noted that Helm will today remove other resources whose removal may cause data loss, e.g. secrets, config maps, namespaces, etc. A special, hard-coded exception does exist for persistent volumes.)
Data loss is different between CRDs and something like secrets. If I remove a secret it removes that one secret. This impacts that single case. If a CRD is deleted, all of its CRs are deleted. For example, you have user A with an application instance of a chart, and you have user B in that same cluster with a separate instance of the same chart. These 2 users do not know about each other. If user A deletes a CRD, even unintentionally, it will remove the CRs for user B and cause data loss for user B. This is a very different surface area from something like deleting a secret.
We also know that some don't have backups and will make changes in production. When things go wrong, Helm devs will get issues to triage and deal with.
I should delete this last sentence; I'm not sure why I added it. The explicit intention of this HIP is to ensure Helm cannot cause data loss with CRDs.
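To sketch what "append only" could look like in practice (the widgets.example.com group/kind below are invented for illustration): an upgrade may add a new served version to a CRD's spec.versions, while existing versions stay in place, so current CRs remain readable and a rollback can simply reinstate the prior manifest.

```yaml
# Hypothetical CRD before the upgrade: only v1alpha1 exists.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  names:
    kind: Widget
    plural: widgets
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
---
# After an "append only" upgrade: v1 is added, v1alpha1 is still served,
# so existing CRs remain accessible and rolling back restores the prior state.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  names:
    kind: Widget
    plural: widgets
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: false
      schema:
        openAPIV3Schema:
          type: object
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
```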
In what circumstances do people actually need CRDs to be dynamic (with templating)? To my knowledge, CRDs tend to end up in the templates/ dir solely so Helm will update them, not because the CRD structure needs to be adjusted. Examples/rationale for otherwise appreciated! (Generally, the goal here is to improve Helm's CRD handling, which doesn't mean all improvements need to be done at once. Templating CRDs might be done as a future improvement.)