H4HIP: CRD updating #379
Conversation
It's pretty common these days for CRDs to be in templates so they can be upgraded too. We should probably keep this behavior and align it with the /crds directory too. Maybe:
I've not worked through the specification yet, but I wanted to add some thoughts on the material up front.
hips/hip-XXXX.md (Outdated)
1. CRDs are a cluster-wide resource.
Changes to cluster-wide resources can (more easily) break applications beyond the scope of the chart
It's not just beyond the scope of the instance of a chart but beyond the scope of the user who is installing the chart. You can have two users of a cluster who do not have knowledge of each other or their work. This is where breakage can happen.
Let me work "users" into the wording here. I loosely mean that "cluster-wide" must, de facto, be presumed to mean multi-user.
For 1., it is thought that Helm should not treat CRDs specially here.
Helm will readily operate on many other cluster-wide resources today: cluster roles, priority classes, namespaces, etc.,
the modification/removal of which could easily cause breakage outside of the chart's release.
Helm is a package manager rather than a general purpose management tool for Kubernetes resources. It's a subtle but important difference. Here are some ways to think about it...
- In the profiles we cover applications specifically. Cluster operations are specifically out of scope.
- The definition (from wikipedia) for a package manager:
A package manager or package-management system is a collection of software tools that automates the process of installing, upgrading, configuring, and removing computer programs for a computer in a consistent manner.
Namespaces were not able to be created by Helm when 3.0.0 came out. The philosophy is that applications should be installed into a namespace, but it's not up to Helm to manage namespaces; those should be created first, possibly as part of configuration management. Namespace creation was added to provide backwards compatibility with Helm v2 (which had it) because we got so many issues filed about it.
We have not considered Helm the right tool to manage all resources since it's targeted at applications and Kubernetes resources go beyond that.
I think I disagree here. If a chart contains a namespace resource, Helm will happily template that namespace resource and install it into a cluster. Furthermore, when the user uninstalls the chart, Helm will delete the namespace, and the deletion will cascade to all resources in that namespace.
While, ideally, Helm might position itself as a "package manager", with the intent that users only create resources in the given target namespace, it doesn't do anything to prevent creating cross-namespace resources, or Namespace resources themselves.
Similarly for ClusterRole and ClusterRoleBinding RBAC rules: Helm will happily install/uninstall these resources, which could easily cause cross-application breakage if something came to depend on those rules in the interim.
The overall point is, Helm doesn't restrict other cross-namespace resources, but is very conservative with CRDs for being cross-namespace. Perhaps Helm should not delete any cluster-wide resource? (that is a different HIP :) ).
My aim is to relax Helm's approach to CRDs, so that users (chart operators) still have sufficient protection (i.e. a rollback will recover), but less than today's "CRDs are cross-namespace, so Helm can't manage them" approach (since Helm will manage other cross-namespace resources).
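To make this concrete, here is a hedged sketch (the resource name and rules below are invented for illustration): a chart can ship a cluster-scoped ClusterRole under templates/, and Helm will template, install, and later delete it like any other manifest, regardless of who else in the cluster has come to depend on it.

```yaml
# templates/clusterrole.yaml (hypothetical example)
# A cluster-scoped resource: Helm installs it with the release and deletes it
# on uninstall, even if other releases or users rely on it in the meantime.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: {{ .Release.Name }}-reader
rules:
  - apiGroups: [""]
    resources: ["pods", "services"]
    verbs: ["get", "list", "watch"]
```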
In general, Helm as a package manager should not try to preempt unintended functional changes from a chart.
Validating functional changes is well beyond the scope of a package manager.
Helm should treat CRDs the same as other (cluster-wide) resources, where a Helm chart upgrade that causes unintended functional effects should be reverted (rolled back) by the user (chart operator).
And as long as that rollback path exists, this is a suitable path for users to mitigate breaking functional changes.
Two thoughts here....
- Chart authors create charts. Often, application operators are entirely different people, and they install or upgrade charts. Application operators often do not have expertise in k8s or the applications they are running. When an application operator has a problem, especially a severe one like data loss, they file issues in the Helm issue queue. We have experienced this in the past, which is one of the reasons Helm has been conservative. Responding to those issues and managing them is time consuming.
- Those who can install/update CRDs are sometimes not the same people installing/upgrading the chart. In the past there has been a separation. We should better understand the current state of this. Being able to extract the CRDs and send them to someone with access is useful. Being able to install/upgrade a chart when you don't have global resource access is helpful, or has been. We need to understand how this landscape has changed.
I remember meeting with Helm users who had tight controls on their clusters. They had many who could use their namespaces and few who could manage things at the cluster level. This shaped Helm v3 designs. For example, it's the reason Helm uses secrets instead of a CRD. Using a CRD would limit who could use it.
especially a severe one like data loss
I want to call this out directly: This HIP specifically ensures Helm won't cause data loss with CRDs/CRs
For 1., there is a range. If a chart author produces a buggy chart (unrelated to CRDs) and an operator installs it, they may come to Helm and ask "why didn't my application work correctly?" This is unfortunate, as it is a chart problem, not a Helm problem. But their recourse is simply to helm rollback. And there is not much (anything) Helm can do to prevent this situation (aside from good UX to help the user discover/diagnose the problem).
The goal is to treat CRDs similarly: if a chart author produces a new chart version with a new CRD version which is incompatible with the existing served/stored versions, we can simply consider the chart to be buggy, and the operator can roll back. The user may come to Helm, and like other issues, they would need to be directed to the chart author (in this case).
hips/hip-XXXX.md (Outdated)
For 2., data loss should be treated more carefully,
as data loss can be irrevocable or require significant effort to restore.
And especially, an application/chart operator should not expect a chart upgrade to cause data loss.
Helm can prevent data loss by ensuring CRDs are "append only" (with special exceptions in the specification below). In particular, appending allows a rollback to (effectively) restore existing cluster state.
(It should also be noted that Helm will today remove other resources whose removal may cause data loss, e.g. secrets, config maps, namespaces, etc. A special, hard-coded exception does exist for persistent volumes.)
Data loss is different between CRDs and something like secrets. If I remove a secret it removes that one secret. This impacts that single case. If a CRD is deleted, all of its CRs are deleted. For example, you have user A with an application instance of a chart, and you have user B in that same cluster with a separate instance of the same chart. These 2 users do not know about each other. If user A deletes a CRD, even unintentionally, it will remove the CRs for user B and cause data loss for user B. This is a very different surface area from something like deleting a secret.
We also know that some don't have backups and will make changes in production. When things go wrong, Helm devs will get issues to triage and deal with.
I should delete this last sentence; I'm not sure why I added it. The explicit intention of this HIP is to ensure Helm cannot cause data loss with CRDs.
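To sketch what "append only" could look like in practice (the widgets.example.com group/kind below are invented for illustration): an upgrade may add a new served version to a CRD's spec.versions, while existing versions stay in place, so current CRs remain readable and a rollback can simply reinstate the prior manifest.

```yaml
# Hypothetical CRD before the upgrade: only v1alpha1 exists.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  names:
    kind: Widget
    plural: widgets
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
---
# After an "append only" upgrade: v1 is added, v1alpha1 is still served,
# so existing CRs remain accessible and rolling back restores the prior state.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  names:
    kind: Widget
    plural: widgets
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: false
      schema:
        openAPIV3Schema:
          type: object
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
```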
In what circumstances do people actually need CRDs to be dynamic (with templating)? To my knowledge, CRDs tend to end up in the templates/ dir solely so Helm will update them, not because the CRD structure needs to be adjusted. Examples/rationale for otherwise appreciated! (Generally, the goal here is to improve Helm's CRD handling, which doesn't mean all improvements need to be done at once. Templating CRDs might be done as a future improvement.)