---
hip: 9999
title: "Wait With kstatus"
authors: [ "@austinabro321" ]
created: "2024-12-06"
type: "feature"
status: "draft"
---

## Abstract

Currently the `--wait` flag on `helm install` and `helm upgrade` does not wait for all resources to be fully reconciled, and it does not wait for custom resources (CRs) at all. By replacing the wait logic with [kstatus](https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/kstatus/README.md), Helm will achieve more intuitive waits while simplifying its code and documentation.

## Motivation

Certain workflows require custom resources to be ready. There is currently no way to tell Helm to wait for custom resources, so anyone with this requirement must write their own logic to wait for them. Kstatus solves this by waiting for custom resources to have [the ready condition](https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/kstatus/README.md#the-ready-condition) set to true.

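To illustrate what this means in practice, here is a hypothetical custom resource (the kind and field values are invented for this example) that kstatus would consider ready, because the `Ready` condition is `"True"` and `status.observedGeneration` matches `metadata.generation`:

```yaml
# Hypothetical custom resource, for illustration only.
apiVersion: example.com/v1
kind: Database
metadata:
  name: my-database
  generation: 2
status:
  observedGeneration: 2 # matches metadata.generation, so the status is not stale
  conditions:
    - type: Ready        # kstatus treats the resource as reconciled once this is "True"
      status: "True"
      reason: Reconciled
      message: Database is provisioned
```
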
Certain workflows require resources to be fully reconciled, which does not happen with the current `--wait` logic. For example, Helm waits for all new pods in an upgraded deployment to be ready, but it does not wait for the previous pods in that deployment to be removed. Since kstatus waits for all resources to be reconciled, the wait for a deployment will not finish until all of its new pods are ready and all of its old pods have been deleted.

By introducing kstatus we will lower user friction with the `--wait` flag.

## Rationale

Leveraging an existing status management library maintained by the Kubernetes team will simplify the code and documentation that Helm must maintain, and it will improve the functionality of `--wait`.

## Specification

From a CLI user's perspective there will be no change in how waits are invoked; they will still use the `--wait` flag.

Kstatus does not output any logs, so Helm will output a log message each second signalling that a resource is not ready. Helm will log only one of the resources it is waiting on at a time, so as not to overwhelm users with output. This behavior will look similar to the current `--wait` logging.

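As a rough sketch of how that logging loop could work (the `snapshot` and `logf` parameters are stand-ins invented for this example, not proposed API), a ticker could report one not-yet-ready resource per second:

```go
package kubewait

import (
	"context"
	"time"

	"sigs.k8s.io/cli-utils/pkg/kstatus/status"
	"sigs.k8s.io/cli-utils/pkg/object"
)

// logNotReady logs one not-yet-ready resource per second until ctx is done.
// snapshot would return the latest kstatus result for each resource being waited on.
func logNotReady(ctx context.Context, snapshot func() map[object.ObjMetadata]status.Status, logf func(string, ...interface{})) {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			for id, s := range snapshot() {
				if s != status.CurrentStatus {
					// Log a single resource per tick so the output stays readable.
					logf("waiting for %s %s/%s to be ready, status: %s", id.GroupKind.Kind, id.Namespace, id.Name, s)
					break
				}
			}
		}
	}
}
```
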
Kstatus can be used with either a poller or a watcher. The poller runs on a specified interval and only requires "list" RBAC permissions for polled resources. The watcher reacts to [watch events](https://github.com/kubernetes/kubernetes/blob/90a45563ae1bab5868ee888432fec9aac2f7f8b1/staging/src/k8s.io/apimachinery/pkg/watch/watch.go#L55-L61) and requires "list" and "watch" RBAC permissions. This proposal uses the watcher, as it responds slightly faster when all resources are ready, and it is very likely that users applying or deleting resources will have "watch" permissions on their resources. However, if the additional RBAC permissions are deemed to cause potential issues, the poller can be used instead.

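For reference, below is a minimal sketch of a watcher-based wait built on the kstatus packages in `sigs.k8s.io/cli-utils`, modeled loosely on the Zarf implementation listed in the references. The function name and structure are illustrative; the eventual Helm implementation may differ.

```go
package kubewait

import (
	"context"
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/client-go/dynamic"
	"sigs.k8s.io/cli-utils/pkg/kstatus/polling/aggregator"
	"sigs.k8s.io/cli-utils/pkg/kstatus/polling/collector"
	"sigs.k8s.io/cli-utils/pkg/kstatus/polling/event"
	"sigs.k8s.io/cli-utils/pkg/kstatus/status"
	"sigs.k8s.io/cli-utils/pkg/kstatus/watcher"
	"sigs.k8s.io/cli-utils/pkg/object"
)

// waitForReady blocks until every object in objs reaches the kstatus Current
// status, or until the timeout expires. It is a sketch, not the proposed API.
func waitForReady(ctx context.Context, dc dynamic.Interface, mapper meta.RESTMapper, objs object.ObjMetadataSet, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	sw := watcher.NewDefaultStatusWatcher(dc, mapper)
	eventCh := sw.Watch(ctx, objs, watcher.Options{})

	statusCollector := collector.NewResourceStatusCollector(objs)
	done := statusCollector.ListenWithObserver(eventCh, collector.ObserverFunc(
		func(c *collector.ResourceStatusCollector, _ event.Event) {
			// Stop watching once the aggregated status of all resources is Current.
			rss := make([]*event.ResourceStatus, 0, len(c.ResourceStatuses))
			for _, rs := range c.ResourceStatuses {
				rss = append(rss, rs)
			}
			if aggregator.AggregateStatus(rss, status.CurrentStatus) == status.CurrentStatus {
				cancel()
			}
		}))
	<-done

	if statusCollector.Error != nil {
		return statusCollector.Error
	}
	if ctx.Err() == context.DeadlineExceeded {
		return fmt.Errorf("timed out waiting for resources to become ready")
	}
	return nil
}
```
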
Any functions involving waits will be separated from the `kube.Interface` interface into the `kube.Waiter` interface. `kube.Waiter` will be embedded into `kube.Interface`. The client struct will embed the `Waiter` interface to allow calls to look like `client.Wait()` instead of `client.Waiter.Wait()`. `kube.New()` will accept a wait strategy to decide the wait implementation; the options will be either `HelmWaiter` or `statusWaiter`. `HelmWaiter` is the legacy implementation. The `statusWaiter` will not be public, so that if kstatus is ever deprecated or replaced, a new implementation can be used without changing the public SDK.

The new client will look like:
```go
type Client struct {
	Factory    Factory
	Log        func(string, ...interface{})
	Namespace  string
	kubeClient *kubernetes.Clientset
	Waiter
}

type WaitStrategy int

const (
	StatusWaiter WaitStrategy = iota
	LegacyWaiter
)

func New(getter genericclioptions.RESTClientGetter, ws WaitStrategy) (*Client, error)
```

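A hypothetical usage sketch of the proposed constructor (the caller-supplied `getter` and `resources` values are assumed) would look like:

```go
// Hypothetical SDK usage of the proposed constructor and embedded Waiter.
func waitForRelease(getter genericclioptions.RESTClientGetter, resources kube.ResourceList) error {
	client, err := kube.New(getter, kube.StatusWaiter)
	if err != nil {
		return err
	}
	// Wait is promoted from the embedded Waiter, so callers write client.Wait(...)
	// rather than client.Waiter.Wait(...).
	return client.Wait(resources, 5*time.Minute)
}
```
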
The waiter interface will look like:
```go
type Waiter interface {
	Wait(resources ResourceList, timeout time.Duration) error
	WaitWithJobs(resources ResourceList, timeout time.Duration) error
	WaitForDelete(resources ResourceList, timeout time.Duration) error
}
```

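Below is a sketch of how the unexported kstatus-backed waiter might satisfy this interface. The struct fields and method comments are assumptions for illustration, not part of the proposed public API:

```go
// statusWaiter is a sketch of the unexported kstatus-backed implementation.
type statusWaiter struct {
	restMapper    meta.RESTMapper
	dynamicClient dynamic.Interface
	log           func(string, ...interface{})
}

func (w *statusWaiter) Wait(resources ResourceList, timeout time.Duration) error {
	// Would convert the ResourceList into kstatus object identifiers and drive
	// the status watcher until every resource reports the Current status.
	return errors.New("not implemented in this sketch")
}

func (w *statusWaiter) WaitWithJobs(resources ResourceList, timeout time.Duration) error {
	// Like Wait, but Jobs would be considered ready only once they complete,
	// which may require a custom kstatus status reader for Jobs.
	return errors.New("not implemented in this sketch")
}

func (w *statusWaiter) WaitForDelete(resources ResourceList, timeout time.Duration) error {
	// Would wait until every resource reports the NotFound status.
	return errors.New("not implemented in this sketch")
}
```
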
`WaitAndGetCompletedPodPhase` is an exported function that is not called anywhere within the Helm repository. It will be removed.

`WatchUntilReady` is used only for hooks. It has custom wait logic that differs from Helm 3's general wait logic. Ideally, this could be replaced with a regular `Wait()` call; if there is any historical context as to why this logic is the way it is, please share.

The Helm CLI will always use the `statusWaiter` implementation. If this is found to be insufficient during Helm 4 testing, a new CLI flag `wait-strategy` will be introduced with options `status` and `legacy` to allow usage of the `HelmWaiter`. If the `statusWaiter` is found to be sufficient, the `HelmWaiter` will be deleted from the public SDK before Helm 4 is released.

The current Helm wait logic does not wait for paused deployments to be ready, because if `helm upgrade --wait` is run on a chart with paused deployments they will never become ready; see [#5789](https://github.com/helm/helm/pull/5789).

## Backwards compatibility

Waiting for custom resources, and waiting for every resource to be fully reconciled, could cause charts to time out that did not previously.

The kstatus status watcher requires the "list" and "watch" RBAC permissions to watch a resource. The kstatus status poller and the current Helm implementation only require "list" permissions for the resources they're watching. Both kstatus and Helm require "list" permissions for some child resources; for instance, checking whether a deployment is ready requires "list" permissions on its replicasets. There may be cases where the RBAC requirements for child resources differ between kstatus and Helm, as an evaluation has not been conducted.

Below is the minimal set of permissions needed to watch a deployment with the status watcher. This can be verified by following the instructions in this repo: https://github.com/AustinAbro321/kstatus-rbac-test.
```yaml
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["list", "watch"] # The status poller only requires "list" here
  - apiGroups: ["apps"]
    resources: ["replicasets"]
    verbs: ["list"]
```

## Security implications

Users will now need "watch" permissions on resources in their chart if `--wait` is used, assuming the status watcher is used over the status poller.

## How to teach this

Replace the [existing wait documentation](https://helm.sh/docs/intro/using_helm/) by explaining that Helm uses kstatus and pointing to the [kstatus documentation](https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/kstatus/README.md). This comes with the added benefit of not needing to maintain Helm-specific wait documentation.

## Reference implementation

See https://github.com/helm/helm/pull/13604.

## Rejected ideas

TBD

## Open issues

[#8661](https://github.com/helm/helm/issues/8661)

## References

- Existing wait documentation: https://helm.sh/docs/intro/using_helm/
- kstatus documentation: https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/kstatus/README.md
- Zarf kstatus implementation: https://github.com/zarf-dev/zarf/blob/main/src/internal/healthchecks/healthchecks.go