-
Notifications
You must be signed in to change notification settings - Fork 345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mark WaitForSidecar check as obsolete in Dapr .NET SDKs #1403
Comments
@olitomlinson Just verifying - should this just wait for sidecar readiness on |
I implore you to revisit what consitudes a "ready" sidecar. The sidecar should really be using early parsing information from components to figure out what services comprise the readiness check. i.e. is there an actorStateStore: true configured, is there an app port configured, and then wait for the actor runtime to be ready, and for the app port probe to complete (for example.) Without this, race conditions abound and the waitforsidecarasync call is meaningless. |
That might be more of a larger ask from the Dapr runtime than what the SDK itself has access to. We've got the Health API with just shows the sidecar is up and running (or not) and that the app channel is set up. I believe the app channel on the Health API only returns true if the app port is accessible (catching app port configured and waiting for the app port probe to complete). It'd be nice if it'd also fail to return true if the actor state store isn't configured (as I think the only services that rely on the app channel are actors and workflows), but it should ideally catch if actors aren't configured right or the actor state store not configured on the outbound check). So while I get that it'd be neat for each separate service to potentially report liveness checks, I think most of them would be duplicative to the two we already have. I think as far as the SDK goes, we've got all the information we need. It'd be nice if the health checks would report precisely what isn't validated as that could simplify debugging (e.g. the error you shared in Discord: |
There's 100% a race condition whereby you can cause workflows and actor invocations to fail, as the sidecar initializes in parallel to the app, and the WaitForSidecarAsync does NOT act as a readiness check. It's more of a liveness check, to use k8s parlance. |
@oising Just following up here after having had a discussion with the Dapr runtime developers about it - the goal following the release of 1.15 is this should not be "handled" by the SDK at all. Rather, all requests to the runtime from the SDK will be blocked and queued until the runtime itself has determined readiness at which point it'll suddenly get responsive and chatty. As such, don't expect any changes to the SDK behavior here as we'll plan to lean entirely on that effort. |
@marcduiker Thanks for pointing this out. After chatting about this on today's call with @JoshVanL, the conclusion is that neither the user application nor the SDK should be using this endpoint as it's intended just for Kubernetes to perform health checks. Rather, with 1.15, if the gRPC port can open to Dapr, requests should be sent and they'll stack there until Daprd is live and can respond to handle them and there should not be a pre-check prior to sending such messages. As such, with the release of 1.15, we'll be marking the |
@WhitWaldo Has the intention to deprecate @JoshVanL Is the underlying HTTP API being deprecated too? The underlying A key use-case of this On the assumption that I'm not missing anything (which is highly possible because I don't have the full context of what has been discussed between Whit and Josh) - An alternative to deprecating this SDK method, can I suggest we simply rename it to something which reflects it's actual purpose i.e. |
@olitomlinson We talked about it on today's maintainer/release call. While it wasn't clear how many of the languages currently expose this, it was well-understood that it should be deprecated on all of them and @alicejgibbons is going to update the documentation to make it clear that this isn't a check that's intended to be done by developers (to match the obsolete text). The conversation started with an issue raised in that the startup check sort of turned into a circular check where the app never finishes initialization because it's waiting on Dapr to be ready and Dapr is never ready because it's waiting on initialization. The API endpoints themselves aren't being deprecated so much as they just aren't being exposed through the SDKs. We didn't talk about secrets and that's definitely worth considering here. I'll leave it to @JoshVanL to decide if that's something the SDKs should do a precheck of some sort for, but the net of our discussion was that the SDK shouldn't wait for actors and workflows to be ready - if the port is open to communicate with the runtime, it should send those requests and leave it to Dapr to block on them until it's ready to handle everything. The endpoint then shouldn't be exposed to users of the SDK to avoid unintended checks (like the original issue). |
Yes, Getting this wrong could be disastrous - my teams depend on Happy to be educated if I'm worrying over nothing here. |
I would invite @JoshVanL to jump in with any corrections in case I misunderstood our conversation this morning (as there's some confusion on the other thread already)... To summarize (for my own sanity), the Health API has two endpoints:
If the SDK is able to make a call to either endpoint successfully, it means the Dapr port is available (even before getting a response). The call to So then the check to But either way, a call to the health endpoint isn't necessary. That said, I don't ever recall seeing that the SDK has any sort of inherent call retry functionality in place, so this might be an opportunity to add in places like secrets retrieval or service invocation. |
Hello! So to weigh in from my side; first thing is that we shouldn't deprecate the healthz endpoints on daprd itself- these are obviously useful/required for probing health in environments like kube. It should be the case however that apps shouldn't be probing these healthz endpoints themselves. To an app, daprd is simply an API service. The intricacies of order of operation of runtime resource initialization and the context of the dapr cluster more widely should never be a concern for an application- this is kinda the point with Dapr- abstract the complexity of distributed applications with easy to consume APIs. For an app, if daprd's port is open, it's ready to serve traffic. As has been mentioned, it's always daprd's responsibility to hold a request until it's internal dependencies have been resolved, or respond with an appropriate error if it can't service the request. If there are situations today where this isn't the case, this should be considered a bug. I hate the idea of SDKs needing to know the implementation details of runtime, which are always (correctly) subject to change, much less this being surfaced and a concern to developers. |
To be clear, @olitomlinson were just proposing putting a deprecation notices on the SDK healthcheck calls in order to prompt folks to stop using them in the upcoming Dapr version. Nothing will be actually removed in 1.15! This started from noticing that workflow/actor apps failed when calling /healthz on startup. Calling /healthz/outbound should not have the same circular dependency failure issue so we can leave it un-deprecated if we decide but am still unclear if its actually needed to Josh's point. |
@alicejgibbons Do you have a standardized/preferred text to include in the obsolete message? |
Thanks @JoshVanL and @alicejgibbons
I agree. Apps shouldn't need to take a dependency on the Anyway... given the proposed changes, and in the context of Secrets (other outbound components use-cases are valid too) Desirable : Calling Undesirable : On the other hand, if a call to If I can get a definitive clarification that the desirable scenario I have described above represents the intended behaviour of this proposed change then I'm more than happy with deprecating |
Describe the feature
The DaprClient offers a method that enables the developer to wait for the sidecar to be ready before starting any operations, but Dapr.Workflow doesn't have any such method.
Perhaps it's worth looking at adding it during startup both as a method exposed to developers, but also which happens internally so that any attempt to do an actor operation is preempted until the app channel is live and then allowed to proceed.After discussing this with the core team, we'll be marking
WaitForSidecar
as obsolete on theDapr.Client
SDK as it's not a method that developers should be using in their own applications as that's not its purpose. Rather, developers should call the SDK methods as they want and the runtime will queue and block them until it's ready to process each one - no waiting/readiness check necessary.Release Note
RELEASE NOTE: UPDATE Developers should not use
WaitForSidecar
in their applications to wait for Dapr readinessThe text was updated successfully, but these errors were encountered: