Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mark WaitForSidecar check as obsolete in Dapr .NET SDKs #1403

Open
WhitWaldo opened this issue Nov 6, 2024 · 14 comments
Open

Mark WaitForSidecar check as obsolete in Dapr .NET SDKs #1403

WhitWaldo opened this issue Nov 6, 2024 · 14 comments
Assignees
Labels
Milestone

Comments

@WhitWaldo
Copy link
Contributor

WhitWaldo commented Nov 6, 2024

Describe the feature

The DaprClient offers a method that enables the developer to wait for the sidecar to be ready before starting any operations, but Dapr.Workflow doesn't have any such method.

Perhaps it's worth looking at adding it during startup both as a method exposed to developers, but also which happens internally so that any attempt to do an actor operation is preempted until the app channel is live and then allowed to proceed.

After discussing this with the core team, we'll be marking WaitForSidecar as obsolete on the Dapr.Client SDK as it's not a method that developers should be using in their own applications as that's not its purpose. Rather, developers should call the SDK methods as they want and the runtime will queue and block them until it's ready to process each one - no waiting/readiness check necessary.

Release Note

RELEASE NOTE: UPDATE Developers should not use WaitForSidecar in their applications to wait for Dapr readiness

@WhitWaldo WhitWaldo added kind/enhancement New feature or request area/workflow labels Nov 6, 2024
@WhitWaldo
Copy link
Contributor Author

@olitomlinson Just verifying - should this just wait for sidecar readiness on /v1.0/healthz/outbound or wait for app channel as well on /v1.0/healthz?

@oising
Copy link

oising commented Nov 6, 2024

I implore you to revisit what consitudes a "ready" sidecar. The sidecar should really be using early parsing information from components to figure out what services comprise the readiness check. i.e. is there an actorStateStore: true configured, is there an app port configured, and then wait for the actor runtime to be ready, and for the app port probe to complete (for example.) Without this, race conditions abound and the waitforsidecarasync call is meaningless.

@WhitWaldo
Copy link
Contributor Author

That might be more of a larger ask from the Dapr runtime than what the SDK itself has access to. We've got the Health API with just shows the sidecar is up and running (or not) and that the app channel is set up.

I believe the app channel on the Health API only returns true if the app port is accessible (catching app port configured and waiting for the app port probe to complete). It'd be nice if it'd also fail to return true if the actor state store isn't configured (as I think the only services that rely on the app channel are actors and workflows), but it should ideally catch if actors aren't configured right or the actor state store not configured on the outbound check).

So while I get that it'd be neat for each separate service to potentially report liveness checks, I think most of them would be duplicative to the two we already have. I think as far as the SDK goes, we've got all the information we need.

It'd be nice if the health checks would report precisely what isn't validated as that could simplify debugging (e.g. the error you shared in Discord: The state store is not configured to use the actor runtime. Have you set the - name: actorStateStore value : "true" in your state store component file? instead of just in the runtime logs, but I don't know that there's necessarily a race condition involved by the SDKs waiting on app channel (where necessary) and outbound to indicate readiness.

@oising
Copy link

oising commented Nov 6, 2024

There's 100% a race condition whereby you can cause workflows and actor invocations to fail, as the sidecar initializes in parallel to the app, and the WaitForSidecarAsync does NOT act as a readiness check. It's more of a liveness check, to use k8s parlance.

@WhitWaldo
Copy link
Contributor Author

@oising Just following up here after having had a discussion with the Dapr runtime developers about it - the goal following the release of 1.15 is this should not be "handled" by the SDK at all. Rather, all requests to the runtime from the SDK will be blocked and queued until the runtime itself has determined readiness at which point it'll suddenly get responsive and chatty.

As such, don't expect any changes to the SDK behavior here as we'll plan to lean entirely on that effort.

@WhitWaldo
Copy link
Contributor Author

@marcduiker Thanks for pointing this out.

After chatting about this on today's call with @JoshVanL, the conclusion is that neither the user application nor the SDK should be using this endpoint as it's intended just for Kubernetes to perform health checks.

Rather, with 1.15, if the gRPC port can open to Dapr, requests should be sent and they'll stack there until Daprd is live and can respond to handle them and there should not be a pre-check prior to sending such messages. As such, with the release of 1.15, we'll be marking the WaitForSidecarAsync method on Dapr.Client as obsolete and removing its use through the SDK and will not be adding it to Dapr.Workflow.

@WhitWaldo WhitWaldo added this to the v1.15 milestone Feb 7, 2025
@WhitWaldo WhitWaldo self-assigned this Feb 7, 2025
@WhitWaldo WhitWaldo changed the title Add WaitForSidecar check in Workflow SDK Mark WaitForSidecar check as obsolete in Dapr .NET SDKs Feb 7, 2025
@olitomlinson
Copy link

olitomlinson commented Feb 7, 2025

@WhitWaldo Has the intention to deprecate WaitForSideCar been co-ordinated across all SDK maintainers of all languages? I ask because the main docs makes reference to this specific SDK method. If we're changing the guidance, it needs to be done at the docs level too and all SDK maintainers need to understand the direction of travel so that we don't create mixed messages in the community.

@JoshVanL Is the underlying HTTP API being deprecated too? v1.0/healthz/outbound ?

The underlying v1.0/healthz/outbound is NOT a readiness or liveness check in the traditional sense. It's simply a check to determine if the outbound components have been initialised or not. The WaitForSideCar SDK naming is unfortunately misleading, which has clearly created an expectation on users that this method can be used for validating the health of the entire sidecar, including workflows, which it can't.

A key use-case of this v1.0/healthz/outbound endpoint is so that while an App is initialising, it can retrieve secrets from Dapr (Dapr Secrets being classed as an outbound component) but before it has opened up a webserver to establish the app channel. This is a crucial workflow for many apps, as most folks don't open up their web servers for traffic until secrets have been retrieved and used to establish connections to databases and such.

On the assumption that I'm not missing anything (which is highly possible because I don't have the full context of what has been discussed between Whit and Josh) - An alternative to deprecating this SDK method, can I suggest we simply rename it to something which reflects it's actual purpose i.e. WaitForOutboundComponentsToBeReady

@WhitWaldo
Copy link
Contributor Author

WhitWaldo commented Feb 7, 2025

@olitomlinson We talked about it on today's maintainer/release call. While it wasn't clear how many of the languages currently expose this, it was well-understood that it should be deprecated on all of them and @alicejgibbons is going to update the documentation to make it clear that this isn't a check that's intended to be done by developers (to match the obsolete text).

The conversation started with an issue raised in that the startup check sort of turned into a circular check where the app never finishes initialization because it's waiting on Dapr to be ready and Dapr is never ready because it's waiting on initialization.

The API endpoints themselves aren't being deprecated so much as they just aren't being exposed through the SDKs.

We didn't talk about secrets and that's definitely worth considering here. I'll leave it to @JoshVanL to decide if that's something the SDKs should do a precheck of some sort for, but the net of our discussion was that the SDK shouldn't wait for actors and workflows to be ready - if the port is open to communicate with the runtime, it should send those requests and leave it to Dapr to block on them until it's ready to handle everything. The endpoint then shouldn't be exposed to users of the SDK to avoid unintended checks (like the original issue).

@olitomlinson
Copy link

We didn't talk about secrets and that's definitely worth considering here.

Yes, WaitForSidecars primary use-case is to allow apps to check that outbound components (aka Secrets) are initialised and can be used safely as part of App start-up. Discussions around deprecating WaitForSidecar must be done within the context of this use-case.

Getting this wrong could be disastrous - my teams depend on WaitForSideCar in Java and calling v1.0/healthz/outbound
HTTP API directly in languages which don't have an SDK, in order to safely read secrets at App start-up time.

Happy to be educated if I'm worrying over nothing here.

@WhitWaldo
Copy link
Contributor Author

I would invite @JoshVanL to jump in with any corrections in case I misunderstood our conversation this morning (as there's some confusion on the other thread already)...

To summarize (for my own sanity), the Health API has two endpoints:

  • /healthz - A 204 response means the Dapr port is available and the app channel is initialized
  • /healthz/outbound - A 204 response means the Dapr port is available and has nothing to do with the app channel

If the SDK is able to make a call to either endpoint successfully, it means the Dapr port is available (even before getting a response).

The call to /healthz isn't necessary with the change to the blocking behavior of the runtime because until the app channel is initialized, the runtime will capture and block any requests that are sent before it's ready. When it's ready, it'll process and send responses back. But it should not be expected that the SDK poll this endpoint first.

So then the check to /heatlhz/outbound isn't necessary either because if the SDK can make the outbound call to the Dapr runtime (e.g. it connects at all), the runtime will respond back when it's provisioned the resources it needs to do so (presumably there's some prioritization order). And if the call doesn't connect, there should be some sort of resiliency baked in on the SDK itself to retry the call.

But either way, a call to the health endpoint isn't necessary.

That said, I don't ever recall seeing that the SDK has any sort of inherent call retry functionality in place, so this might be an opportunity to add in places like secrets retrieval or service invocation.

@JoshVanL
Copy link
Contributor

Hello!

So to weigh in from my side; first thing is that we shouldn't deprecate the healthz endpoints on daprd itself- these are obviously useful/required for probing health in environments like kube. It should be the case however that apps shouldn't be probing these healthz endpoints themselves. To an app, daprd is simply an API service. The intricacies of order of operation of runtime resource initialization and the context of the dapr cluster more widely should never be a concern for an application- this is kinda the point with Dapr- abstract the complexity of distributed applications with easy to consume APIs. For an app, if daprd's port is open, it's ready to serve traffic. As has been mentioned, it's always daprd's responsibility to hold a request until it's internal dependencies have been resolved, or respond with an appropriate error if it can't service the request. If there are situations today where this isn't the case, this should be considered a bug.

I hate the idea of SDKs needing to know the implementation details of runtime, which are always (correctly) subject to change, much less this being surfaced and a concern to developers.

@alicejgibbons
Copy link

To be clear, @olitomlinson were just proposing putting a deprecation notices on the SDK healthcheck calls in order to prompt folks to stop using them in the upcoming Dapr version. Nothing will be actually removed in 1.15!

This started from noticing that workflow/actor apps failed when calling /healthz on startup. Calling /healthz/outbound should not have the same circular dependency failure issue so we can leave it un-deprecated if we decide but am still unclear if its actually needed to Josh's point.

@WhitWaldo
Copy link
Contributor Author

@alicejgibbons Do you have a standardized/preferred text to include in the obsolete message?

@olitomlinson
Copy link

Thanks @JoshVanL and @alicejgibbons

It should be the case however that apps shouldn't be probing these healthz endpoints themselves

I agree. Apps shouldn't need to take a dependency on the v1.0/healthz endpoint, but to be super clear on the history the v1.0/healthz/outbound endpoint was specifically created for Apps (not kubernetes) to take a dependency on so that outbound requests can be served before dapr is fully ready.

Anyway... given the proposed changes, and in the context of Secrets (other outbound components use-cases are valid too)

Desirable : Calling GetSecretAsync (or language equivalent) will block only until the outbound components are initialised (and regardless of if the inbound App channel is established or not) then this delivers a useable experience that doesn't compromise existing behaviour.

Undesirable : On the other hand, if a call to GetSecretAsync can't complete/return until App Channel is established, then this is not acceptable as it fundamentally breaks the behaviour of what users expect and need.

If I can get a definitive clarification that the desirable scenario I have described above represents the intended behaviour of this proposed change then I'm more than happy with deprecating WaitForSideCar (or language equivalent)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants