Skip to content

Commit

Permalink
fix(destination): GetProfile requests targeting pods directly should …
Browse files Browse the repository at this point in the history
…return endpoint data for running (not necessarily ready) pods (#13557)

* fix(destination): GetProfile requests targeting pods directly should return endpoint data for running (not necessarily ready) pods

Requiring Pods to pass readiness checks before allowing Pod to Pod communication disrupts communication in e.g. clustered systems which require Pods to communicate with each other prior to establishing ready state and allowing inbound traffic.

Relaxed the requirement and modified the workload watcher to only require that a Pod exists and is in Running phase.

Reproduced the issue with a test setup described in #13247.

Fixes #13247.

---------

Signed-off-by: Tuomo <[email protected]>
Co-authored-by: Alejandro Pedraza <[email protected]>
  • Loading branch information
tjorri and alpeb authored Jan 16, 2025
1 parent 767391e commit ba8a84c
Showing 1 changed file with 21 additions and 19 deletions.
40 changes: 21 additions & 19 deletions controller/api/destination/watcher/workload_watcher.go
Original file line number Diff line number Diff line change
Expand Up @@ -656,26 +656,37 @@ func (wp *workloadPublisher) unsubscribe(listener WorkloadUpdateListener) {
}

// updatePod creates an Address instance for the given pod, that is passed to
// the listener's Update() method, only if the pod's readiness state has
// the listener's Update() method, only if the pod's running state has
// changed. If the passed pod is nil, it means the pod (still referred to in
// wp.pod) has been deleted.
// Note that we care only about the running state instead of a stronger
// requirement on readiness state because this is used in the context of
// _endpoint_ profile subscriptions, as opposed to _service_ profile
// subscriptions. The former is used when calling GetProfile for a specific
// pod, usually when hitting instances of a StatefulSet, with IPs possibly
// derived from a headless service. An example of this is a Cassandra cluster,
// where a new node won't become ready until it's connected from other members
// of the cluster. For such connections to work inside the mesh, we need
// GetProfile to return the endpoint profile for the pod, even if it's not
// ready.
// See https://github.com/linkerd/linkerd2/issues/13247
func (wp *workloadPublisher) updatePod(pod *corev1.Pod) {
wp.mu.Lock()
defer wp.mu.Unlock()

// pod wasn't ready or there was no backing pod - check if passed pod is ready
// pod wasn't running or there was no backing pod - check if passed pod is running
if wp.addr.Pod == nil {
if pod == nil {
wp.log.Trace("Pod deletion event already consumed - ignore")
return
}

if !isRunningAndReady(pod) {
wp.log.Tracef("Pod %s.%s not ready - ignore", pod.Name, pod.Namespace)
if !isRunning(pod) {
wp.log.Tracef("Pod %s.%s not running - ignore", pod.Name, pod.Namespace)
return
}

wp.log.Debugf("Pod %s.%s became ready", pod.Name, pod.Namespace)
wp.log.Debugf("Pod %s.%s started running", pod.Name, pod.Namespace)
wp.addr.Pod = pod

// Fill in ownership.
Expand Down Expand Up @@ -705,9 +716,9 @@ func (wp *workloadPublisher) updatePod(pod *corev1.Pod) {
return
}

// backing pod becoming unready or getting deleted
if pod == nil || !isRunningAndReady(pod) {
wp.log.Debugf("Pod %s.%s deleted or it became unready - remove", wp.addr.Pod.Name, wp.addr.Pod.Namespace)
// backing pod stopped running or getting deleted
if pod == nil || !isRunning(pod) {
wp.log.Debugf("Pod %s.%s deleted or it stopped running - remove", wp.addr.Pod.Name, wp.addr.Pod.Namespace)
wp.addr.Pod = nil
wp.addr.OwnerKind = ""
wp.addr.OwnerName = ""
Expand Down Expand Up @@ -828,15 +839,6 @@ func isNamedInExternalWorkload(pr string, ew *ext.ExternalWorkload) (int32, bool
return 0, false
}

func isRunningAndReady(pod *corev1.Pod) bool {
if pod == nil || pod.Status.Phase != corev1.PodRunning {
return false
}
for _, condition := range pod.Status.Conditions {
if condition.Type == corev1.PodReady && condition.Status == corev1.ConditionTrue {
return true
}
}

return false
func isRunning(pod *corev1.Pod) bool {
return pod != nil && pod.Status.Phase == corev1.PodRunning
}

0 comments on commit ba8a84c

Please sign in to comment.