odo dev port forward timeouts if app is not ready #5877

Closed
Tracked by #5848
vinny-sabatini opened this issue Jun 24, 2022 · 9 comments · Fixed by #6013
Assignees
feloy

Labels
estimated-size/S (5-10): Rough sizing for Epics. Less than one sprint of work for one person.
kind/bug: Categorizes issue or PR as related to a bug.
priority/High: Important issue; should be worked on before any other issues (except priority/Critical issue(s)).

Comments

vinny-sabatini (Contributor) commented Jun 24, 2022

/kind bug

What versions of software are you using?

Operating System:
macOS

Output of odo version:

$ odo version
odo v3.0.0-alpha3 (77ff2b89d)

Server: https://my.openshift.cluster:6443
Kubernetes: v1.23.5+3afdacb

How did you run odo exactly?

I started a new project using odo init, chose the Go starter project, and added a sleep before starting the web server. I then ran odo dev, and once odo reported the application was running, I tried to access the endpoint before the sleep had elapsed. I also tried to access the endpoint multiple times after the sleep.

The sleep is meant to simulate a container that takes some time to initialize and reach a ready state, which would normally be handled by a readiness probe in a regular Kubernetes deployment. I originally ran into this behavior with a Java Spring Boot application that took about 10 seconds to become ready.
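
For reference, a minimal sketch of the kind of change described above; the handler, port, and 30-second delay are illustrative and not the exact starter code:

package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

func main() {
	// Simulate an application that needs time before it is ready:
	// the port is not open until the sleep has elapsed.
	log.Println("sleeping before starting the web server...")
	time.Sleep(30 * time.Second)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "Hello from the Go starter project")
	})

	log.Println("listening on :8080")
	log.Fatal(http.ListenAndServe(":8080", nil))
}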

Actual behavior

The first request returned immediately with an empty reply from the server. After the sleep had elapsed, some requests were successful, but others would occasionally fail with a connection reset by peer.

Expected behavior

  • I think odo should wait until the container is actually listening on the intended port before stating that the application is ready
  • I would not expect intermittent connection timeouts when the application is running

Any logs, error output, etc?

Here is the odo output. Note that the first error is from when the first request came through (before the sleep duration completed), and the second error is an example of an intermittent failure:

$ odo dev
  __
 /  \__     Developing using the app Devfile
 \__/  \    Namespace: vinny-test-odo
 /  \__/    odo version: v3.0.0-alpha3
 \__/

↪ Deploying to the cluster in developer mode
 ✓  Waiting for Kubernetes resources [16s]
 ✓  Syncing files into the container [726ms]
 ✓  Building your application in container on cluster (command: build) [3s]
 •  Executing the application (command: run)  ...

Your application is now running on the cluster
 - Forwarding from 127.0.0.1:40001 -> 8080

Watching for changes in the current directory /Users/vinny/projects/odo-testing
Press Ctrl+c to exit `odo dev` and delete resources from the cluster

E0624 16:53:03.016599    9294 portforward.go:400] an error occurred forwarding 40001 -> 8080: error forwarding port 8080 to pod 3b77b1f3d378f912ec93eda422f7bd5fe99f63dd63466249ae41364edb59effe, uid : port forward into network namespace "/var/run/netns/ea2bbe59-8ea2-40f2-aca4-01228c6072f7": failed to connect to localhost:8080 inside namespace 3b77b1f3d378f912ec93eda422f7bd5fe99f63dd63466249ae41364edb59effe: dial tcp [::1]:8080: connect: connection refused
E0624 16:54:20.148247    9294 portforward.go:340] error creating error stream for port 40001 -> 8080: Timeout occurred

Here is the output from the failed curl requests:

$ # First request before sleep
$ curl -vvv 127.0.0.1:40001
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 40001 (#0)
> GET / HTTP/1.1
> Host: 127.0.0.1:40001
> User-Agent: curl/7.64.1
> Accept: */*
>
* Empty reply from server
* Connection #0 to host 127.0.0.1 left intact
curl: (52) Empty reply from server
* Closing connection 0

$ # Intermittent timeout request
$ curl -vvv http://127.0.0.1:40001
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 40001 (#0)
> GET / HTTP/1.1
> Host: 127.0.0.1:40001
> User-Agent: curl/7.64.1
> Accept: */*
>
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer
openshift-ci bot added the kind/bug label on Jun 24, 2022
rm3l (Member) commented Jun 27, 2022

Thanks for reporting this issue, @vinny-sabatini !

As discussed with the team today, we'll look into this when working on #5867, where we should have a better approach for handling such cases.

Side note: because we use the same Go library as kubectl, I tried to see how a raw kubectl port-forward against the pod would behave under the conditions you described, and noticed that kubectl currently exits right away if it receives the first incoming request on the local port while the application is still sleeping. Past versions had the exact same timeout issues you mentioned: kubernetes/kubernetes#103526

feloy (Contributor) commented Jun 27, 2022

> Side note: because we use the same Go library as kubectl, I tried to see how a raw kubectl port-forward against the pod would behave under the conditions you described, and noticed that kubectl currently exits right away if it receives the first incoming request on the local port while the application is still sleeping. Past versions had the exact same timeout issues you mentioned: kubernetes/kubernetes#103526

So it seems we will have to double-check that the port is really open before forwarding to it, even if the program has been started.

I think using a readiness probe could be a solution: the pod would be marked Ready only when the port is open, so we could start the forwarding only when the pod is Ready and stop it as soon as it is no longer Ready.
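
As a rough sketch of that idea, using the k8s.io/api/core/v1 types from recent client-go releases; the port number, timings, and function names here are illustrative, not odo's actual code:

package forwarding

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// withTCPReadinessProbe attaches a TCP readiness probe to the dev container,
// so the pod is only marked Ready once the application port accepts connections.
func withTCPReadinessProbe(c *corev1.Container, port int) {
	c.ReadinessProbe = &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			TCPSocket: &corev1.TCPSocketAction{Port: intstr.FromInt(port)},
		},
		PeriodSeconds:    2,
		FailureThreshold: 1,
	}
}

// isPodReady reports whether the pod's Ready condition is True; port forwarding
// would only be started while this holds, and stopped once it no longer does.
func isPodReady(pod *corev1.Pod) bool {
	for _, cond := range pod.Status.Conditions {
		if cond.Type == corev1.PodReady {
			return cond.Status == corev1.ConditionTrue
		}
	}
	return false
}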

rm3l (Member) commented Jun 28, 2022

> > Side note: because we use the same Go library as kubectl, I tried to see how a raw kubectl port-forward against the pod would behave under the conditions you described, and noticed that kubectl currently exits right away if it receives the first incoming request on the local port while the application is still sleeping. Past versions had the exact same timeout issues you mentioned: kubernetes/kubernetes#103526
>
> So it seems we will have to double-check that the port is really open before forwarding to it, even if the program has been started.
>
> I think using a readiness probe could be a solution: the pod would be marked Ready only when the port is open, so we could start the forwarding only when the pod is Ready and stop it as soon as it is no longer Ready.

Indeed, that would be a nice solution in this case.

rm3l added the status/blocked label on Jun 29, 2022
valaparthvi (Contributor) commented

@rm3l Why is the status blocked on this?

rm3l (Member) commented Jul 7, 2022

> @rm3l Why is the status blocked on this?

This issue will be fixed as part of #5867. I have updated the acceptance criteria of #5867 to take this into account. So yes, it is somewhat blocked, as I intend to link the PR that fixes #5867 as fixing this one as well.

kadel added the priority/High label on Jul 18, 2022
cdrage added the estimated-size/S (5-10) label on Aug 4, 2022
kadel (Member) commented Aug 4, 2022

> So it seems we will have to double-check that the port is really open before forwarding to it, even if the program has been started.
>
> I think using a readiness probe could be a solution: the pod would be marked Ready only when the port is open, so we could start the forwarding only when the pod is Ready and stop it as soon as it is no longer Ready.

This will solve only the initial startup problem.
There is also the case where the application could close the port (for example, when it gets restarted). If someone accesses the port-forwarded port before the application is up, the port-forwarding process will fail, and odo won't recover from this.

The readiness probe is not going to help with this.
We need to figure out how to keep port-forwarding up and working, or, if that is not possible, automatically try to restart it when it goes down.

I think users would be OK with the "Empty reply from server" being only a temporary state, as long as things start working correctly once the application is up.

My understanding is that the main issue here is that odo can't recover when the port-forwarding fails. @vinny-sabatini, please correct me if I'm wrong.

feloy (Contributor) commented Aug 8, 2022

> > So it seems we will have to double-check that the port is really open before forwarding to it, even if the program has been started.
> >
> > I think using a readiness probe could be a solution: the pod would be marked Ready only when the port is open, so we could start the forwarding only when the pod is Ready and stop it as soon as it is no longer Ready.
>
> This will solve only the initial startup problem. There is also the case where the application could close the port (for example, when it gets restarted). If someone accesses the port-forwarded port before the application is up, the port-forwarding process will fail, and odo won't recover from this.
>
> The readiness probe is not going to help with this.

I don't understand why. The readiness probe will set the Pod as non-ready if the probe fails to get a response when the app is restarted, and odo dev will be able to detect this non-ready state and stop the port forwarding.

> We need to figure out how to keep port-forwarding up and working, or, if that is not possible, automatically try to restart it when it goes down.

I agree this could be a more robust solution. Watching for this warning message and restarting the port forwarding when it appears could help (a rough sketch follows at the end of this comment):

portforward.go:234] lost connection to pod

> I think users would be OK with the "Empty reply from server" being only a temporary state, as long as things start working correctly once the application is up.

> My understanding is that the main issue here is that odo can't recover when the port-forwarding fails. @vinny-sabatini, please correct me if I'm wrong.
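
For illustration, a minimal sketch of that restart-on-failure idea using the same client-go portforward package; the function name, retry delay, and logging are my own assumptions, not the actual odo implementation:

package forwarding

import (
	"log"
	"net/http"
	"os"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/portforward"
	"k8s.io/client-go/transport/spdy"
)

// keepForwarding re-establishes the port-forward session whenever it drops
// (for example after "lost connection to pod"), until stop is closed.
// ports use the "local:remote" form, e.g. "40001:8080".
func keepForwarding(config *rest.Config, client kubernetes.Interface, namespace, pod string, ports []string, stop chan struct{}) {
	req := client.CoreV1().RESTClient().Post().
		Resource("pods").Namespace(namespace).Name(pod).
		SubResource("portforward")

	for {
		select {
		case <-stop:
			return
		default:
		}

		transport, upgrader, err := spdy.RoundTripperFor(config)
		if err != nil {
			log.Printf("port-forward setup failed: %v", err)
			return
		}
		dialer := spdy.NewDialer(upgrader, &http.Client{Transport: transport}, http.MethodPost, req.URL())

		fw, err := portforward.New(dialer, ports, stop, make(chan struct{}), os.Stdout, os.Stderr)
		if err != nil {
			log.Printf("port-forward setup failed: %v", err)
			return
		}

		// ForwardPorts blocks until the connection is lost or stop is closed.
		if err := fw.ForwardPorts(); err != nil {
			log.Printf("port-forward ended: %v; retrying in 1s", err)
		}

		select {
		case <-stop:
			return
		case <-time.After(time.Second):
		}
	}
}

For the record, the fix that closed this issue is #6013; the sketch above only illustrates the restart idea being discussed.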

kadel (Member) commented Aug 8, 2022

> I don't understand why. The readiness probe will set the Pod as non-ready if the probe fails to get a response when the app is restarted, and odo dev will be able to detect this non-ready state and stop the port forwarding.

The biggest issue with using a readiness probe is that you can have only one per container, so we would have to come up with some "hack" to make a single check test all ports.
In the future, there might also be an option for users to define their own readiness or liveness probes in the Devfile. In that case, we would have to figure out another "hack" to merge user-defined and odo-managed probes.

The simple "restart-if-failed" solution might solve all of the problems we currently have.
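
For illustration, the single-probe "hack" mentioned above could look something like the following: one exec readiness probe whose command only succeeds when every forwarded port accepts connections. This assumes a shell and nc are available in the container image, which may well not be the case:

package forwarding

import (
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
)

// multiPortReadinessProbe builds a single readiness probe that checks
// every given port, since a container can carry only one readinessProbe.
func multiPortReadinessProbe(ports []int) *corev1.Probe {
	checks := make([]string, 0, len(ports))
	for _, p := range ports {
		checks = append(checks, fmt.Sprintf("nc -z localhost %d", p))
	}
	return &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			Exec: &corev1.ExecAction{
				Command: []string{"sh", "-c", strings.Join(checks, " && ")},
			},
		},
		PeriodSeconds:    2,
		FailureThreshold: 1,
	}
}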

feloy self-assigned this on Aug 8, 2022
rm3l removed the status/blocked label on Aug 8, 2022
rm3l moved this to In Progress in odo v3-beta2 on Aug 8, 2022
vinny-sabatini (Contributor, Author) commented

> My understanding is that the main issue here is that odo can't recover when the port-forwarding fails. @vinny-sabatini, please correct me if I'm wrong.

Just wanted to confirm that is the behavior/issue we were running into, and that the intermittent errors after the port-forwarding failure made it tricky to continue developing without restarting the session.

feloy moved this from In Progress to For review in odo v3-beta2 on Aug 9, 2022
Repository owner moved this from For review to Done in odo v3-beta2 on Aug 11, 2022
rm3l added the v3 label on Oct 7, 2022