odo dev port forward timeouts if app is not ready #5877

Closed
Tracked by #5848
vinny-sabatini opened this issue Jun 24, 2022 · 9 comments · Fixed by #6013
Assignees
feloy

Labels
estimated-size/S (5-10): Rough sizing for Epics. Less than one sprint of work for one person.
kind/bug: Categorizes issue or PR as related to a bug.
priority/High: Important issue; should be worked on before any other issues (except priority/Critical issue(s)).

Comments

vinny-sabatini (Contributor) commented Jun 24, 2022

/kind bug

What versions of software are you using?

Operating System:
macOS

Output of odo version:

$ odo version
odo v3.0.0-alpha3 (77ff2b89d)

Server: https://my.openshift.cluster:6443
Kubernetes: v1.23.5+3afdacb

How did you run odo exactly?

I started a new project using odo init, chose the Go starter project, and added a sleep before starting the web server. I then ran odo dev, and once odo reported the application was running, I tried to access the endpoint before the sleep had elapsed. I also tried to access the endpoint multiple times after the sleep.

The sleep is meant to simulate a container that takes some time to initialize and reach a ready state, which would normally be handled by a readiness probe in a regular Kubernetes deployment. I originally ran into this behavior with a Java Spring Boot application that took about 10 seconds to become ready.
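
For reference, a minimal sketch of the kind of change described above; the handler, port, and 30-second delay are illustrative and not the exact starter code:

package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

func main() {
	// Simulate an application that needs time before it is ready:
	// the port is not open until the sleep has elapsed.
	log.Println("sleeping before starting the web server...")
	time.Sleep(30 * time.Second)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "Hello from the Go starter project")
	})

	log.Println("listening on :8080")
	log.Fatal(http.ListenAndServe(":8080", nil))
}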

Actual behavior

The first request returned immediately with an empty reply from the server. After the sleep had elapsed, some requests were successful, but others would occasionally fail with a connection reset by peer.

Expected behavior

  • I think odo should wait until the container is actually listening on the intended port before stating that the application is ready
  • I would not expect intermittent connection timeouts when the application is running

Any logs, error output, etc?

Here is the odo output. Note that the first error is from when the first request came through (before the sleep duration completed), and the second error is an example of an intermittent failure:

$ odo dev
  __
 /  \__     Developing using the app Devfile
 \__/  \    Namespace: vinny-test-odo
 /  \__/    odo version: v3.0.0-alpha3
 \__/

↪ Deploying to the cluster in developer mode
 ✓  Waiting for Kubernetes resources [16s]
 ✓  Syncing files into the container [726ms]
 ✓  Building your application in container on cluster (command: build) [3s]
 •  Executing the application (command: run)  ...

Your application is now running on the cluster
 - Forwarding from 127.0.0.1:40001 -> 8080

Watching for changes in the current directory /Users/vinny/projects/odo-testing
Press Ctrl+c to exit `odo dev` and delete resources from the cluster

E0624 16:53:03.016599    9294 portforward.go:400] an error occurred forwarding 40001 -> 8080: error forwarding port 8080 to pod 3b77b1f3d378f912ec93eda422f7bd5fe99f63dd63466249ae41364edb59effe, uid : port forward into network namespace "/var/run/netns/ea2bbe59-8ea2-40f2-aca4-01228c6072f7": failed to connect to localhost:8080 inside namespace 3b77b1f3d378f912ec93eda422f7bd5fe99f63dd63466249ae41364edb59effe: dial tcp [::1]:8080: connect: connection refused
E0624 16:54:20.148247    9294 portforward.go:340] error creating error stream for port 40001 -> 8080: Timeout occurred

Here is the output from the failed curl requests:

$ # First request before sleep
$ curl -vvv 127.0.0.1:40001
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 40001 (#0)
> GET / HTTP/1.1
> Host: 127.0.0.1:40001
> User-Agent: curl/7.64.1
> Accept: */*
>
* Empty reply from server
* Connection #0 to host 127.0.0.1 left intact
curl: (52) Empty reply from server
* Closing connection 0

$ # Intermittent timeout request
$ curl -vvv http://127.0.0.1:40001
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 40001 (#0)
> GET / HTTP/1.1
> Host: 127.0.0.1:40001
> User-Agent: curl/7.64.1
> Accept: */*
>
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer
openshift-ci bot added the kind/bug label on Jun 24, 2022
rm3l (Member) commented Jun 27, 2022

Thanks for reporting this issue, @vinny-sabatini !

As discussed with the team today, we'll look into this when working on #5867, where we should have a better approach for handling such cases.

Side note: because we use the same Go library as kubectl, I tried to see how a raw kubectl port-forward against the pod would behave under the conditions you described, and noticed that kubectl currently exits right away if it receives the first incoming request on the local port while the application is still sleeping. Past versions had the exact same timeout issues you mentioned: kubernetes/kubernetes#103526

feloy (Contributor) commented Jun 27, 2022

> Side note: because we use the same Go library as kubectl, I tried to see how a raw kubectl port-forward against the pod would behave under the conditions you described, and noticed that kubectl currently exits right away if it receives the first incoming request on the local port while the application is still sleeping. Past versions had the exact same timeout issues you mentioned: kubernetes/kubernetes#103526

So it seems we will have to double-check that the port is really open before forwarding to it, even if the program has been started.

I think using a readiness probe could be a solution: the pod would be marked Ready only when the port is open, so we could start the forwarding only when the pod is Ready and stop it as soon as it is no longer Ready.
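
As a rough sketch of that idea, using the k8s.io/api/core/v1 types from recent client-go releases; the port number, timings, and function names here are illustrative, not odo's actual code:

package forwarding

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// withTCPReadinessProbe attaches a TCP readiness probe to the dev container,
// so the pod is only marked Ready once the application port accepts connections.
func withTCPReadinessProbe(c *corev1.Container, port int) {
	c.ReadinessProbe = &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			TCPSocket: &corev1.TCPSocketAction{Port: intstr.FromInt(port)},
		},
		PeriodSeconds:    2,
		FailureThreshold: 1,
	}
}

// isPodReady reports whether the pod's Ready condition is True; port forwarding
// would only be started while this holds, and stopped once it no longer does.
func isPodReady(pod *corev1.Pod) bool {
	for _, cond := range pod.Status.Conditions {
		if cond.Type == corev1.PodReady {
			return cond.Status == corev1.ConditionTrue
		}
	}
	return false
}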

rm3l (Member) commented Jun 28, 2022

> > Side note: because we use the same Go library as kubectl, I tried to see how a raw kubectl port-forward against the pod would behave under the conditions you described, and noticed that kubectl currently exits right away if it receives the first incoming request on the local port while the application is still sleeping. Past versions had the exact same timeout issues you mentioned: kubernetes/kubernetes#103526
>
> So it seems we will have to double-check that the port is really open before forwarding to it, even if the program has been started.
>
> I think using a readiness probe could be a solution: the pod would be marked Ready only when the port is open, so we could start the forwarding only when the pod is Ready and stop it as soon as it is no longer Ready.

Indeed, that would be a nice solution in this case.

rm3l added the status/blocked label on Jun 29, 2022
valaparthvi (Contributor) commented

@rm3l Why is the status blocked on this?

rm3l (Member) commented Jul 7, 2022

> @rm3l Why is the status blocked on this?

This issue will be fixed as part of #5867. I have updated the acceptance criteria of #5867 to take this into account. So yes, it is somewhat blocked, as I intend to link the PR that fixes #5867 as fixing this one as well.

kadel added the priority/High label on Jul 18, 2022
cdrage added the estimated-size/S (5-10) label on Aug 4, 2022
kadel (Member) commented Aug 4, 2022

> So it seems we will have to double-check that the port is really open before forwarding to it, even if the program has been started.
>
> I think using a readiness probe could be a solution: the pod would be marked Ready only when the port is open, so we could start the forwarding only when the pod is Ready and stop it as soon as it is no longer Ready.

This will solve only the initial startup problem.
There is also the case where the application could close the port (for example, when it gets restarted). If someone accesses the port-forwarded port before the application is up, the port-forwarding process will fail, and odo won't recover from this.

The readiness probe is not going to help with this.
We need to figure out how to keep port-forwarding up and working, or, if that is not possible, automatically try to restart it when it goes down.

I think users would be OK with the "Empty reply from server" being only a temporary state, as long as things start working correctly once the application is up.

My understanding is that the main issue here is that odo can't recover when the port-forwarding fails. @vinny-sabatini, please correct me if I'm wrong.

feloy (Contributor) commented Aug 8, 2022

> > So it seems we will have to double-check that the port is really open before forwarding to it, even if the program has been started.
> >
> > I think using a readiness probe could be a solution: the pod would be marked Ready only when the port is open, so we could start the forwarding only when the pod is Ready and stop it as soon as it is no longer Ready.
>
> This will solve only the initial startup problem. There is also the case where the application could close the port (for example, when it gets restarted). If someone accesses the port-forwarded port before the application is up, the port-forwarding process will fail, and odo won't recover from this.
>
> The readiness probe is not going to help with this.

I don't understand why. The readiness probe will set the Pod as non-ready if the probe fails to get a response when the app is restarted, and odo dev will be able to detect this non-ready state and stop the port forwarding.

> We need to figure out how to keep port-forwarding up and working, or, if that is not possible, automatically try to restart it when it goes down.

I agree this could be a more robust solution. Watching for this warning message and restarting the port forwarding when it appears could help (a rough sketch follows at the end of this comment):

portforward.go:234] lost connection to pod

> I think users would be OK with the "Empty reply from server" being only a temporary state, as long as things start working correctly once the application is up.

> My understanding is that the main issue here is that odo can't recover when the port-forwarding fails. @vinny-sabatini, please correct me if I'm wrong.
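
For illustration, a minimal sketch of that restart-on-failure idea using the same client-go portforward package; the function name, retry delay, and logging are my own assumptions, not the actual odo implementation:

package forwarding

import (
	"log"
	"net/http"
	"os"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/portforward"
	"k8s.io/client-go/transport/spdy"
)

// keepForwarding re-establishes the port-forward session whenever it drops
// (for example after "lost connection to pod"), until stop is closed.
// ports use the "local:remote" form, e.g. "40001:8080".
func keepForwarding(config *rest.Config, client kubernetes.Interface, namespace, pod string, ports []string, stop chan struct{}) {
	req := client.CoreV1().RESTClient().Post().
		Resource("pods").Namespace(namespace).Name(pod).
		SubResource("portforward")

	for {
		select {
		case <-stop:
			return
		default:
		}

		transport, upgrader, err := spdy.RoundTripperFor(config)
		if err != nil {
			log.Printf("port-forward setup failed: %v", err)
			return
		}
		dialer := spdy.NewDialer(upgrader, &http.Client{Transport: transport}, http.MethodPost, req.URL())

		fw, err := portforward.New(dialer, ports, stop, make(chan struct{}), os.Stdout, os.Stderr)
		if err != nil {
			log.Printf("port-forward setup failed: %v", err)
			return
		}

		// ForwardPorts blocks until the connection is lost or stop is closed.
		if err := fw.ForwardPorts(); err != nil {
			log.Printf("port-forward ended: %v; retrying in 1s", err)
		}

		select {
		case <-stop:
			return
		case <-time.After(time.Second):
		}
	}
}

For the record, the fix that closed this issue is #6013; the sketch above only illustrates the restart idea being discussed.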

kadel (Member) commented Aug 8, 2022

> I don't understand why. The readiness probe will set the Pod as non-ready if the probe fails to get a response when the app is restarted, and odo dev will be able to detect this non-ready state and stop the port forwarding.

The biggest issue with using a readiness probe is that you can have only one per container, so we would have to come up with some "hack" to make a single check test all ports.
In the future, there might also be an option for users to define their own readiness or liveness probes in the Devfile. In that case, we would have to figure out another "hack" to merge user-defined and odo-managed probes.

The simple "restart-if-failed" solution might solve all of the problems we currently have.
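
For illustration, the single-probe "hack" mentioned above could look something like the following: one exec readiness probe whose command only succeeds when every forwarded port accepts connections. This assumes a shell and nc are available in the container image, which may well not be the case:

package forwarding

import (
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
)

// multiPortReadinessProbe builds a single readiness probe that checks
// every given port, since a container can carry only one readinessProbe.
func multiPortReadinessProbe(ports []int) *corev1.Probe {
	checks := make([]string, 0, len(ports))
	for _, p := range ports {
		checks = append(checks, fmt.Sprintf("nc -z localhost %d", p))
	}
	return &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			Exec: &corev1.ExecAction{
				Command: []string{"sh", "-c", strings.Join(checks, " && ")},
			},
		},
		PeriodSeconds:    2,
		FailureThreshold: 1,
	}
}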

feloy self-assigned this on Aug 8, 2022
rm3l removed the status/blocked label on Aug 8, 2022
rm3l moved this to In Progress in odo v3-beta2 on Aug 8, 2022
vinny-sabatini (Contributor, Author) commented

> My understanding is that the main issue here is that odo can't recover when the port-forwarding fails. @vinny-sabatini, please correct me if I'm wrong.

Just wanted to confirm that is the behavior/issue we were running into, and that the intermittent errors after the port-forwarding failure made it tricky to continue developing without restarting the session.

feloy moved this from In Progress to For review in odo v3-beta2 on Aug 9, 2022
Repository owner moved this from For review to Done in odo v3-beta2 on Aug 11, 2022
rm3l added the v3 label on Oct 7, 2022