
traffic stealing/mirroring/http filtering is flakey #2942

Open · aviramha opened this issue Nov 28, 2024 · 8 comments

Comments

aviramha (Member) commented Nov 28, 2024

Local: NixOS + nc (also tried a Python script to see if it behaves differently, as well as the user's app)
Remote: kops cluster running on AWS (not EKS) with Calico CNI (VXLAN) + Istio mesh

It seems that the agent steals/mirrors the connection start, but not the data itself, for some reason.

Calico configuration from kops:

  calico:
    encapsulationMode: vxlan
    mtu: 8951
    typhaReplicas: 3

K8s version: v1.27.9
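A quick sketch for double-checking the encapsulation mode actually in effect on the cluster (assumes a manifest-based Calico install, like the one kops ships, that exposes the crd.projectcalico.org IPPool resources):

  # prints vxlanMode/ipipMode for each IP pool
  kubectl get ippools.crd.projectcalico.org -o yaml | grep -i -e vxlanMode -e ipipMode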


linear bot commented Nov 28, 2024

aviramha (Member, Author) commented Dec 1, 2024

Happened to another user on an AWS EKS cluster (version 1.30), with a C# app.
Happens in VS Code, doesn't happen in Rider (WTF?).
macOS.

bradleyquinn140 commented

Due to a different issue, I switched to ephemeral agent mode. Interestingly, this seems to help with the issue I was having stealing or mirroring traffic to our gRPC servers. It now seems to forward some requests on to my local process; however, only the health check requests seem to be successful:

{"level":"info","ts":1733723856.831551,"caller":"grpc/grpc.go:140","msg":"finished unary call with code OK","service":"customer-grpc","grpc.start_time":"2024-12-09T16:57:36+11:00","grpc.request.deadline":"2024-12-09T16:57:37+11:00","system":"grpc","span.kind":"server","grpc.service":"grpc.health.v1.Health","grpc.method":"Check","grpc.code":"OK","grpc.time_ms":0.02,"TraceID":"48e3152fde975e2ff18417d1f4a96af9","SpanID":"1a1388db26fd29f4"}
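(For context, the successful requests are plain grpc.health.v1 health checks. A minimal sketch of reproducing one against the local process, assuming it listens on port 50051 and exposes reflection; otherwise pass -proto:)

  grpcurl -plaintext localhost:50051 grpc.health.v1.Health/Check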

Before the first successful regular gRPC request, this occurs:

Proxy error, connectivity issue or a bug.
                    Please report it to us on https://github.com/metalbear-co/mirrord/issues/new?assignees=&labels=bug&projects=&template=bug_report.yml
                    You can find the `mirrord-intproxy` logs in /tmp.
                    connection closed
signal: killed

intproxy:

  2024-12-09T05:57:26.864741Z  INFO mirrord_intproxy::layer_initializer: new session, process_info: ProcessInfo { pid: 145600, name: "main", cmdline: ["/tmp/go-build3109359887/b001/exe/main"], loaded: true }
    at mirrord/intproxy/src/layer_initializer.rs:63

  2024-12-09T05:57:54.571918Z  WARN mirrord_intproxy: agent log: [e12454f2-0436-4ab3-9cf3-052daf3fdb8f] mirrord-agent failed: agent was terminated: agent failed ping pong check. 1/2 mirrord-agents alive
    at mirrord/intproxy/src/lib.rs:319

  2024-12-09T05:57:54.799630Z  WARN mirrord_intproxy: agent log: [514a0ce2-bf8d-475d-95b9-4b2c8c763806] mirrord-agent failed: agent was terminated: agent failed ping pong check. 0/2 mirrord-agents alive
    at mirrord/intproxy/src/lib.rs:319

  2024-12-09T05:57:54.825465Z ERROR mirrord_operator::client::conn_wrapper: Operator connection failed, error: invalid message: Close(None)
    at mirrord/operator/src/client/conn_wrapper.rs:55

  2024-12-09T05:57:54.825507Z ERROR mirrord_intproxy::agent_conn: failed to receive message from the agent, inner task down
    at mirrord/intproxy/src/agent_conn.rs:226

  2024-12-09T05:57:54.825562Z ERROR mirrord::internal_proxy: Internal proxy encountered an error, exiting, error: Main internal proxy logic failed: agent closed connection with error: no mirrord-agent is alive
    at mirrord/cli/src/internal_proxy.rs:149


The operator logs this:

  2024-12-09T05:57:54.561909Z ERROR operator_context::agent::connection: Timeout while waiting for agent pong
    at operator/context/src/agent/connection.rs:234 on ThreadId(2)

  2024-12-09T05:57:54.789740Z ERROR operator_context::agent::connection: Timeout while waiting for agent pong
    at operator/context/src/agent/connection.rs:234 on ThreadId(2)

  2024-12-09T05:57:54.816243Z ERROR operator_proxy::routing::router: error: no mirrord-agent is alive
    at operator/proxy/src/routing/router.rs:206 on ThreadId(2)

  2024-12-09T05:57:54.816269Z ERROR operator_proxy::routing::router: [2DD4CD1A4B224F6E: osc6pZio80V3swmDuRDXaqBOUsBXt8z/eo4L5Bx6pMY bradleyq/u-h3a6z54eln@nixos (30.86181s
    at operator/proxy/src/routing/router.rs:790 on ThreadId(2)

  2024-12-09T05:57:54.816425Z  INFO operator_proxy::session: Session End, client_user: u-h3a6z54eln, client_name: bradleyq, client_hostname: nixos, client_id: "osc6pZio80
    at operator/proxy/src/session.rs:177 on ThreadId(2)

And the agent logs (debug level): https://gist.github.com/bradleyquinn140/d150786131e85ed1c7f8ff27c6b3fab8. When I ran it with trace it logged significantly more, which I can do if that would be valuable.
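(For reference, a minimal sketch of bumping the agent log level to trace via the mirrord config's agent.log_level field; the env-filter value shown is an assumption:)

  {
    "agent": {
      "log_level": "mirrord=trace"
    }
  }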

bradleyquinn140 commented

Ugh, now after doing nothing but restarting the pods it's back to the old behavior I was getting 😢

aviramha (Member, Author) commented

Using kubectl debug, we noticed that the creation of the iptables rules in the nat table was delayed, but once the rules are in place the stealing works (see the sketch below).
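Something along these lines can be used to inspect the nat table from inside the pod (a sketch: the pod name and debug image are placeholders, the grep patterns are assumptions, and listing iptables from the ephemeral container may need NET_ADMIN, e.g. kubectl debug's --profile=netadmin on newer versions):

  kubectl debug -it <target-pod> --image=nicolaka/netshoot -- sh
  # inside the debug container (it shares the pod's network namespace):
  iptables -t nat -S PREROUTING
  iptables -t nat -S | grep -i -e mirrord -e istio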

aviramha (Member, Author) commented

It seems the rules keep getting removed and re-added somehow (a loop like the one below can confirm this from inside the debug container).
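(A sketch of such a loop; the grep patterns are assumptions:)

  while true; do
    echo "$(date +%T) matching nat rules: $(iptables -t nat -S | grep -ci -e mirrord -e istio)"
    sleep 1
  done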

aviramha (Member, Author) commented

I'd have to guess it's a Calico issue, because mirrord doesn't have such logic and neither does Istio (nor does it have access to do it). Waiting for @bradleyquinn140 to see if upgrading Calico helps.

bradleyquinn140 commented

Last night I triggered our dev cluster to update Calico; it was not successful due to a few breaking changes.
I figured that might happen, since kops only ships Calico v3.27.3 by default on the latest kops version.

Going to try a similar thing in a new cluster to see if I can reproduce.
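(A sketch for confirming which Calico version a cluster is actually running, assuming the usual calico-node DaemonSet in kube-system:)

  kubectl -n kube-system get daemonset calico-node \
    -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'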
