Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: inbound only graceful draining #38430

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

keithmattix
Copy link
Contributor

@keithmattix keithmattix commented Feb 12, 2025

Trying #37873 again to get to the bottom of the flakes that caused us to need to revert. This got a lot more complex after #38333, so I may need @yanavlasov to help me out with the proper semantics

Commit Message:

Fixes https://github.com/envoyproxy/envoy/issues/35020. The inbound_only query param for the drain_listeners admin
endpoint doesn't work with the graceful query parameter. As a result,
outbound listeners will send connection: close headers to upstreams
which is undesired. This PR adds the ability for drain_manager to drain
in a single direction.

Additional Description:
Risk Level: Medium
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

Signed-off-by: Keith Mattix II <[email protected]>
@keithmattix keithmattix changed the title Directional Draining feat: inbound only graceful draining Feb 12, 2025
@keithmattix
Copy link
Contributor Author

Can someone from @envoyproxy/envoy-mobile-triage help me with these Android test failures? The smoking gun appears to be errors like these:

java.lang.UnsatisfiedLinkError: /mnt/engflow/worker/work/3/exec/bazel-out/k8-fastbuild/bin/test/java/io/envoyproxy/envoymobile/engine/testing/quic_test_server_test.runfiles/envoy_mobile/test/jni/libenvoy_jni_with_test_extensions.so: /mnt/engflow/worker/work/3/exec/bazel-out/k8-fastbuild/bin/test/java/io/envoyproxy/envoymobile/engine/testing/quic_test_server_test.runfiles/envoy_mobile/test/jni/libenvoy_jni_with_test_extensions.so: undefined symbol: __atomic_load

I tried to add a link command to the jni bazel config but no dice. I've never touched Envoy native development, so I truly have no idea what I'm doing

@abeyad
Copy link
Contributor

abeyad commented Feb 13, 2025

java.lang.UnsatisfiedLinkError: /mnt/engflow/worker/work/3/exec/bazel-out/k8-fastbuild/bin/test/java/io/envoyproxy/envoymobile/engine/testing/quic_test_server_test.runfiles/envoy_mobile/test/jni/libenvoy_jni_with_test_extensions.so: /mnt/engflow/worker/work/3/exec/bazel-out/k8-fastbuild/bin/test/java/io/envoyproxy/envoymobile/engine/testing/quic_test_server_test.runfiles/envoy_mobile/test/jni/libenvoy_jni_with_test_extensions.so: undefined symbol: __atomic_load

You added the -latomic to libenvoy_jni.so, not to libenvoy_jni_with_test_extensions.so. Can you try adding it to

name = "libenvoy_jni_with_test_extensions.so",
? That said, it's unclear why you would get an std::atomic linker error with your change, as we've had std::atomic in Envoy code prior and no linker issues on Android.

@fredyw
Copy link
Member

fredyw commented Feb 13, 2025

java.lang.UnsatisfiedLinkError: /mnt/engflow/worker/work/3/exec/bazel-out/k8-fastbuild/bin/test/java/io/envoyproxy/envoymobile/engine/testing/quic_test_server_test.runfiles/envoy_mobile/test/jni/libenvoy_jni_with_test_extensions.so: /mnt/engflow/worker/work/3/exec/bazel-out/k8-fastbuild/bin/test/java/io/envoyproxy/envoymobile/engine/testing/quic_test_server_test.runfiles/envoy_mobile/test/jni/libenvoy_jni_with_test_extensions.so: undefined symbol: __atomic_load

You added the -latomic to libenvoy_jni.so, not to libenvoy_jni_with_test_extensions.so. Can you try adding it to

name = "libenvoy_jni_with_test_extensions.so",

? That said, it's unclear why you would get an std::atomic linker error with your change, as we've had std::atomic in Envoy code prior and no linker issues on Android.

That's what I thought, too but when I tested it in 40faf7e, it didn't help, though: https://github.com/envoyproxy/envoy/actions/runs/13312980039/job/37180020453

I'm also not able to reproduce this locally.

@fredyw
Copy link
Member

fredyw commented Feb 13, 2025

java.lang.UnsatisfiedLinkError: /mnt/engflow/worker/work/3/exec/bazel-out/k8-fastbuild/bin/test/java/io/envoyproxy/envoymobile/engine/testing/quic_test_server_test.runfiles/envoy_mobile/test/jni/libenvoy_jni_with_test_extensions.so: /mnt/engflow/worker/work/3/exec/bazel-out/k8-fastbuild/bin/test/java/io/envoyproxy/envoymobile/engine/testing/quic_test_server_test.runfiles/envoy_mobile/test/jni/libenvoy_jni_with_test_extensions.so: undefined symbol: __atomic_load

You added the -latomic to libenvoy_jni.so, not to libenvoy_jni_with_test_extensions.so. Can you try adding it to

name = "libenvoy_jni_with_test_extensions.so",

? That said, it's unclear why you would get an std::atomic linker error with your change, as we've had std::atomic in Envoy code prior and no linker issues on Android.

That's what I thought, too but when I tested it in 40faf7e, it didn't help, though: https://github.com/envoyproxy/envoy/actions/runs/13312980039/job/37180020453

I'm also not able to reproduce this locally.

Oh looks like it did help. I just need to link it with -latomic in a different shared library. Let me test it again.

Signed-off-by: Keith Mattix II <[email protected]>
Signed-off-by: Keith Mattix II <[email protected]>
@keithmattix
Copy link
Contributor Author

Android is now passing after integrating @fredyw's changes; thanks for the assist!

This is now ready for review

@keithmattix
Copy link
Contributor Author

/retest

Signed-off-by: Keith Mattix II <[email protected]>
Signed-off-by: Keith Mattix II <[email protected]>
Copy link
Contributor

@yanavlasov yanavlasov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which test was flaky?

/wait

@@ -8,6 +8,7 @@
# leave room for compiler/linker.
# The number 3G is chosen heuristically to both support large VM and small VM with RBE.
# Startup options cannot be selected via config.
# TODO: Adding just to test android
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover from testing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes android doesn't run in PRs by default; I wanted to get an approval on the code first before removing so there aren't surprises on main post merge

std::atomic<DrainPair> draining_{DrainPair{false, Network::DrainDirection::None}};
// A map of timers keyed by the direction that triggered the drain
std::map<Network::DrainDirection, Event::TimerPtr> drain_tick_timers_;
std::map<Network::DrainDirection, MonotonicTime> drain_deadlines_ = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a struct would be a better choice? Are the timer keys dynamic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we only expect a max of two keys (All and InboundOnly) at any time (and we have an assertion in the code to check).

@keithmattix
Copy link
Contributor Author

keithmattix commented Feb 14, 2025

Which test was flaky?

/wait

AdminDrainInboundOnlyIdempotent (or something similar). I believe I addressed it by adding the stopped_listener flatHashSet check into a callback function that runs on the main thread which should prevent the race conditions that led to the flaky test. Here's a failed run: https://github.com/envoyproxy/envoy/actions/runs/13172376287/job/36764947882

@keithmattix
Copy link
Contributor Author

@yanavlasov do you have any additional feedback?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants