-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: only enable quinn-udp fast-apple-datapath on arm #2427
base: main
Are you sure you want to change the base?
Conversation
4f94edf
to
3b8f248
Compare
Failed Interop TestsQUIC Interop Runner, client vs. server, differences relative to b6e4cfc. neqo-latest as client
neqo-latest as server
All resultsSucceeded Interop TestsQUIC Interop Runner, client vs. server neqo-latest as client
neqo-latest as server
Unsupported Interop TestsQUIC Interop Runner, client vs. server neqo-latest as client
neqo-latest as server
|
This is odd, because IIRC these syscalls predate Catalina by years. |
Thanks @larseggert for the info. All suspicions for now. I just know it is related to Neqo Might as well be a syscall from the (Thus far I have no other way to reproduce than Mozilla's CI, a single run take ~6h, and logs have not been helpful.) I will continue to debug this. |
3b8f248
to
f53d539
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2427 +/- ##
==========================================
- Coverage 95.32% 95.28% -0.04%
==========================================
Files 114 114
Lines 36868 37111 +243
Branches 36868 37111 +243
==========================================
+ Hits 35145 35363 +218
- Misses 1719 1742 +23
- Partials 4 6 +2 ☔ View full report in Codecov by Sentry. |
For the record, Mozilla CI:
|
Benchmark resultsPerformance differences relative to b6e4cfc. decode 4096 bytes, mask ff: No change in performance detected.time: [10.855 µs 10.886 µs 10.926 µs] change: [-0.4051% +0.1434% +0.7503%] (p = 0.66 > 0.05) decode 1048576 bytes, mask ff: No change in performance detected.time: [3.1285 ms 3.1377 ms 3.1482 ms] change: [-0.3673% +0.0949% +0.5487%] (p = 0.67 > 0.05) decode 4096 bytes, mask 7f: No change in performance detected.time: [17.615 µs 17.652 µs 17.696 µs] change: [-1.1465% -0.4724% +0.1059%] (p = 0.16 > 0.05) decode 1048576 bytes, mask 7f: No change in performance detected.time: [5.4007 ms 5.4140 ms 5.4292 ms] change: [-0.4491% -0.0948% +0.2608%] (p = 0.62 > 0.05) decode 4096 bytes, mask 3f: No change in performance detected.time: [6.6813 µs 6.7256 µs 6.7734 µs] change: [-0.3220% +0.5041% +1.3374%] (p = 0.25 > 0.05) decode 1048576 bytes, mask 3f: No change in performance detected.time: [1.7585 ms 1.7615 ms 1.7657 ms] change: [-0.0040% +0.1601% +0.4043%] (p = 0.14 > 0.05) coalesce_acked_from_zero 1+1 entries: No change in performance detected.time: [91.176 ns 91.543 ns 91.905 ns] change: [-0.6672% -0.1426% +0.3368%] (p = 0.59 > 0.05) coalesce_acked_from_zero 3+1 entries: Change within noise threshold.time: [109.51 ns 109.87 ns 110.24 ns] change: [+0.0489% +0.3224% +0.6528%] (p = 0.04 < 0.05) coalesce_acked_from_zero 10+1 entries: No change in performance detected.time: [108.95 ns 109.36 ns 109.85 ns] change: [-1.1430% -0.3768% +0.2520%] (p = 0.33 > 0.05) coalesce_acked_from_zero 1000+1 entries: No change in performance detected.time: [93.407 ns 93.573 ns 93.760 ns] change: [-0.4809% +0.4096% +1.3240%] (p = 0.38 > 0.05) RxStreamOrderer::inbound_frame(): Change within noise threshold.time: [111.16 ms 111.23 ms 111.29 ms] change: [-0.1858% -0.1061% -0.0254%] (p = 0.01 < 0.05) SentPackets::take_ranges: No change in performance detected.time: [5.1737 µs 5.2566 µs 5.3367 µs] change: [-17.793% -5.8337% +2.7663%] (p = 0.49 > 0.05) transfer/pacing-false/varying-seeds: Change within noise threshold.time: [36.784 ms 36.855 ms 36.928 ms] change: [-0.9907% -0.7265% -0.4425%] (p = 0.00 < 0.05) transfer/pacing-true/varying-seeds: Change within noise threshold.time: [36.695 ms 36.762 ms 36.830 ms] change: [-1.1623% -0.9000% -0.6245%] (p = 0.00 < 0.05) transfer/pacing-false/same-seed: Change within noise threshold.time: [36.501 ms 36.574 ms 36.645 ms] change: [-1.3617% -1.1059% -0.8641%] (p = 0.00 < 0.05) transfer/pacing-true/same-seed: Change within noise threshold.time: [36.843 ms 36.899 ms 36.955 ms] change: [-0.7508% -0.5247% -0.3075%] (p = 0.00 < 0.05) 1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: No change in performance detected.time: [861.13 ms 871.38 ms 882.03 ms] thrpt: [113.38 MiB/s 114.76 MiB/s 116.13 MiB/s] change: time: [-1.7234% -0.1261% +1.5011%] (p = 0.88 > 0.05) thrpt: [-1.4789% +0.1263% +1.7536%] 1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: No change in performance detected.time: [319.81 ms 323.15 ms 326.44 ms] thrpt: [30.633 Kelem/s 30.946 Kelem/s 31.269 Kelem/s] change: time: [-0.6712% +0.8043% +2.4273%] (p = 0.32 > 0.05) thrpt: [-2.3698% -0.7978% +0.6757%] 1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.time: [25.489 ms 25.654 ms 25.821 ms] thrpt: [38.729 elem/s 38.980 elem/s 39.232 elem/s] change: time: [-0.6669% +0.2354% +1.1855%] (p = 0.61 > 0.05) thrpt: [-1.1716% -0.2348% +0.6714%] 1-conn/1-100mb-resp/mtu-1504 (aka. Upload)/client: No change in performance detected.time: [1.8596 s 1.8787 s 1.8981 s] thrpt: [52.684 MiB/s 53.227 MiB/s 53.776 MiB/s] change: time: [-1.1150% +0.2648% +1.6227%] (p = 0.71 > 0.05) thrpt: [-1.5968% -0.2641% +1.1275%] Client/server transfer resultsPerformance differences relative to b6e4cfc. Transfer of 33554432 bytes over loopback, 30 runs. All unit-less numbers are in milliseconds.
|
@mxinden in case it helps, I have an old Intel MacBook laying around somewhere. |
A datapoint for oven-sh/bun@26526cb#diff-8060260f9888a3a03c4d236f22769baacc4c9aec45abf7f49b5b2bbab7e39579R307 I reached out to the author. |
@larseggert that would be very helpful! Could you checkout the
Which MacOS version is it running? |
System Version: macOS 10.15.7 (19H2026)
And it's hanging there, with the CPU maxed out. |
That is good news. Thanks for reproducing. This matches what we are seeing on Mozilla's CI and above linked commit on oven-sh/bun. |
Another reference to the issue: https://bun.sh/blog/bun-v1.1.22#udp-socket-hangs-on-macos-13 |
So what's happening is that in the receive path, the I wonder if |
On MacOS < 14, with `fast-apple-datapath` feature, calls to `libc::CMSG_NXTHDR` might continuously return empty (i.e. all zero) `libc::cmsghdr` instead of a null pointer. This results in a busy loop in `decode_recv`: ``` rust let cmsg_iter = unsafe { cmsg::Iter::new(hdr) }; for cmsg in cmsg_iter { match (cmsg.cmsg_level, cmsg.cmsg_type) { ``` https://github.com/quinn-rs/quinn/blob/b4378bb39dab4b58a1e6a3fea4fff9f87033dab6/quinn-udp/src/unix.rs#L685C1-L687C50 This commit fixes the above, returning a `null_mut()` pointer on an empty `libc::cmsgdhr`, thus terminating the `cmsg_iter`. See also mozilla/neqo#2427 for details.
On MacOS < 14, with `fast-apple-datapath` feature, calls to `libc::CMSG_NXTHDR` might continuously return empty (i.e. all zero) `libc::cmsghdr` instead of a null pointer. This results in a busy loop in `decode_recv`: ``` rust let cmsg_iter = unsafe { cmsg::Iter::new(hdr) }; for cmsg in cmsg_iter { match (cmsg.cmsg_level, cmsg.cmsg_type) { ``` https://github.com/quinn-rs/quinn/blob/b4378bb39dab4b58a1e6a3fea4fff9f87033dab6/quinn-udp/src/unix.rs#L685C1-L687C50 This commit fixes the above, returning a `null_mut()` pointer on an empty `libc::cmsgdhr`, thus terminating the `cmsg_iter`. See also mozilla/neqo#2427 for details.
On MacOS < 14, with `fast-apple-datapath` feature, calls to `libc::CMSG_NXTHDR` might continuously return empty (i.e. all zero) `libc::cmsghdr` instead of a null pointer. This results in a busy loop in `decode_recv`: ``` rust let cmsg_iter = unsafe { cmsg::Iter::new(hdr) }; for cmsg in cmsg_iter { match (cmsg.cmsg_level, cmsg.cmsg_type) { ``` https://github.com/quinn-rs/quinn/blob/b4378bb39dab4b58a1e6a3fea4fff9f87033dab6/quinn-udp/src/unix.rs#L685C1-L687C50 This commit fixes the above, returning a `null_mut()` pointer on an empty `libc::cmsgdhr`, thus terminating the `cmsg_iter`. See also mozilla/neqo#2427 for details.
Only enable
fast-apple-datapath
on aarch64.Old MacOS versions (e.g. 10.15) don't fully support
sendmsg_x
andrecvmsg_x
, both enabled byfast-apple-datapath
. Instead of feature gating by OS version, simply do so by CPU arch, leveraging the fact that newer MacOS devices use ARM.Draft for now. Debugging mozilla-central failure on MacOS 10.15.