perf(transport): auto-tune stream receive window #1868

mxinden · 2024-05-02T10:22:24Z

Previously the stream send and receive windows had a hard limit of 1 MB. On connections with high latency and/or high bandwidth (i.e. a large bandwidth-delay product), 1 MB is not enough to saturate the available bandwidth.

Sample scenario:

delay_s = 0.05
window_bits = 1 * 1024 * 1024 * 8
bandwidth_bits_s = window_bits / delay_s
bandwidth_mbits_s = bandwidth_bits_s / 1024 / 1024 # 160.0

In other words, on a 50 ms connection a 1 MB window can at most achieve 160 Mbit/s.
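
The same arithmetic as a small Rust sketch, for readers who want to try other window sizes or round-trip times. The function name `max_throughput_bits_per_s` is made up for this illustration and is not part of Neqo. It assumes the sender can have at most one full window in flight per round trip.

```rust
use std::time::Duration;

/// Upper bound on throughput, in bits per second, when at most one
/// flow control window of `window_bytes` can be in flight per `rtt`.
/// (Hypothetical helper, for illustration only.)
fn max_throughput_bits_per_s(window_bytes: u64, rtt: Duration) -> f64 {
    (window_bytes * 8) as f64 / rtt.as_secs_f64()
}

fn main() {
    let window_bytes = 1024 * 1024; // the previous fixed window
    let rtt = Duration::from_millis(50);
    let bits_per_s = max_throughput_bits_per_s(window_bytes, rtt);
    // Same unit convention as the sample scenario (divide by 1024 twice).
    println!("{:.1}", bits_per_s / 1024.0 / 1024.0);
}
```

Halving the round-trip time or doubling the window doubles the bound, which is why a fixed window only hurts on paths with a large bandwidth-delay product.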

This commit introduces an auto-tuning algorithm for the stream receive window, increasing the window towards the bandwidth-delay product of the connection.

Fixes #733.

This commit adds a basic smoke test using the `test-fixture` simulator, asserting that on a connection with unlimited bandwidth and a 50 ms round-trip time Neqo can eventually achieve > 1 Gbit/s throughput. It showcases the potential that a future stream flow-control auto-tuning algorithm can have. See mozilla#733.

github-actions · 2024-05-02T10:27:31Z

Failed Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest vs. mvfst: DC

All results

QUIC Interop Runner, client vs. server

Succeeded Interop Tests

Unsupported Interop Tests

chrome vs. neqo-latest: H DC

github-actions · 2024-05-07T10:31:51Z

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to b6e4cfc.

neqo-latest as client

neqo-latest vs. aioquic: Z
neqo-latest vs. go-x-net: BP BA
neqo-latest vs. haproxy: BP BA
neqo-latest vs. kwik: Z 3 🚀~~A C1~~ ⚠️L1 V2 BP BA
neqo-latest vs. lsquic: L1 C1
neqo-latest vs. msquic: ⚠️R Z A L1 🚀L2 C1
neqo-latest vs. mvfst: A L1 C1 BP BA
neqo-latest vs. nginx: BP BA
neqo-latest vs. ngtcp2: CM
neqo-latest vs. picoquic: ⚠️Z A L1 C1
neqo-latest vs. quic-go: A BP BA
neqo-latest vs. quiche: BP BA
neqo-latest vs. s2n-quic: 🚀BP BA CM
neqo-latest vs. tquic: S BP BA
neqo-latest vs. xquic: A

neqo-latest as server

aioquic vs. neqo-latest: run cancelled after 20 min
chrome vs. neqo-latest: ⚠️3
go-x-net vs. neqo-latest: CM
kwik vs. neqo-latest: BP BA CM
lsquic vs. neqo-latest: CM
msquic vs. neqo-latest: Z U CM
mvfst vs. neqo-latest: Z A L1 C1 CM
openssl vs. neqo-latest: LR M 6 CM
quic-go vs. neqo-latest: CM
quiche vs. neqo-latest: 🚀L1 ⚠️C1 CM
quinn vs. neqo-latest: V2 CM
s2n-quic vs. neqo-latest: CM
tquic vs. neqo-latest: CM
xquic vs. neqo-latest: M CM

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest vs. aioquic: H DC LR C20 M S R 3 B U A L1 L2 C1 C2 6 V2 BP BA
neqo-latest vs. go-x-net: H DC LR M B U A L2 C2 6
neqo-latest vs. haproxy: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 V2
neqo-latest vs. kwik: H DC LR C20 M S R B U ⚠️L1 🚀A L2 🚀C1 C2 6
neqo-latest vs. lsquic: H DC LR C20 M S R Z 3 B U E A L2 C2 6 V2 BP BA
neqo-latest vs. msquic: H DC LR C20 M S ⚠️R B U 🚀L2 C2 6 V2 BP BA
neqo-latest vs. mvfst: H DC LR M R Z 3 B U L2 C2 6
neqo-latest vs. neqo: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA CM
neqo-latest vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA CM
neqo-latest vs. nginx: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
neqo-latest vs. ngtcp2: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA
neqo-latest vs. picoquic: H DC LR C20 M S R ⚠️Z 3 B U E L2 C2 6 V2 BP BA
neqo-latest vs. quic-go: H DC LR C20 M S R Z 3 B U L1 L2 C1 C2 6
neqo-latest vs. quiche: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
neqo-latest vs. quinn: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 BP BA
neqo-latest vs. s2n-quic: H DC LR C20 M S R 3 B U E A L1 L2 C1 C2 6 🚀BP
neqo-latest vs. tquic: H DC LR C20 M R Z 3 B U A L1 L2 C1 C2 6
neqo-latest vs. xquic: H DC LR C20 M R Z 3 B U L1 L2 C1 C2 6 BP BA

neqo-latest as server

go-x-net vs. neqo-latest: H DC LR M B U A L2 C2 6 BP BA
kwik vs. neqo-latest: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 V2
lsquic vs. neqo-latest: H DC LR M S R 3 B E A L1 L2 C1 C2 6 V2 BP BA
msquic vs. neqo-latest: H DC LR C20 M S R B A L1 L2 C1 C2 6 V2 BP BA
mvfst vs. neqo-latest: H DC LR M 3 B L2 C2 6 BP BA
neqo vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA CM
ngtcp2 vs. neqo-latest: 🚀~~H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA CM~~
openssl vs. neqo-latest: H DC C20 S R 3 B A L2 C2 BP BA
picoquic vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA CM
quic-go vs. neqo-latest: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 BP BA
quiche vs. neqo-latest: H DC LR M S R Z 3 B A 🚀L1 L2 ⚠️C1 C2 6 BP BA
quinn vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 BP BA
s2n-quic vs. neqo-latest: H DC LR M S R 3 B E A L1 L2 C1 C2 6 BP BA
tquic vs. neqo-latest: H DC LR M S R Z 3 B A L1 L2 C1 C2 6 BP BA
xquic vs. neqo-latest: H DC LR C20 S R Z 3 B U A L1 L2 C1 C2 6 BP BA

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest vs. aioquic: E CM
neqo-latest vs. go-x-net: C20 S R Z 3 E L1 C1 V2 CM
neqo-latest vs. haproxy: E CM
neqo-latest vs. kwik: E CM
neqo-latest vs. lsquic: CM
neqo-latest vs. msquic: 3 E CM
neqo-latest vs. mvfst: C20 S E V2 CM
neqo-latest vs. nginx: E V2 CM
neqo-latest vs. picoquic: CM
neqo-latest vs. quic-go: E V2 CM
neqo-latest vs. quiche: E V2 CM
neqo-latest vs. quinn: V2 CM
neqo-latest vs. s2n-quic: Z V2
neqo-latest vs. tquic: E V2 CM
neqo-latest vs. xquic: S E V2 CM

neqo-latest as server

chrome vs. neqo-latest: H DC LR C20 M S R Z B U E A L1 L2 C1 C2 6 V2 BP BA CM
go-x-net vs. neqo-latest: C20 S R Z 3 E L1 C1 V2
kwik vs. neqo-latest: E
lsquic vs. neqo-latest: C20 Z U
msquic vs. neqo-latest: 3 E
mvfst vs. neqo-latest: C20 S R U E V2
openssl vs. neqo-latest: Z U E L1 C1 V2
quic-go vs. neqo-latest: E V2
quiche vs. neqo-latest: C20 U E V2
s2n-quic vs. neqo-latest: C20 Z U V2
tquic vs. neqo-latest: C20 U E V2
xquic vs. neqo-latest: E V2

github-actions · 2024-05-07T12:24:45Z

Firefox builds for this PR

The following builds are available for testing. Crossed-out builds did not succeed.

Linux: Debug Release
macOS: Debug Release
Windows: Debug Release

This commit adds a basic smoke test using the `test-fixture` simulator, asserting the expected bandwidth on a 1 Gbit/s link. Given mozilla#733, the current expected bandwidth is limited by the fixed-size stream receive buffer (1 MiB).

A `Node` (e.g. a `Client`, `Server` or `TailDrop` router) can be in 3 states:

``` rust
enum NodeState {
    /// The node just produced a datagram. It should be activated again as soon as possible.
    Active,
    /// The node is waiting.
    Waiting(Instant),
    /// The node became idle.
    Idle,
}
```

`NodeHolder::ready()` determines whether a `Node` is ready to be processed again. When `NodeState::Waiting`, it should only be ready when `t <= now`, i.e. the waiting time has passed, not `t >= now`.

``` rust
impl NodeHolder {
    fn ready(&self, now: Instant) -> bool {
        match self.state {
            Active => true,
            Waiting(t) => t <= now, // not >=
            Idle => false,
        }
    }
}
```

The previous behavior led to non-ready `Node`s being processed wastefully, and thus to a long test runtime when e.g. simulating a gbit connection (mozilla#2203).
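
To make the effect of the flipped comparison concrete, here is a self-contained sketch that puts the old and the fixed check side by side. The free functions `ready_buggy` and `ready_fixed` are hypothetical stand-ins for `NodeHolder::ready()`; the real simulator keeps the state inside `NodeHolder`.

```rust
use std::time::{Duration, Instant};

#[allow(dead_code)]
enum NodeState {
    Active,
    Waiting(Instant),
    Idle,
}

/// Old comparison: a waiting node counts as ready while its wake-up
/// time is still in the future. (Hypothetical stand-in.)
fn ready_buggy(state: &NodeState, now: Instant) -> bool {
    match state {
        NodeState::Active => true,
        NodeState::Waiting(t) => *t >= now,
        NodeState::Idle => false,
    }
}

/// Fixed comparison: a waiting node is ready once its wake-up time has passed.
fn ready_fixed(state: &NodeState, now: Instant) -> bool {
    match state {
        NodeState::Active => true,
        NodeState::Waiting(t) => *t <= now,
        NodeState::Idle => false,
    }
}

fn main() {
    let now = Instant::now();
    let wakes_later = NodeState::Waiting(now + Duration::from_millis(10));
    // The node should still be waiting; only the fixed check agrees.
    println!("buggy: {}", ready_buggy(&wakes_later, now));
    println!("fixed: {}", ready_fixed(&wakes_later, now));
}
```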

mxinden · 2024-12-31T12:32:33Z

neqo-transport/src/fc.rs

+        // Auto-tune max_active, i.e. the flow control window.
+        //
+        // If the sending rate ( window_bytes used / elapsed ) exceeds the rate
+        // allowed by the maximum flow control window and the current rtt (
+        // max_active / rtt ), try to increase the maximum flow control window (
+        // max_active ).
+        if let Some(max_allowed_sent_at) = self.max_allowed_sent_at {
+            let elapsed = now.duration_since(max_allowed_sent_at);
+            let window_bytes_used = self.max_active - (self.max_allowed - self.retired);
+
+            // Same as `elapsed / rtt < window_bytes_used / max_active`
+            // without floating point division.
+            if elapsed.as_micros() * u128::from(self.max_active)
+                < rtt.as_micros() * u128::from(window_bytes_used)
+            {
+                let prev_max_active = self.max_active;
+                // Try doubling the flow control window.
+                //
+                // Note that the flow control window should grow at least as
+                // fast as the congestion control window, in order to not
+                // unnecessarily limit throughput.
+                self.max_active = min(2 * self.max_active, MAX_RECV_WINDOW_SIZE);
+                qdebug!(
+                    "Increasing max stream receive window: previous max_active: {} MiB new max_active: {} MiB last update: {:?} rtt: {rtt:?} stream_id: {}",
+                    prev_max_active / 1024 / 1024, self.max_active / 1024 / 1024,  now-self.max_allowed_sent_at.unwrap(), self.subject,
+                );
+            }
+        }


Note that this is not the exact algorithm suggested by @martinthomson in #733 (comment).

The algorithm proposed in this pull request adopts Martin's trigger mechanism, namely to increase the window based on the perceived BDP.

> Therefore, I suggest that if the rate at which self.retired increases (that is, the change in that value, divided by the time elapsed) exceeds some function of self.max_active / path.rtt,

It does not adopt the increase mechanism, i.e. to increase by the amount of retired data. Instead, the window is simply doubled.

> then we can increase self.max_active by the amount that self.retired has increased.

The rationale is documented above.

> // Try doubling the flow control window.
> //
> // Note that the flow control window should grow at least as
> // fast as the congestion control window, in order to not
> // unnecessarily limit throughput.
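
For anyone who wants to play with the trigger and the doubling step outside of `fc.rs`, here is a standalone sketch of the logic in the hunk above. The free function `auto_tune` is hypothetical, and the value given to `MAX_RECV_WINDOW_SIZE` here is an assumption for illustration; the actual constant is not shown in this thread.

```rust
use std::cmp::min;
use std::time::Duration;

/// Assumed cap for this sketch only; not the value used by Neqo.
const MAX_RECV_WINDOW_SIZE: u64 = 10 * 1024 * 1024;

/// Returns the new `max_active` given how much of the window was used
/// in the time since the last flow control update. (Hypothetical helper.)
fn auto_tune(max_active: u64, window_bytes_used: u64, elapsed: Duration, rtt: Duration) -> u64 {
    // Same as `elapsed / rtt < window_bytes_used / max_active`,
    // cross-multiplied to avoid floating point division.
    let sender_outpaces_window = elapsed.as_micros() * u128::from(max_active)
        < rtt.as_micros() * u128::from(window_bytes_used);
    if sender_outpaces_window {
        // Double the window, but never beyond the cap.
        min(max_active.saturating_mul(2), MAX_RECV_WINDOW_SIZE)
    } else {
        max_active
    }
}

fn main() {
    let mib = 1024 * 1024;
    let rtt = Duration::from_millis(50);
    // Half of the window consumed in a fifth of the RTT: grow.
    println!("{}", auto_tune(mib, mib / 2, Duration::from_millis(10), rtt));
    // Half of the window consumed in four fifths of the RTT: keep.
    println!("{}", auto_tune(mib, mib / 2, Duration::from_millis(40), rtt));
}
```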

mxinden · 2025-01-31T14:46:10Z

@martinthomson can you give this pull request a review?

Not urgent, but I believe important, as this can have a significant impact on our up- and download throughput on high-bandwidth-delay connections.

mxinden · 2025-02-10T10:26:53Z

Friendly ping @martinthomson.

Or maybe @larseggert, do you have some spare cycles to review this pull request?

neqo-transport/benches/transfer.rs

martinthomson

Preliminary comments, with updated thoughts on the algorithm.

martinthomson · 2025-02-10T11:03:31Z

neqo-transport/src/fc.rs

    ) {
        if !self.frame_needed() {
            return;
        }
+
+        // Auto-tune max_active, i.e. the flow control window.


This should be in a new function. Then you can document it more cleanly and use early returns to make it neater.

martinthomson · 2025-02-10T11:12:09Z

neqo-transport/src/fc.rs

@@ -363,13 +418,14 @@ impl ReceiverFlowControl<StreamId> {
                max_data: max_allowed,
            }));
            self.frame_sent(max_allowed);
+            self.max_allowed_sent_at = Some(now);


The name of this is a bit strange. It's just the time at which you last sent a frame. Would last_update make more sense?

martinthomson · 2025-02-10T11:15:03Z

neqo-transport/src/fc.rs

@@ -255,6 +273,11 @@ where
        }
    }

+    const fn should_send_flowc_update(&self) -> bool {
+        let window_bytes_unused = self.max_allowed.saturating_sub(self.retired);


Is it ever the case that max_allowed < retired such that you need saturating_sub()?
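
As background for this question, a minimal sketch of what `saturating_sub` changes compared to a plain subtraction on `u64`. The helper `window_bytes_unused` is hypothetical and only mirrors the expression in the hunk. Whether `max_allowed < retired` can actually occur in `ReceiverFlowControl` is exactly what is being asked and is not answered here.

```rust
/// Mirrors the expression from the hunk above. (Hypothetical free function.)
fn window_bytes_unused(max_allowed: u64, retired: u64) -> u64 {
    max_allowed.saturating_sub(retired)
}

fn main() {
    // Normal case: identical to `max_allowed - retired`.
    println!("{}", window_bytes_unused(100, 40));
    // If `retired` ever exceeded `max_allowed`, the result clamps to 0,
    // whereas a plain `-` would panic in debug builds and wrap in release builds.
    println!("{}", window_bytes_unused(40, 100));
    // `checked_sub` makes the underflow explicit instead of hiding it.
    println!("{:?}", 40u64.checked_sub(100));
}
```

If the invariant `max_allowed >= retired` always holds, a plain subtraction (or a `debug_assert!` plus subtraction) would document that more directly than clamping.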

martinthomson · 2025-02-10T11:24:54Z

neqo-transport/src/fc.rs

+        // Auto-tune max_active, i.e. the flow control window.
+        //
+        // If the sending rate ( window_bytes used / elapsed ) exceeds the rate
+        // allowed by the maximum flow control window and the current rtt (
+        // max_active / rtt ), try to increase the maximum flow control window (
+        // max_active ).
+        if let Some(max_allowed_sent_at) = self.max_allowed_sent_at {
+            let elapsed = now.duration_since(max_allowed_sent_at);
+            let window_bytes_used = self.max_active - (self.max_allowed - self.retired);
+
+            // Same as `elapsed / rtt < window_bytes_used / max_active`
+            // without floating point division.
+            if elapsed.as_micros() * u128::from(self.max_active)
+                < rtt.as_micros() * u128::from(window_bytes_used)
+            {
+                let prev_max_active = self.max_active;
+                // Try doubling the flow control window.
+                //
+                // Note that the flow control window should grow at least as
+                // fast as the congestion control window, in order to not
+                // unnecessarily limit throughput.
+                self.max_active = min(2 * self.max_active, MAX_RECV_WINDOW_SIZE);
+                qdebug!(
+                    "Increasing max stream receive window: previous max_active: {} MiB new max_active: {} MiB last update: {:?} rtt: {rtt:?} stream_id: {}",
+                    prev_max_active / 1024 / 1024, self.max_active / 1024 / 1024,  now-self.max_allowed_sent_at.unwrap(), self.subject,
+                );
+            }
+        }


The reason I didn't recommend doubling is that our congestion control algorithms will increase the window at a rate greater than double at times. Then we'll lag here if the peer uses one of those. They might be tripling or more when they are at the steep part of a Cubic ramp. We'll be giving them less than that.

The other problem I see with this is that it doubles when the other side is close to matching the BDP. Fluctuations in rate (or RTT estimate) will cause the condition to be tripped (retiring 1/4 of the window in ~1/4 of the RTT is expected). Whereas retiring 1/2 of the data in 1/4 of an RTT is cause to increase much more.

On the other hand, I see how using the newly retired amount -- as I suggested -- doesn't do better than double. Perhaps this:

    let bonus_multiplier = 4; // same as number of updates per RTT 🤔
    let proportion_of_rtt = (now - last_update) / rtt;
    let excess = window_bytes_used - (max_active * proportion_of_rtt);
    if excess > 0 {
        self.max_active += bonus_multiplier * excess;
    }

(Obviously, this is sloppy. Better code would use integer math and check for overflow and whatnot.)

For instance, if they blow through the entire window in 1/4 of an RTT, then this would add 4*3/4 of the window, quadrupling it. There is no upper limit to how fast we can increase here if the peer sends faster (successfully). As our estimate becomes closer to approximately correct for the BDP, we'd correct less. If we're short by 10 bytes, then it would add just 40. As the RTT fluctuates, I expect that this would happen a couple of times before it settles just above the BDP.
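
One possible integer-only rendering of the sketch above, checked against the two examples in the previous paragraph. This is an illustration under assumptions, not the code that ended up in the pull request: the function name `grow_window` is made up, and rounding behavior and the lack of an upper cap are choices made only for this example.

```rust
use std::time::Duration;

/// Same as the number of updates per RTT, as in the sketch above.
const BONUS_MULTIPLIER: u64 = 4;

/// Returns the new `max_active`. (Hypothetical helper, integer math only.)
fn grow_window(max_active: u64, window_bytes_used: u64, elapsed: Duration, rtt: Duration) -> u64 {
    let rtt_us = rtt.as_micros();
    if rtt_us == 0 {
        // No usable RTT estimate; leave the window alone.
        return max_active;
    }
    // Bytes the peer would have consumed in `elapsed` at exactly one
    // window per RTT, i.e. `max_active * proportion_of_rtt`.
    let expected = u128::from(max_active).saturating_mul(elapsed.as_micros()) / rtt_us;
    let expected = u64::try_from(expected).unwrap_or(u64::MAX);
    // Only grow when the peer consumed more than that.
    let excess = window_bytes_used.saturating_sub(expected);
    max_active.saturating_add(BONUS_MULTIPLIER.saturating_mul(excess))
}

fn main() {
    let rtt = Duration::from_millis(40);
    let quarter = Duration::from_millis(10);
    // Entire window used in 1/4 of an RTT: the window quadruples.
    println!("{}", grow_window(1_000_000, 1_000_000, quarter, rtt));
    // Short by 10 bytes: the window grows by 40 bytes.
    println!("{}", grow_window(1_000_000, 250_010, quarter, rtt));
}
```

A real implementation would presumably still clamp the result to `MAX_RECV_WINDOW_SIZE`, as the doubling variant does.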

For my next trick, I will also suggest that the per-stream limit look at the connection-level estimate (which should track BDP better) so that we start each stream at a decent fraction of that (which could be 1, but I'd suggest 1/4).

martinthomson · 2025-02-10T11:25:34Z

neqo-transport/src/fc.rs

@@ -255,6 +273,11 @@ where
        }
    }

+    const fn should_send_flowc_update(&self) -> bool {


Suggested change

-    const fn should_send_flowc_update(&self) -> bool {
+    const fn should_send_update(&self) -> bool {

The file is `fc.rs` and the type is `ReceiverFlowControl`, so the `flowc` in the name is redundant.

martinthomson · 2025-02-10T11:39:34Z

neqo-transport/src/send_stream.rs

@@ -494,10 +494,10 @@ impl TxBuffer {

    /// Attempt to add some or all of the passed-in buffer to the `TxBuffer`.
    pub fn send(&mut self, buf: &[u8]) -> usize {
-        let can_buffer = min(SEND_BUFFER_SIZE - self.buffered(), buf.len());
+        let can_buffer = min(MAX_SEND_BUFFER_SIZE - self.buffered(), buf.len());


I'm OK with that. As you say, it either gets discarded when the short-lived stream goes away, or it is available for the next write on a long-lived stream.

Was there a particular reason you increased this here? Was it necessary to exercise the increasing receive buffer size?

neqo-transport/src/connection/params.rs

larseggert · 2025-02-10T11:50:22Z

neqo-transport/src/fc.rs

+            if elapsed.as_micros() * u128::from(self.max_active)
+                < rtt.as_micros() * u128::from(window_bytes_used)


Is there a way to avoid u128 math? It can be slow on some low-end ARM chips, and if we're executing this in the fast path we might want to avoid it.

Given that this happens at most 4 times per RTT, I'm OK with this being a tiny bit expensive. If this were on a hot path, that would be different.

That said, it might be OK to make this an approximate thing, with conversion to a double instead. The imprecision of floating point math shouldn't hurt in this case.
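
To illustrate the two options discussed here, a small sketch comparing the exact `u128` cross-multiplication with an approximate `f64` version. Both function names are hypothetical. The two can disagree when both sides are nearly equal, which, as noted above, should not matter for a heuristic like this.

```rust
use std::time::Duration;

/// Exact check, as in the hunk: cross-multiplied in u128.
fn exceeds_exact(elapsed: Duration, rtt: Duration, used: u64, max_active: u64) -> bool {
    elapsed.as_micros() * u128::from(max_active) < rtt.as_micros() * u128::from(used)
}

/// Approximate check using floating point instead of u128.
fn exceeds_approx(elapsed: Duration, rtt: Duration, used: u64, max_active: u64) -> bool {
    elapsed.as_secs_f64() * (max_active as f64) < rtt.as_secs_f64() * (used as f64)
}

fn main() {
    let rtt = Duration::from_millis(40);
    let max_active = 1024 * 1024;
    let used = 512 * 1024;
    for elapsed_ms in [10, 30] {
        let elapsed = Duration::from_millis(elapsed_ms);
        println!(
            "elapsed {elapsed_ms} ms: exact {} approx {}",
            exceeds_exact(elapsed, rtt, used, max_active),
            exceeds_approx(elapsed, rtt, used, max_active),
        );
    }
}
```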

github-actions · 2025-02-10T16:46:03Z

Benchmark results

Performance differences relative to b6e4cfc.

decode 4096 bytes, mask ff: No change in performance detected.

       time:   [10.868 µs 10.908 µs 10.954 µs]
       change: [-0.3752% +0.0964% +0.5582%] (p = 0.70 > 0.05)
Found 18 outliers among 100 measurements (18.00%)

2 (2.00%) low severe

3 (3.00%) low mild

3 (3.00%) high mild

10 (10.00%) high severe

decode 1048576 bytes, mask ff: No change in performance detected.

       time:   [3.1279 ms 3.1371 ms 3.1476 ms]
       change: [-0.3526% +0.0761% +0.4969%] (p = 0.74 > 0.05)
Found 9 outliers among 100 measurements (9.00%)

9 (9.00%) high severe

decode 4096 bytes, mask 7f: Change within noise threshold.

       time:   [17.606 µs 17.646 µs 17.692 µs]
       change: [-1.3032% -0.6574% -0.1182%] (p = 0.02 < 0.05)
Found 17 outliers among 100 measurements (17.00%)

1 (1.00%) low severe

7 (7.00%) low mild

1 (1.00%) high mild

8 (8.00%) high severe

decode 1048576 bytes, mask 7f: No change in performance detected.

       time:   [5.4070 ms 5.4199 ms 5.4343 ms]
       change: [-0.3472% +0.0150% +0.3539%] (p = 0.93 > 0.05)
Found 17 outliers among 100 measurements (17.00%)

17 (17.00%) high severe

decode 4096 bytes, mask 3f: No change in performance detected.

       time:   [6.6500 µs 6.6712 µs 6.6953 µs]
       change: [-0.5912% +0.0781% +0.7357%] (p = 0.82 > 0.05)
Found 4 outliers among 100 measurements (4.00%)

1 (1.00%) low mild

2 (2.00%) high mild

1 (1.00%) high severe

decode 1048576 bytes, mask 3f: No change in performance detected.

       time:   [1.7580 ms 1.7595 ms 1.7624 ms]
       change: [-0.0395% +0.0499% +0.2732%] (p = 0.69 > 0.05)
Found 3 outliers among 100 measurements (3.00%)

2 (2.00%) high mild

1 (1.00%) high severe

coalesce_acked_from_zero 1+1 entries: No change in performance detected.

       time:   [91.025 ns 91.322 ns 91.626 ns]
       change: [-0.6505% -0.1252% +0.3992%] (p = 0.65 > 0.05)
Found 12 outliers among 100 measurements (12.00%)

9 (9.00%) high mild

3 (3.00%) high severe

coalesce_acked_from_zero 3+1 entries: Change within noise threshold.

       time:   [109.36 ns 109.69 ns 110.05 ns]
       change: [+0.1337% +0.6709% +1.5148%] (p = 0.04 < 0.05)
Found 19 outliers among 100 measurements (19.00%)

4 (4.00%) low mild

1 (1.00%) high mild

14 (14.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.

       time:   [108.78 ns 109.04 ns 109.40 ns]
       change: [-1.2619% -0.4920% +0.2018%] (p = 0.20 > 0.05)
Found 11 outliers among 100 measurements (11.00%)

3 (3.00%) low mild

4 (4.00%) high mild

4 (4.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.

       time:   [93.215 ns 93.360 ns 93.528 ns]
       change: [-0.8111% +0.1881% +1.1366%] (p = 0.73 > 0.05)
Found 8 outliers among 100 measurements (8.00%)

4 (4.00%) high mild

4 (4.00%) high severe

RxStreamOrderer::inbound_frame(): No change in performance detected.

       time:   [111.33 ms 111.38 ms 111.43 ms]
       change: [-0.0329% +0.0337% +0.1001%] (p = 0.32 > 0.05)
Found 16 outliers among 100 measurements (16.00%)

4 (4.00%) low mild

12 (12.00%) high mild

SentPackets::take_ranges: No change in performance detected.

       time:   [5.1663 µs 5.2420 µs 5.3122 µs]
       change: [-17.833% -5.9009% +2.6758%] (p = 0.49 > 0.05)
Found 3 outliers among 100 measurements (3.00%)

3 (3.00%) high severe

transfer/pacing-false/varying-seeds: Change within noise threshold.

       time:   [38.022 ms 38.085 ms 38.149 ms]
       change: [+2.3237% +2.5856% +2.8408%] (p = 0.00 < 0.05)
Found 1 outliers among 100 measurements (1.00%)

1 (1.00%) high mild

transfer/pacing-true/varying-seeds: Change within noise threshold.

       time:   [37.941 ms 38.001 ms 38.062 ms]
       change: [+2.2001% +2.4407% +2.6890%] (p = 0.00 < 0.05)

transfer/pacing-false/same-seed: Change within noise threshold.

       time:   [37.818 ms 37.879 ms 37.941 ms]
       change: [+2.1669% +2.4217% +2.6583%] (p = 0.00 < 0.05)

transfer/pacing-true/same-seed: Change within noise threshold.

       time:   [38.164 ms 38.213 ms 38.262 ms]
       change: [+2.8103% +3.0164% +3.2258%] (p = 0.00 < 0.05)
Found 1 outliers among 100 measurements (1.00%)

1 (1.00%) high mild

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: No change in performance detected.

       time:   [856.82 ms 867.67 ms 878.75 ms]
       thrpt:  [113.80 MiB/s 115.25 MiB/s 116.71 MiB/s]
change:
       time:   [-2.1945% -0.5510% +1.0584%] (p = 0.52 > 0.05)
       thrpt:  [-1.0474% +0.5541% +2.2437%]

1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: No change in performance detected.

       time:   [320.32 ms 323.65 ms 326.93 ms]
       thrpt:  [30.587 Kelem/s 30.898 Kelem/s 31.219 Kelem/s]
change:
       time:   [-0.6179% +0.9609% +2.6066%] (p = 0.24 > 0.05)
       thrpt:  [-2.5403% -0.9518% +0.6217%]

1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.

       time:   [25.372 ms 25.556 ms 25.747 ms]
       thrpt:  [38.840  elem/s 39.130  elem/s 39.414  elem/s]
change:
       time:   [-1.1289% -0.1478% +0.8669%] (p = 0.76 > 0.05)
       thrpt:  [-0.8595% +0.1480% +1.1418%]
Found 1 outliers among 100 measurements (1.00%)

1 (1.00%) high mild

1-conn/1-100mb-resp/mtu-1504 (aka. Upload)/client: Change within noise threshold.

       time:   [1.8272 s 1.8459 s 1.8644 s]
       thrpt:  [53.635 MiB/s 54.175 MiB/s 54.729 MiB/s]
change:
       time:   [-2.7996% -1.4901% -0.1100%] (p = 0.03 < 0.05)
       thrpt:  [+0.1101% +1.5127% +2.8802%]

Client/server transfer results

Performance differences relative to b6e4cfc.

Transfer of 33554432 bytes over loopback, 30 runs. All unit-less numbers are in milliseconds.

Client	Server	CC	Pacing	Mean ± σ	Min	Max	Δ `main`	Δ `main`
neqo	neqo	reno	on	508.9 ± 42.3	455.0	629.5	💚 -35.7	-1.7%
neqo	neqo	reno		513.5 ± 110.0	436.4	1026.8	💚 -42.1	-2.0%
neqo	neqo	cubic	on	527.3 ± 42.6	447.1	667.8	💚 -16.0	-0.7%
neqo	neqo	cubic		498.2 ± 46.7	445.8	687.3	💚 -36.3	-1.8%
google	neqo	reno	on	908.5 ± 95.3	675.0	995.3	💚 6.4	0.2%
google	neqo	reno		904.1 ± 92.5	689.1	1000.3	💚 2.1	0.1%
google	neqo	cubic	on	903.4 ± 94.2	682.3	987.0	💚 8.3	0.2%
google	neqo	cubic		902.1 ± 94.1	681.3	1001.4	💚 7.6	0.2%
google	google			533.1 ± 9.2	517.6	558.1	💚 -1.8	-0.1%
neqo	msquic	reno	on	222.2 ± 21.2	200.3	293.3	💚 -4.0	-0.4%
neqo	msquic	reno		222.3 ± 32.7	201.5	382.3	💚 1.0	0.1%
neqo	msquic	cubic	on	229.7 ± 40.2	201.6	425.2	💚 3.9	0.4%
neqo	msquic	cubic		217.8 ± 11.8	198.5	246.8	💚 -1.7	-0.2%
msquic	msquic			114.0 ± 19.8	98.4	206.3	💚 -4.7	-1.0%

mxinden added 2 commits April 25, 2024 16:43

mxinden added 3 commits May 7, 2024 11:39

clippy

1bfd2f5

Merge branch 'main' of https://github.com/mozilla/neqo into auto-tuning

167d93f

fix tests

edc4035

mxinden mentioned this pull request May 8, 2024

refactor: enable mozilla-central http3server to use neqo-bin #1878

Merged

mxinden added 17 commits May 14, 2024 15:49

Don't increase on STREAM_DATA_BLOCKED

12390c8

enforce STREAM_MAX_ACTIVE_LIMIT

0ad1b77

add TODO starting below 1MiB

8e6a5af

Add NonRandomDelay

716ba20

Reduce transfer amount and throughput expectation

3b2a52b

Merge branch 'main' of https://github.com/mozilla/neqo into auto-tuning

dba8190

Merge branch 'main' of https://github.com/mozilla/neqo into auto-tuning

9b9524e

debugging

ac5d024

Merge branch 'main' of https://github.com/mozilla/neqo into auto-tuning

22f1d7e

deduplicate in fc.rs

db3cfe5

Google vs Thomson vs BDP

62dc2ba

More testing

841d086

Adjust bench

5ec58cb

Move to benchmark

6dd3829

Add TODO on sent rtt in receive auto-tuning

f9b9a27

larseggert mentioned this pull request Dec 4, 2024

Add stats about how often a connection is blocked because of the flow control limits #790

Open

mxinden added 3 commits December 13, 2024 11:53

update every rtt/4 and don't use floating point division

8aff641

Settle on Thomson algorithm

4a146e6

More aggressive increase

5e7d7d6

mxinden commented Dec 31, 2024

mxinden added 13 commits December 31, 2024 14:27

Add max_send_buffer_size test

e808b51

Merge remote-tracking branch 'mozilla/main' into auto-tuning

57c2870

Clippy

ba2c321

Clippy

6de63fb

Plan tests

aba14e9

Add tests

4ee45fc

Fix min_bandwidth

82eb50b

Add quick check style test

fb51122

Interleaving test

1fa5a77

Rational for MAX_RECV_WINDOW_SIZE

6cb8de9

Clippy

f10cf32

Fix test

8419c3f

Fix intra doc links

5724444

mxinden marked this pull request as ready for review January 4, 2025 15:30

mxinden requested review from KershawChang, martinthomson and larseggert as code owners January 4, 2025 15:30

Expose raw bandwidth

4d70b34

mxinden mentioned this pull request Jan 5, 2025

test(transport): assert maximum bandwidth on gbit link #2203

Closed

mxinden added 2 commits January 15, 2025 21:05

Use LINK_BANDWIDTH const

08c2d37

Merge branch 'main' of https://github.com/mozilla/neqo into auto-tuning

68291d8

Merge branch 'main' into auto-tuning

2966b4f

Signed-off-by: Lars Eggert <[email protected]>

larseggert reviewed Feb 10, 2025

martinthomson reviewed Feb 10, 2025

larseggert reviewed Feb 10, 2025

Fixes to get the bench to run

bc4cb4e
