Improve NodeSession TCP performance #319
Conversation
Codecov Report
Attention: Patch coverage is

Additional details and impacted files:

@@           Coverage Diff            @@
##             main     #319    +/-   ##
==========================================
+ Coverage   81.58%   81.61%   +0.03%
==========================================
  Files          61       61
  Lines       12304    12299       -5
==========================================
  Hits        10038    10038
+ Misses       2266     2261       -5
Force-pushed from d716e0e to 9f46597
Hmm, the pg_group integration test failed but the other integration tests passed. I think there's a race condition: now that we finish session setup much faster, the PG membership hasn't been propagated by the time the tests run. If I insert a sleep between session setup and the test, it passes locally.
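The fixed-sleep workaround above could be replaced by a bounded polling loop in the test. A minimal sketch in Rust; `wait_until` and the closure it takes are illustrative helpers, not ractor APIs:

```rust
use std::time::{Duration, Instant};

// Illustrative sketch: poll a condition (e.g. "PG membership has
// propagated") with a deadline instead of sleeping a fixed amount.
// `cond` is any closure the test can use to observe the state.
fn wait_until<F: FnMut() -> bool>(mut cond: F, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if cond() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        std::thread::sleep(Duration::from_millis(10));
    }
}
```

This keeps the test fast when propagation is quick and only fails after a genuine timeout, instead of being sensitive to one hard-coded delay.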
I'm concerned about the need to wait for the sync to occur. It "should" occur immediately after the tcp session is authenticated, at the linked code: we spawn the remote actors, then synchronize the PG groups. In "theory" it's possible we need to reverse these two statements, so that we notify the node server that the session is ready only after the rest of the post-auth steps are done. But that'd be very surprising. Maybe let's try reversing those prior to doing this sleep thing; it really shouldn't be the case, and it's concerning if it is.
My understanding is that, in the code referenced above, the sync isn't guaranteed to have completed by the time the session is marked ready. If we want to guarantee the PG groups are synced after session setup, we probably need to include it in the auth state machine.
In other words, I don't think swapping the two lines can help. I tried anyway, and the test still failed.
Yes, OK, I see your point now. It might be worth adding a control message stating that the initial synchronization has completed, with both sides waiting to mark the node online until it's done. In its present form it's complicated to guarantee that a node is "fully" online without these kinds of manual polling techniques.
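The control-message idea could look roughly like this. A hedged sketch, not ractor's actual protocol: `ControlMsg`, `SyncComplete`, and `wait_online` are hypothetical names, and a real implementation would live in the session's async message loop rather than a blocking channel:

```rust
use std::sync::mpsc;

// Illustrative sketch: a hypothetical control message lets the peer
// signal that initial PG sync has finished, and the session reports
// "online" only after both auth and sync have been observed.
#[derive(Debug, PartialEq)]
enum ControlMsg {
    Authenticated,
    SyncComplete, // the new message proposed in this thread
}

// Returns true (node online) only once both messages have arrived;
// returns false if the channel closes first.
fn wait_online(rx: &mpsc::Receiver<ControlMsg>) -> bool {
    let (mut authed, mut synced) = (false, false);
    for msg in rx.iter() {
        match msg {
            ControlMsg::Authenticated => authed = true,
            ControlMsg::SyncComplete => synced = true,
        }
        if authed && synced {
            return true;
        }
    }
    false
}
```

With this shape, the pg_group test no longer needs a sleep: "online" already implies the initial sync ran.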
I thought of two options.
I can try to make another PR for (2), what do you think?
Force-pushed from 3f252a8 to dee0567
Mind rebasing so we can see what this looks like on the new changes for the sync state? Thanks!
Each of NodeSession's messages is encoded with a 2-byte length prefix followed by the actual object data. Originally, when writing the message to the TcpStream, we wrote them separately; however, the TcpStream is not buffered, so the first 2-byte write may be pushed to the underlying network stack as a tiny packet, and the data is then pushed as one or more packets.

The first tiny packet might trigger the pathological interaction between Nagle's Algorithm and Delayed Acknowledgement in TCP, causing a 40ms to 200ms delay of the actual payload delivery. This can be observed as an average 40ms latency between two nodes on the same host connecting via the loopback interface.

Combine the length prefix and payload data into one write_all call to avoid this latency anomaly. Since the message is already length-prefixed, also skip the length delimiter encoding in proto.

Signed-off-by: Kan-Ru Chen <[email protected]>
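The framing change described in the commit message can be sketched as follows. This is a simplified stand-in for the actual session writer, and the big-endian byte order of the prefix is an assumption for illustration:

```rust
use std::io::{Error, ErrorKind, Result, Write};

// Simplified sketch of the fix: build the 2-byte length prefix and the
// payload into one buffer and issue a single write_all, so an
// unbuffered TcpStream never emits the prefix as its own tiny packet
// (the trigger for the Nagle / delayed-ACK stall).
fn write_frame<W: Write>(out: &mut W, payload: &[u8]) -> Result<()> {
    let len = u16::try_from(payload.len())
        .map_err(|_| Error::new(ErrorKind::InvalidInput, "payload too large for 2-byte prefix"))?;
    let mut frame = Vec::with_capacity(2 + payload.len());
    frame.extend_from_slice(&len.to_be_bytes()); // assumed big-endian prefix
    frame.extend_from_slice(payload);
    out.write_all(&frame) // one write instead of two
}
```

The alternative of two back-to-back writes (`write_all(&len_bytes)` then `write_all(payload)`) is what produced the 2-byte packet; coalescing into one buffer sidesteps the issue without touching socket options like TCP_NODELAY.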
Force-pushed from dee0567 to 5c9e9d3
It works :)
LGTM
Note: removing the double length prefix is a breaking change. If that's not desired, we can keep the original length prefix in proto by using encode_length_delimited. That would just add a little extra overhead.
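For a sense of scale, the "little extra overhead" of keeping encode_length_delimited is protobuf's varint length prefix, which costs one byte per 7 bits of the length value. A small sketch of that size calculation (my own helper, not a prost API):

```rust
// Sketch of the overhead when the proto layer keeps its own length
// delimiter: prost's encode_length_delimited prepends the message
// length as a protobuf varint (7 bits of value per byte), which would
// sit redundantly behind the session's 2-byte frame prefix.
fn varint_size(mut n: u64) -> usize {
    let mut bytes = 1;
    while n >= 0x80 {
        n >>= 7;
        bytes += 1;
    }
    bytes
}
```

So with messages capped at a 2-byte frame prefix (≤ 65535 bytes), the redundant inner delimiter would cost 1 to 3 bytes per message.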