Add ChannelSend trait to prep for lossy sender #4797
base: master
Conversation
The topic of bounded channels has been brought up multiple times before. We prefer not to use bounded channels because they do not distinguish between good and bad packets, and drop (or back-pressure) packets indiscriminately. Gossip (or any other protocol) packets are not equal in terms of their importance to the protocol or how many resources they need to process. Bounded channels are oblivious to that. If we decide to drop packets we should also take into account the stake associated with the sender, the receiver, or the CrdsValues; again, something bounded channels ignore. Dropping packets indiscriminately also makes it easier for a spammer to censor good packets. The preferred alternative is doing load shedding at a stage in the pipeline where we have better context on how to prioritize packets and which ones to drop if overwhelmed. This is all assuming the right thing to do here is to drop some packets. But I do not have enough context on what problem you are trying to solve, and I can't rule out that the culprit or the remedy is something else.
Thanks for raising these concerns. Allow me to clarify the implicated code paths for more context. The gossip service creates three unbounded channels upon construction.

As far as I can tell, there is no drop prioritization of packets in the first two channels other than newer packets being preferred. The third channel is only partially bounded from dropping: it is implicitly partially bounded, as its sender is partially populated via the receiver of the second channel, which drops packets according to the above scheme.

In summary, I believe we should be able to introduce bounded channels with "lossy" senders in the first two channels without losing any existing behavior. Let me know if there is an existing prioritization mechanism here that I am missing.
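For illustration, here is a minimal sketch of what this kind of receiver-side shedding looks like in principle. It is not the actual agave code; the `MAX_TRAFFIC` constant, the timeout, and the function name are placeholders standing in for `MAX_GOSSIP_TRAFFIC` and the real receive loop.

```rust
use std::time::Duration;

use crossbeam_channel::{Receiver, RecvTimeoutError};

// Placeholder bound standing in for MAX_GOSSIP_TRAFFIC.
const MAX_TRAFFIC: usize = 128_000;

/// Drain everything currently queued, then keep only the newest
/// `MAX_TRAFFIC` entries; older packets are shed at receive time.
fn recv_with_shedding<T>(receiver: &Receiver<T>) -> Result<Vec<T>, RecvTimeoutError> {
    // Block for the first packet so the loop has something to work with.
    let first = receiver.recv_timeout(Duration::from_secs(1))?;
    let mut packets: Vec<T> = std::iter::once(first).chain(receiver.try_iter()).collect();
    if packets.len() > MAX_TRAFFIC {
        // Newer packets are preferred; drop the oldest overflow.
        let excess = packets.len() - MAX_TRAFFIC;
        packets.drain(..excess);
    }
    Ok(packets)
}
```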
I did not claim the current implementation cannot be improved. We might need to do something better than the current naive load-shedding and maybe prioritize packets by some metrics first (e.g. type of packet, associated stakes, who the sender or receiver is, etc.). But a bounded channel which indiscriminately drops (or back-pressures) packets does not help with that. That being said, before changing load-shedding, we need to first understand what the actual problem is here.
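Purely as a hypothetical illustration of that alternative (not existing agave code), shedding could rank packets by sender stake before dropping anything. The `Packet` type, the stake map, and `shed_by_stake` below are made-up names for the sketch.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

// Hypothetical packet type for the sketch; agave's real types differ.
struct Packet {
    from: SocketAddr,
    data: Vec<u8>,
}

/// Keep at most `max_packets`, preferring packets from higher-staked senders;
/// unstaked senders (stake 0) are dropped first.
fn shed_by_stake(
    mut packets: Vec<Packet>,
    stakes: &HashMap<SocketAddr, u64>,
    max_packets: usize,
) -> Vec<Packet> {
    if packets.len() <= max_packets {
        return packets;
    }
    packets.sort_by_key(|p| std::cmp::Reverse(stakes.get(&p.from).copied().unwrap_or(0)));
    packets.truncate(max_packets);
    packets
}
```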
Today, there are spikes in traffic. All incoming traffic is put into an unbounded channel. The receivers on the channel throw away the oldest information beyond a threshold. I created this PR to allow agave to easily switch out the channel implementation. The follow-on PR will bound the worst case behavior while providing the exact same load shedding that exists in agave today.

If someone (such as the networking team) would like to explore alternate load shedding implementations and root-cause the spikes, that seems like a valuable project whose priority would need to be considered relative to other projects.

The result of this PR (and the following one) would allow an unbounded channel to be replaced by a bounded channel with the exact behavior we currently have, except we eliminate the worst case behavior of wastefully growing an unbounded channel to hold data that will be thrown away past a threshold at receive time.

If subsequent changes are made to include more sophisticated load shedding, then the fact that the implementation is behind a trait bound will likely allow that implementation to plug into the existing code paths. Or, if the more sophisticated version is so vastly different that the trait bound isn't sufficient, we can revert / rework the relevant code paths at that time; this is work that would have to be done in either case. This code change and the follow-on PR are minimal enough that very little code needs to be changed either way.
I appreciate you contributing here, and we can definitely benefit from more people helping with debugging and maintaining gossip or scrutinizing the design and implementation. I also fully acknowledge that these packet spikes are something which needs to be investigated and addressed. So please do not consider my comments as dismissive of your work or intentions. However, my point is that before implementing any solutions, we need to take some prerequisite steps:
So, again, I would appreciate it if we do not jump to any solution before we better understand the dynamics here. If you are interested in helping with that, I would suggest chatting with @gregcusack first. I am sure he has some good insights on how to best approach debugging and investigating gossip traffic. Thank you.
i think the proposed changes have merit. let's reframe the problem and clarify priorities
the problem being addressed here is not in fact the traffic spikes, it is the memory pressure being imposed on the rest of the system when these spikes occur
we no doubt need to be addressing the root cause of these events, but i'll contend that that is not the mandate of the perf team. the network team has the expertise to resolve that problem most effectively
the perf team on the other hand is tasked with identifying and mitigating bottlenecks with minimal changes to logic, and i think that's what they're doing here. replacing one lossy channel implementation with another, while retaining similar loss characteristics, is about as close to zero logical changes as we can expect here. however, we trade the former's memory footprint elasticity for the latter's static allocation, isolating these gossip events from the rest of the validator. that is a win in my book. any decoupling, especially when it is implicit, is a win in my book
introducing a trait as an intermediary seems like a fine idea. sure there's only one implementation today, so it's a bit superfluous, but it does make the change set dead simple to review. i expect the followup swapping out the implementation will be just as reviewable. if the network team determines that a smarter channel implementation is needed in the future, there's little to no harm in the trait being there and it might even simplify prototyping
i'm r+ pending network team approval here. it is ultimately their code to maintain. i do believe that this is the right change to make today for the sake of the rest of agave
--
one criticism i will give is that we are not a merge-update house. @cpubot, please rebase that ugly commit away 😉
This PR was done with the presumption that the channel is growing unboundedly. I have explained on a call that neither the metrics nor the profiles "necessarily" indicate that the channel is growing too large. I have also tested this presumption on a "staked" testnet node, which has shown what the channel looks like right before we start reading from it.

Also, once we start reading from the channel we completely drain it and leave it "empty" again. Meaning that the vast majority of the time we start with an empty channel, we have to wait until something is pushed to the channel, and "always" we leave the channel in an "empty" state. The crossbeam implementation of the unbounded channel also does not hold on to allocations.

There are all sorts of other issues, some of them pointed out in the earlier comments, but let's leave it at this for the moment: we have not yet seen evidence that the channel is growing unboundedly or causing any memory issues. Quite the opposite: most of the time the channel is empty, and it hardly ever contains more than 10 entries. So the starting presumption here is uncorroborated by the evidence.
Agree somewhat with Behzad, we need to figure out what is causing the memory pressure:
I think @cpubot is already working on running a mainnet node and checking to see whether the unbounded channel specifically is actually causing significant memory pressure. As discussed today, let's wait to get this info before making a decision.

Side note / thinking out loud here: there seems to be a difference in how we're each defining "growing unboundedly" that is based on the timescale over which we are observing the channel.
this is the problem though. bursty allocation patterns like this are degenerate with jemalloc. that's how removing the allocation elasticity with a bounded channel is intended to improve the situation. that is, there is really no problem with the gossip service in isolation during these events, but gossip is coupled to the rest of the validator via the allocator, which is being thrown into a degenerate state
if we don't trust the profiler, what other evidence can we provide? our metrics aren't going to capture issues that manifest in the allocator. so long as the drop counters are moved to the sending side, i'm failing to see how the proposed changes are going to have a material impact on the gossip service's functional behavior. we'll still see the spikes because the metrics are still there to count drops. so the networking team can investigate their root cause without issue and the perf team can move on to the next bottleneck. can someone enumerate and elaborate on the risks this change introduces? apologies if this was already done elsewhere and not documented here
@t-nelson Please run a validator and inspect the channel size for yourself. A patch introducing a bad design would only make things worse and harder to maintain. I would also appreciate it if you could actually spend some time debugging any of the real gossip issues. At the very least that would provide you with some more insight and understanding of how the pipeline works, and perhaps some of my points above would make more sense to you then. For the record, below is the distribution of the channel size right before we start reading from it. This is based on a random sample of 6337 logs over ~18 hours from a staked node on testnet. A channel which is mostly empty is what we are wasting so much time on instead of the actual problems.
Problem

Gossip service consistently sees bursts in the millions of packets over the 2 second stat submission window. The service currently uses `crossbeam_channel::unbounded` channels to buffer packets, which, as the name implies, will unboundedly allocate space for incoming packets. The channel receivers will ultimately drop packets outside the bounds of `MAX_GOSSIP_TRAFFIC`.

Summary of Changes

This PR introduces a trait, `ChannelSend`, which abstracts over a send-side implementation of a channel. This lays the foundation for a follow-up PR that will introduce a notion of a "lossy" channel sender. That implementation will wrap a `crossbeam_channel::bounded` channel with a sender that drops messages as channel capacity is reached, rather than continuing to buffer messages until the receiver side is drained.

This is introduced behind a trait bound due to the genericity of the `streamer` crate and the multi-purpose use of its `receiver` implementation. In particular, this will allow gossip to use the aforementioned bounded / lossy sender without requiring other components like repair and fetch stage to use the forthcoming implementation.