Impact of copy_as_padded
#171
Comments
Yes, it will increase memory usage and the amount of work for the GC. So, for example, padding every node of a linked data structure typically reduces performance: such nodes are usually relatively short-lived and/or not all of them accessed frequently, so the reduction in false sharing does not outweigh the costs of the increased memory usage.
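For concreteness, here is a minimal sketch (illustrative types, not Saturn code) of what padding every node would look like, and where the cost comes from:

```ocaml
(* Illustrative node type for some linked structure -- not Saturn code. *)
type 'a node = { value : 'a; mutable next : 'a node option }

(* Padded: the shallow copy is rounded up to whole cache line(s), so a
   small three-word node becomes much larger, and the GC has
   correspondingly more memory to allocate, trace and move. *)
let cons_padded value next = Multicore_magic.copy_as_padded { value; next }

(* Unpadded: usually the right default for short-lived, rarely
   contended nodes. *)
let cons value next = { value; next }
```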
The root record of a data structure is just as vulnerable to false sharing as anything else, and it is the most persistent and long-lived part of a data structure.

People often seem to have the wrong intuition that the CPU and the caches somehow distinguish between "atomic" and "non-atomic" locations in memory. In reality, on a modern cache-coherent system, there is no such distinction at all for the CPU and caches. Any cache line that happens to be frequently written to by some core (atomically or non-atomically) and frequently read by other core(s) (atomically or non-atomically) quickly becomes a bottleneck, and this can easily happen with the root node: all it takes is some frequently enough mutated thing allocated next to the root.

Of course, the problem with false sharing is that sometimes you get lucky. You remove padding and performance stays the same, so you conclude the padding is not needed. Later you make some seemingly unrelated change to the code and performance drops (and you might not notice, because your change seems unrelated and so you are not being careful), because now something mutable happens to be allocated, or later moved/compacted, into the same cache line as some hot data structure root.
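A sketch of the pattern being defended here (field names are made up; this is not Saturn's actual code): pad the long-lived root so that whatever the allocator or GC places next to it cannot share its cache line.

```ocaml
(* Illustrative root record of some concurrent structure. *)
type 'a t = { head : 'a list Atomic.t; tail : 'a list Atomic.t }

let create () =
  (* The shallow copy is allocated with padding so it sits alone in its
     cache line(s).  Without this, some frequently mutated block the
     program happens to allocate next to the root could land in the
     same line, turning every read of [head] or [tail] on other cores
     into a coherence miss. *)
  Multicore_magic.copy_as_padded
    { head = Atomic.make []; tail = Atomic.make [] }
```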
The way I've done this is that I've used an optional […]
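The comment above is cut off; purely as a guess at its shape, such an optional argument might be a `?padded` flag on the constructor so that callers opt in (the flag and the types below are hypothetical, not an actual Saturn or multicore-magic API):

```ocaml
type 'a t = { head : 'a list Atomic.t; tail : 'a list Atomic.t }

(* Hypothetical API: [?padded] is a guess at the cut-off comment above. *)
let create ?(padded = false) () =
  let t = { head = Atomic.make []; tail = Atomic.make [] } in
  if padded then Multicore_magic.copy_as_padded t else t
```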
In many cases that would not really work; you would have to do that benchmarking on the applications themselves. False sharing can be guarded against by a data structure implementation, but if it doesn't guard against it, then an application may just happen to have allocation and access patterns that induce false sharing with the data structure.
There are a bunch of places in the data structures which make use of `Multicore_magic.copy_as_padded`. For example: https://github.com/ocaml-multicore/saturn/blame/fc34768b90a9f5503b19d1605cf115c241f51888/src/ws_deque.ml#L48
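For context, the operation in question is multicore-magic's `copy_as_padded`; as far as I can tell its interface is the following (doc comment paraphrased):

```ocaml
val copy_as_padded : 'a -> 'a
(** [copy_as_padded x] returns a shallow clone of [x], allocated with
    enough extra padding words that the clone does not share cache
    lines with other objects. *)
```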
It's very hard to evaluate in the abstract which `copy_as_padded` are required (so I have to trust that intensive benchmarks were done), and I sometimes wonder if some are there "just in case". Is it possible that overusing `copy_as_padded` could negatively impact the performance of large programs due to the increased memory usage, even though the optimization looks great in micro-benchmarks? Or is the extra space small enough not to be a worry?

Specifically, I'm wondering if the `copy_as_padded` that are done on the "root" record of data structures (as in the link above) are 100% necessary to enable by default. Savvy users and micro-benchmarks could call `copy_as_padded` themselves in that case, as sketched below? (edit: but a real program that's not hammering the data structure may not need it)
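A sketch of what that user-side opt-in could look like, over a hypothetical structure `S` (nothing here is an actual Saturn API). Note the caveat: a shallow padded copy of the root is only equivalent to the original when the root's own fields are immutable pointers to the mutable state, so that the copy aliases that same state; that is an internal detail users cannot rely on from outside.

```ocaml
(* Hypothetical opt-in wrapper: pad the root yourself instead of the
   library doing it unconditionally. *)
module Pad_root (S : sig
  type 'a t
  val create : unit -> 'a t
end) =
struct
  let create () = Multicore_magic.copy_as_padded (S.create ())
end
```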
Also, it would be fun to automate the benchmarking/testing of which `copy_as_padded` are critical by removing one at a time and checking for an expected perf regression :)