Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Impact of copy_as_padded #171

Open
art-w opened this issue Nov 28, 2024 · 1 comment
Open

Impact of copy_as_padded #171

art-w opened this issue Nov 28, 2024 · 1 comment

Comments

@art-w
Copy link
Contributor

art-w commented Nov 28, 2024

There are a bunch of places in the datastructures which makes use of Multicore_magic.copy_as_padded. For example:

https://github.com/ocaml-multicore/saturn/blame/fc34768b90a9f5503b19d1605cf115c241f51888/src/ws_deque.ml#L48

It's very hard to evaluate in the abstract which copy_as_padded are required (so I've to trust that intensive benchmarks were done) and I sometimes wonder if some are there "just in case". Is it possible that abusing copy_as_padded could negatively impact large program performances due to the increase memory usage? (even though the optim looks great in micro-benchmarks? or is the extra space small enough not to be a worry?)

Specifically, I'm wondering if the copy_as_padded that are done on the "root" record of datastructures (as in the link above) are 100% necessary to enable by default? Savvy users and micro-benchmarks can call copy_as_padded themselves in that case? (edit: but a real program that's not hammering the datastructure may not need it)

Also, it would be fun to automate the benchmarking / testing of which copy_as_padded are critical by removing one at a time and checking for an expected perf regression :)

@polytypic
Copy link
Contributor

polytypic commented Dec 5, 2024

Is it possible that abusing copy_as_padded could negatively impact large program performances due to the increase memory usage?

Yes, it will increase memory usage and increase the amount of work for GC. So, for example, padding every node of a linked data structure typically reduces performance as such nodes are typically relatively short lived and/or not (all of them) accessed frequently and so the reduction in false sharing does not outweigh the costs from the increased memory usage.

Specifically, I'm wondering if the copy_as_padded that are done on the "root" record of datastructures (as in the link above) are 100% necessary to enable by default?

The root record of a data structure is just as vulnerable to false sharing as anything else. And it is the most persistent and long-lived part of a data structure.

People often seem to have wrong intuition and think that the CPU and the caches somehow distinguish between "atomic" and "non-atomic" locations in memory. But, in reality, on a modern cache coherent system, for the CPU and caches there is no such distinction at all. Any cache line that happens to be frequently written to by some core (atomically or non-atomically) and is frequently read by (an)other core(s) (atomically or non-atomically) quickly becomes a bottleneck and this can easily happen with the root node. All it takes is to have some frequently enough mutated thing allocated next to the root.

Of course, the problem with false sharing is that sometimes you get lucky. You remove padding and performance is the same. You then conclude that it is not needed. Later you make some seemingly unrelated change to the code and performance drops (and you might not notice this, because your changes are not related and so you are not careful), because now something mutable happens to get allocated or later moved/compacted to be in the same cache line with some hot data structure root.

Savvy users and micro-benchmarks can call copy_as_padded themselves in that case? (edit: but a real program that's not hammering the datastructure may not need it)

The way I've done this is that I've used an optional ?padded:bool argument and use copy_as ?padded in the implementation. I'm not sure it is the best approach, because in many use cases you then need to pass ~padded:true or you risk having bad luck. It might be better to pad by default and then the punks would have to specify ~feeling_lucky:true.

Also, it would be fun to automate the benchmarking / testing of which copy_as_padded are critical by removing one at a time and checking for an expected perf regression :)

In many cases that would not really work. You would have to do that benchmarking on the applications. False sharing can be guarded against by a data structure implementation, but if it doesn't guard against that, then an application may just happen to have allocation and access patterns that induce false sharing with the data structure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants