Impact of copy_as_padded
#171
Comments
Yes, it will increase memory usage and the amount of work for the GC. So, for example, padding every node of a linked data structure typically reduces performance: such nodes are usually relatively short-lived and/or not all of them accessed frequently, so the reduction in false sharing does not outweigh the costs of the increased memory usage.
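For concreteness, here is a minimal sketch (illustrative types, not Saturn code) of what padding every node would look like, and where the cost comes from:

```ocaml
(* Illustrative node type for some linked structure -- not Saturn code. *)
type 'a node = { value : 'a; mutable next : 'a node option }

(* Padded: the shallow copy is rounded up to whole cache line(s), so a
   small three-word node becomes much larger, and the GC has
   correspondingly more memory to allocate, trace and move. *)
let cons_padded value next = Multicore_magic.copy_as_padded { value; next }

(* Unpadded: usually the right default for short-lived, rarely
   contended nodes. *)
let cons value next = { value; next }
```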
The root record of a data structure is just as vulnerable to false sharing as anything else, and it is the most persistent and long-lived part of a data structure.

People often seem to have the wrong intuition that the CPU and the caches somehow distinguish between "atomic" and "non-atomic" locations in memory. In reality, on a modern cache-coherent system, there is no such distinction at all for the CPU and caches. Any cache line that happens to be frequently written to by some core (atomically or non-atomically) and frequently read by other core(s) (atomically or non-atomically) quickly becomes a bottleneck, and this can easily happen with the root node: all it takes is some frequently enough mutated thing allocated next to the root.

Of course, the problem with false sharing is that sometimes you get lucky. You remove padding and performance stays the same, so you conclude the padding is not needed. Later you make some seemingly unrelated change to the code and performance drops (and you might not notice, because your change seems unrelated and so you are not being careful), because now something mutable happens to be allocated, or later moved/compacted, into the same cache line as some hot data structure root.
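A sketch of the pattern being defended here (field names are made up; this is not Saturn's actual code): pad the long-lived root so that whatever the allocator or GC places next to it cannot share its cache line.

```ocaml
(* Illustrative root record of some concurrent structure. *)
type 'a t = { head : 'a list Atomic.t; tail : 'a list Atomic.t }

let create () =
  (* The shallow copy is allocated with padding so it sits alone in its
     cache line(s).  Without this, some frequently mutated block the
     program happens to allocate next to the root could land in the
     same line, turning every read of [head] or [tail] on other cores
     into a coherence miss. *)
  Multicore_magic.copy_as_padded
    { head = Atomic.make []; tail = Atomic.make [] }
```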
The way I've done this is that I've used an optional […]
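The comment above is cut off; purely as a guess at its shape, such an optional argument might be a `?padded` flag on the constructor so that callers opt in (the flag and the types below are hypothetical, not an actual Saturn or multicore-magic API):

```ocaml
type 'a t = { head : 'a list Atomic.t; tail : 'a list Atomic.t }

(* Hypothetical API: [?padded] is a guess at the cut-off comment above. *)
let create ?(padded = false) () =
  let t = { head = Atomic.make []; tail = Atomic.make [] } in
  if padded then Multicore_magic.copy_as_padded t else t
```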
In many cases that would not really work; you would have to do that benchmarking on the applications themselves. False sharing can be guarded against by a data structure implementation, but if it doesn't guard against it, then an application may just happen to have allocation and access patterns that induce false sharing with the data structure.
There are a bunch of places in the data structures which make use of `Multicore_magic.copy_as_padded`. For example: https://github.com/ocaml-multicore/saturn/blame/fc34768b90a9f5503b19d1605cf115c241f51888/src/ws_deque.ml#L48
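For context, the operation in question is multicore-magic's `copy_as_padded`; as far as I can tell its interface is the following (doc comment paraphrased):

```ocaml
val copy_as_padded : 'a -> 'a
(** [copy_as_padded x] returns a shallow clone of [x], allocated with
    enough extra padding words that the clone does not share cache
    lines with other objects. *)
```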
It's very hard to evaluate in the abstract which `copy_as_padded` are required (so I have to trust that intensive benchmarks were done), and I sometimes wonder if some are there "just in case". Is it possible that overusing `copy_as_padded` could negatively impact the performance of large programs due to the increased memory usage, even though the optimization looks great in micro-benchmarks? Or is the extra space small enough not to be a worry?

Specifically, I'm wondering if the `copy_as_padded` that are done on the "root" record of data structures (as in the link above) are 100% necessary to enable by default. Savvy users and micro-benchmarks could call `copy_as_padded` themselves in that case, as sketched below? (edit: but a real program that's not hammering the data structure may not need it)
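A sketch of what that user-side opt-in could look like, over a hypothetical structure `S` (nothing here is an actual Saturn API). Note the caveat: a shallow padded copy of the root is only equivalent to the original when the root's own fields are immutable pointers to the mutable state, so that the copy aliases that same state; that is an internal detail users cannot rely on from outside.

```ocaml
(* Hypothetical opt-in wrapper: pad the root yourself instead of the
   library doing it unconditionally. *)
module Pad_root (S : sig
  type 'a t
  val create : unit -> 'a t
end) =
struct
  let create () = Multicore_magic.copy_as_padded (S.create ())
end
```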
Also, it would be fun to automate the benchmarking/testing of which `copy_as_padded` are critical by removing one at a time and checking for an expected perf regression :)