Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix performance issues for vector-based storage #413

Merged

Conversation

AnthonyVH
Copy link
Contributor

  • None of the functions called inside vertex_by_property() required a mutable graph object, but vertex_by_property() itself did. This is now fixed.
  • At several locations in the code base there was the equivalent of container.reserve(container.size() + 1), e.g. add_vertex(). Calling such an add_vertex() in a loop with an std::vector backing the vertex storage results in O(N²) time complexity instead of the expected (amortized) O(N). This is fixed by allowing the container to grow as expected.

@AnthonyVH AnthonyVH force-pushed the anthonyvh/fix_reserve_performance branch 2 times, most recently from 228df65 to 836919b Compare January 15, 2025 23:09
@jeremy-murphy
Copy link
Contributor

jeremy-murphy commented Jan 16, 2025

Thanks for your pull request.

  • None of the functions called inside vertex_by_property() required a mutable graph object, but vertex_by_property() itself did. This is now fixed.

Yup, that's good.

  • At several locations in the code base there was the equivalent of container.reserve(container.size() + 1), e.g. add_vertex(). Calling such an add_vertex() in a loop with an std::vector backing the vertex storage results in O(N²) time complexity instead of the expected (amortized) O(N). This is fixed by allowing the container to grow as expected.

I don't actually see why this is so, unless resize is doing something really pessimistic and reallocating the entire vector every time. Have you benchmarked this to confirm the change in performance?
Regardless, I prefer your implementation to the existing one, although please use auto where you can.
Can you use emplace_back() to create an empty vector instead of push_back(std::vector< cg_vertex >())?

@AnthonyVH
Copy link
Contributor Author

AnthonyVH commented Jan 16, 2025

I don't actually see why this is so, unless resize is doing something really pessimistic and reallocating the entire vector every time.

Everything I'm writing here assumes that add_vertex() is going to be called in a loop. Which I guess is a pretty reasonable assumption, how else is anyone going to construct a graph.

TLDR: Calling resize(prev_size + 1) forced reallocation on every call to add_vertex(), making it O(N). Letting the std::vector decide when to resize itself by calling push_back()/emplace_back() makes that O(1).

Now, for the long and windy explanation. Inside add_vertex() the old code did vertices.resize(prev_size + 1). From std::vector<>::resize():

Resizes the container to contain count elements, does nothing if count == size(). If the current size is greater than count, the container is reduced to its first count elements.

and

Complexity: Linear in the difference between the current size and count. Additional complexity possible due to reallocation if capacity is less than count.

I.e. in case the new size is larger than the old, resize() is going to increase the capacity of the underlying storage to exactly the requested new size. Crucially, it is not going to apply the growth factor it would when an "automatic" capacity increment happens (e.g. if you call push_back() on an std::vector for which size() == capacity()).

Note that it clearly states that the complexity is linear, not O(1). Growing an std::vector by pushing elements into it is amortized O(1). It's exactly because of reallocation that this is amortized. Assume an empty std::vector with capacity X, then the first X pushes are "free". On the X + 1th push_back() the underlying storage is full, so std::vector will call resize() internally, move/copy all existing X elements to new storage, and then push the X + 1th element on. So on average you pay O(X) / X = O(1) per push_back(). When this automatic resize happens the capacity will grow by a certain factor (1.5 for MSVC/libc++, and 2 for libstdc++ AFAIK), which ensures the amortized O(1) complexity for element insertion.

Manually calling resize()/reserve() in a loop is always a bad idea, because it ensures the automatic resizing is side-stepped. Now every one of those calls is going to reallocate all of the existing elements in the std::vector, because the capacity was only large enough to hold prev_size elements. Meaning every add_vector() calls takes O(N) time, whereas normally resize() would only be called every N add_vector() calls, meaning it could be O(1).

Have you benchmarked this to confirm the change in performance?

I did not see existing performance benchmarks, so I didn't try to implement one. However, the behavior of calling resize() is well defined, so for sure the complexity of add_vertex() goes from O(N) to O(1) with my change.

Here is a quick benchmark I threw together illustrating the difference: quick-bench. Note that the difference is only 10% here, most likely because there's not that much data to copy around.

image

I tried creating another benchmark where each entry in the vector is larger (which would be the case when e.g. a vertex has bundled properties that take up a reasonable amount of space). Unfortunately the quick-bench.com website kept timing out on that. I expect the difference to be much more pronounced in this case. If you'd like to try yourself, here's the code in Compiler Explorer (you can hit "Quick bench" in the top left there).

Edit: I realize the benchmarks show only a small speed difference. I guess the compiler somehow sees through the whole thing? I know of instances in industrial code where this exact same change caused a massive speed-up.

@AnthonyVH
Copy link
Contributor Author

Here's another snippet from cppreference.com that describes the issue with the resize() call (which will call reserve() under the hood). From std::vector<>::reserve():

Correctly using reserve() can prevent unnecessary reallocations, but inappropriate uses of reserve() (for instance, calling it before every push_back() call) may actually increase the number of reallocations (by causing the capacity to grow linearly rather than exponentially) and result in increased computational complexity and decreased performance. For example, a function that receives an arbitrary vector by reference and appends elements to it should usually not call reserve() on the vector, since it does not know of the vector's usage characteristics.

When inserting a range, the range version of insert() is generally preferable as it preserves the correct capacity growth behavior, unlike reserve() followed by a series of push_back()s.

@AnthonyVH AnthonyVH force-pushed the anthonyvh/fix_reserve_performance branch from 836919b to ebe6a6e Compare January 16, 2025 10:49
@AnthonyVH AnthonyVH force-pushed the anthonyvh/fix_reserve_performance branch from ebe6a6e to 15b587a Compare January 16, 2025 10:50
@jeremy-murphy
Copy link
Contributor

Ohh, I get it, sorry, I just completely forgot that push_back will amortize the reallocations.

@jeremy-murphy jeremy-murphy merged commit a372a07 into boostorg:develop Jan 22, 2025
22 checks passed
@jeremy-murphy
Copy link
Contributor

Thanks for a great improvement to the efficiency!

I know the code change was very small, but in future please don't combine unrelated changes into one PR.

@AnthonyVH
Copy link
Contributor Author

Thanks a lot for the quick merge! I'll try to better pay attention to the PR hygiene next time 😁.

@AnthonyVH AnthonyVH deleted the anthonyvh/fix_reserve_performance branch January 22, 2025 09:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants