Improve kd tree performance during initialization #18

maarzt · 2024-10-14T14:22:18Z

Mastodon sometimes opens very slowly on artificial datasets. Profiling with Java VisualVM showed that KDTree initialization takes very long; more precisely, calls to the method KDTree.kthElement(...) take up 90% of the CPU time.

The method kthElement partially sorts a list so that the k-th smallest element ends up at index k. The implementation is efficient, with an expected complexity of O(N) on randomly ordered lists. However, its strategy for selecting the pivot element performs very poorly in two special cases: on an already sorted list, where the complexity degrades to O(N²), and on a list that contains many copies of the same value. This PR fixes the implementation so that it is also efficient in these special cases.
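The PR diff is not reproduced in this thread, so the following is only a minimal sketch of one common way to remove both worst cases, assuming the values to partition are stored in a `double[]`: a random pivot (against sorted input) combined with a three-way partition (against many equal values). The class `KthElementSketch` and its method signature are hypothetical; this is not the actual Mastodon `KDTree.kthElement` code.

```java
import java.util.Random;

/**
 * Hypothetical sketch of a quickselect ("kth element") that stays at expected
 * O(N) on sorted input and on input with many equal values.
 * Not the actual Mastodon KDTree.kthElement implementation.
 */
public class KthElementSketch {

    // Fixed seed only to make runs reproducible.
    private static final Random RANDOM = new Random(42);

    /**
     * Rearranges values[left..right] (inclusive) so that values[k] holds the
     * value it would hold if the range were fully sorted. Elements before k
     * are <= values[k], elements after k are >= values[k].
     */
    public static void kthElement(double[] values, int left, int right, int k) {
        while (left < right) {
            // Random pivot: avoids the O(N^2) behavior of a fixed pivot
            // position (e.g. always the last element) on already sorted input.
            double pivot = values[left + RANDOM.nextInt(right - left + 1)];

            // Three-way partition (Dutch national flag):
            // [left, lt) < pivot, [lt, i) == pivot, (gt, right] > pivot.
            int lt = left, i = left, gt = right;
            while (i <= gt) {
                if (values[i] < pivot) {
                    swap(values, lt++, i++);
                } else if (values[i] > pivot) {
                    swap(values, i, gt--);
                } else {
                    i++;
                }
            }

            // Now every element in [lt, gt] equals the pivot.
            if (k < lt) {
                right = lt - 1;   // continue in the "smaller" part
            } else if (k > gt) {
                left = gt + 1;    // continue in the "larger" part
            } else {
                return;           // k falls into the block of pivot values
            }
        }
    }

    private static void swap(double[] a, int i, int j) {
        double tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        int n = 1_000_000;

        // Already sorted input: the case that was quadratic before.
        double[] sorted = new double[n];
        for (int i = 0; i < n; i++) sorted[i] = i;
        kthElement(sorted, 0, n - 1, n / 2);
        System.out.println(sorted[n / 2]); // 500000.0

        // Constant input: many copies of the same value.
        double[] constant = new double[n]; // all zeros
        kthElement(constant, 0, n - 1, n / 2);
        System.out.println(constant[n / 2]); // 0.0
    }
}
```

With a fixed pivot position such as the last element, each partition step on sorted input splits off only one element, which produces the O(N²) behavior described above. The random pivot makes that unlikely, and the three-way partition lets the loop stop as soon as k falls into the block of values equal to the pivot, so an all-constant list is handled in a single pass.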

Unit tests are added to ensure that the new method works as intended.

maarzt · 2024-10-14T14:35:47Z

This is the artificial benchmark dataset that triggers the problem:
10k_100k_1M.zip

Up to a million spots are arranged in a perfect circle (see screenshot). Without this PR, Mastodon needs 15 min to open the dataset; with the fix in this PR, the dataset opens in < 1 min.

tinevez · 2024-10-14T14:48:56Z

@tpietzsch look at this!

tinevez

Successfully addresses the performance issue.

tpietzsch · 2024-10-17T13:22:21Z

Yeah, we had the same problem in imglib2 a while ago (imglib/imglib2#333), and it was fixed for the imglib2 KDTree in imglib/imglib2#333.

However, I never got around to fixing the imglib2 kthElement implementation (it's not used by the revised KDTree anymore). @maarzt if you have time to fix net.imglib2.util.KthElement in the same way, that would be very much appreciated!

maarzt added 2 commits October 14, 2024 15:39

Add benchmark for KDTree initialization

d4003ad

Speed up KDTree initialization on difficult datasets

1e0d1dc

maarzt requested a review from tinevez October 14, 2024 14:22

tinevez self-assigned this Oct 14, 2024

tinevez approved these changes Oct 17, 2024


tinevez merged commit a79b576 into master Oct 17, 2024
1 check passed

tinevez deleted the improve-kd-tree-performance branch October 17, 2024 13:08
