Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve kd tree performance during initialization #18

Merged
merged 2 commits into from
Oct 17, 2024

Conversation

maarzt
Copy link
Contributor

@maarzt maarzt commented Oct 14, 2024

Mastodon opens sometimes very slowly on artificial datasets. Profiling with Java Visual VM showed that KDTree initialization takes very long. More precisely calls to the method KDTree.kthElement(...) take up 90% of CPU time.

The method kthElement aims to partially sort a list. The implementation is very efficient with a complexity of O(N) on randomized list. But strategy for selecting the pivot element is very inefficient when applied to an already sorted list, such that the complexity increases to O(N²). Or when applied to a list that contains several copies of the same constant value. This PR fixing the implementation to work efficient for these special cases.

Unit tests are added to ensure that the new method works as intended.

@maarzt maarzt requested a review from tinevez October 14, 2024 14:22
@maarzt
Copy link
Contributor Author

maarzt commented Oct 14, 2024

This is the artificial benchmark dataset that triggers the problem:
10k_100k_1M.zip

Up to a million spots are organized in a perfect circle (see screenshot). Without this PR Mastodon needs 15min to open the dataset. With the fix in this PR the dataset is opened in < 1 min.
image

@tinevez
Copy link
Contributor

tinevez commented Oct 14, 2024

@tpietzsch look at this!

@tinevez tinevez self-assigned this Oct 14, 2024
Copy link
Contributor

@tinevez tinevez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Successfully address the performance issue.

@tinevez tinevez merged commit a79b576 into master Oct 17, 2024
1 check passed
@tinevez tinevez deleted the improve-kd-tree-performance branch October 17, 2024 13:08
@tpietzsch
Copy link
Contributor

tpietzsch commented Oct 17, 2024

Yeah, we had the same problem in imglib2 a while ago imglib/imglib2#333 and it was fixed tor imglib2 KDTree in imglib/imglib2#333.

However, I never got around to fixing the imglib2 kthElement implementation (it's not used by the revised KDTree anymore). @maarzt if you have time to fix net.imglib2.util.KthElement in the same way, that would be very much appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants