Fix unwrap 1.11 regression + performance improvements (#576)
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@ Coverage Diff @@
##           master     #576      +/-   ##
==========================================
- Coverage   97.84%   97.81%   -0.04%
==========================================
  Files          19       19
  Lines        3249     3243       -6
==========================================
- Hits         3179     3172       -7
- Misses         70       71       +1
```
Force-pushed dcf4eeb to 251b257
Force-pushed 251b257 to 270654c
New commits: "update docs for rng", "shorten `merge_groups!` again"
Force-pushed 270654c to 774aedb
Looks good to me, but I have a hard time reviewing this PR where cosmetic changes and performance improvements are mixed together.
```julia
sort!(edges, alg=MergeSort)
gather_pixels!(pixel_image, edges)
perm = sortperm(map(x -> x.reliability, edges); alg=MergeSort)
edges = edges[perm]
```
Why is this better? And why the `map`? Shouldn't the custom `isless` take care of that? And if not, wouldn't a `by=` be better by avoiding a temporary array?
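For concreteness, the two idioms under discussion can be sketched as follows (a minimal sketch: the `Edge` struct, its `reliability` field, and the `isless` method here are illustrative stand-ins, not the package's actual definitions):

```julia
# Hypothetical stand-in for the package's edge type.
struct Edge
    reliability::Float64
end

# Custom total order, so plain sort/sortperm "just work" on Edges.
Base.isless(a::Edge, b::Edge) = isless(a.reliability, b.reliability)

edges = [Edge(rand()) for _ in 1:100]

# Relying on the custom isless method:
p1 = sortperm(edges; alg = MergeSort)

# Explicit by= keyword; no temporary key array is materialized:
p2 = sortperm(edges; by = e -> e.reliability, alg = MergeSort)

p1 == p2  # same permutation either way (MergeSort is stable)
```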
I found it faster in empirical testing. I'm not sure why; perhaps some cache-locality issue when sorting, since `Edge`s are four times the size of floats.
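The cache-locality point can be illustrated in isolation. A sketch (using NamedTuples as a stand-in for the package's `Edge` type, with a payload several machine words wide):

```julia
# Wide elements with a compact Float64 sort key.
edges = [(reliability = rand(), payload = (i, i + 1, 0.0)) for i in 1:1000]

# The sorting passes touch only the small key vector; the wide
# elements are moved once, in the final indexing step.
perm = sortperm(map(e -> e.reliability, edges); alg = MergeSort)
edges_sorted = edges[perm]

issorted(edges_sorted; by = e -> e.reliability)  # true
```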
There are also allocations elsewhere that could be reduced, although I feel that would obscure the code.
> I'm not sure why, some cache locality issues when sorting? Because Edges are 4 times the size of floats.

Ah yes, that makes sense. Less data movement by doing `sortperm` and then indexing.
> There are also allocations elsewhere that could be reduced, although I feel that would obscure the code.

Right, but `perm = sortperm(edges; by=x -> x.reliability, alg=MergeSort)` isn't exactly more obscure? Or even `perm = sortperm(edges; alg=MergeSort)`, though relying on the `isless` method might be considered a bit obscure. Anyway, my approval of this PR stands, so feel free to merge if you don't want to change this; it's really not important.
> Right, but `perm = sortperm(edges; by=x -> x.reliability, alg=MergeSort)` isn't exactly more obscure?

Oh sorry, I was referring to something else. I missed the part about `by=`.
I tried it out just now, and using the `by` keyword with `sortperm` is even slower than the original `sort!`. I also tried changing the `isless` method to use `<` (which may not be correct in the case of `NaN`s and `Inf`s) instead of `isless`, with good results, so at this point I think it might also be an inlining issue, but investigating that probably means digging into the sorting machinery.
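The `NaN` caveat mentioned above is concrete: `isless` defines a total order in which `NaN` sorts after every real number, while `<` returns `false` for any comparison involving `NaN`, so a `<`-based comparator is only safe if the keys are known to be finite. For example:

```julia
isless(1.0, NaN)       # true: NaN sorts after every real number
isless(NaN, 1.0)       # false
1.0 < NaN              # false
NaN < 1.0              # false: < is not a total order once NaN appears

sort([2.0, NaN, 1.0])  # default isless puts NaN last: [1.0, 2.0, NaN]
```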
```julia
julia> @btime unwrap(A; dims=1:2) setup=A=rand(500,500);
  153.364 ms (250102 allocations: 55.25 MiB)  # with by kw
  93.429 ms (250105 allocations: 59.06 MiB)   # PR
  94.014 ms (250096 allocations: 41.93 MiB)   # sort!, with <
```
Ok, whatever... Merge as is, then?
Sure.
Outside of the scope of this PR, but I have some doubts about
I've never looked at the algorithm or its implementation in any detail and don't have the bandwidth to do so in the near future. Wanna open an issue about it so we don't forget?
The regression affects 1.11 but not 1.10. I think the optimizer fails to simplify `merge_groups`. Aside, this includes a small performance enhancement (replacing `sort!` with `sortperm`) at the cost of ~40% more memory. Maybe it's worth it.

[Benchmark plots: PR · Only fix (similar to 1.10) · Without PR]