Fix `unwrap` 1.11 regression + performance improvements #576

wheeheee · 2024-10-30T16:35:41Z

The regression affects 1.11 but not 1.10. I think the optimizer fails to simplify merge_groups. Aside, this includes a small performance enhancement (replacing sort! with sortperm) at the cost of ~40% more memory. Maybe it's worth it.

PR

julia> @benchmark unwrap(A; dims=1:2) setup=A=rand(500,500)
BenchmarkTools.Trial: 42 samples with 1 evaluation.
 Range (min … max):  104.560 ms … 184.250 ms  ┊ GC (min … max):  0.00% … 38.84%
 Time  (median):     118.173 ms               ┊ GC (median):    10.01%
 Time  (mean ± σ):   120.484 ms ±  16.391 ms  ┊ GC (mean ± σ):  10.78% ±  9.32%

  ▃█▁      █   ▃ ▁
  ███▁▁▁▁▄▁█▇▄▇█▄█▇▄▄▁▄▁▁▁▄▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▄ ▁
  105 ms           Histogram: frequency by time          184 ms <

 Memory estimate: 74.29 MiB, allocs estimate: 250113.

julia> @benchmark unwrap(A; dims=1:3) setup=A=rand(50,50,50)
BenchmarkTools.Trial: 64 samples with 1 evaluation.
 Range (min … max):  67.356 ms … 108.634 ms  ┊ GC (min … max):  0.00% … 36.22%
 Time  (median):     78.171 ms               ┊ GC (median):    12.69%
 Time  (mean ± σ):   79.053 ms ±   8.890 ms  ┊ GC (mean ± σ):  13.00% ±  8.80%

  ▂▄▂            ██    ▂
  ████▁▁▁▁▁▁▄▁▁▁▄████▄██▆▁▄█▁▁▆▁▄▁▄▁▄▄▆▁▁▁▁▁▁▁▁▁▁▄▄▁▁▁▁▁▁▁▁▁▁▄ ▁
  67.4 ms         Histogram: frequency by time          105 ms <

 Memory estimate: 61.42 MiB, allocs estimate: 125114.

Only fix (similar to 1.10)

julia> @benchmark unwrap(A; dims=1:2) setup=A=rand(500,500)
BenchmarkTools.Trial: 40 samples with 1 evaluation.
 Range (min … max):  114.546 ms … 157.513 ms  ┊ GC (min … max): 0.00% … 26.19%
 Time  (median):     126.192 ms               ┊ GC (median):    8.14%
 Time  (mean ± σ):   126.530 ms ±  10.672 ms  ┊ GC (mean ± σ):  8.23% ±  7.21%

  █ ▃             ▁  ▁        ▁
  █▇█▁▄▁▄▁▄▁▁▁▁▄▄▄█▁▇█▄▄▄▁▁▁▁▇█▇▁▁▁▁▄▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▄ ▁
  115 ms           Histogram: frequency by time          158 ms <

 Memory estimate: 53.35 MiB, allocs estimate: 250104.

julia> @benchmark unwrap(A; dims=1:3) setup=A=rand(50,50,50)
BenchmarkTools.Trial: 57 samples with 1 evaluation.
 Range (min … max):  78.370 ms … 119.592 ms  ┊ GC (min … max): 0.00% … 32.56%
 Time  (median):     88.701 ms               ┊ GC (median):    9.91%
 Time  (mean ± σ):   87.541 ms ±   8.102 ms  ┊ GC (mean ± σ):  8.33% ±  7.10%

  █▂▂                ▂ ▂▂
  ███▄█▁▆▁▁▁▁▄▁▁▁▁▄█▆█▄██▆▆▆█▆▁▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▄ ▁
  78.4 ms         Histogram: frequency by time          111 ms <

 Memory estimate: 43.19 MiB, allocs estimate: 125105.

Without PR

julia> @benchmark unwrap(A; dims=1:2) setup=A=rand(500,500)
BenchmarkTools.Trial: 32 samples with 1 evaluation.
 Range (min … max):  145.698 ms … 224.118 ms  ┊ GC (min … max): 0.00% … 32.65%
 Time  (median):     157.550 ms               ┊ GC (median):    5.90%
 Time  (mean ± σ):   158.949 ms ±  13.967 ms  ┊ GC (mean ± σ):  6.32% ±  6.64%

    █     ▂ ▂  ▅
  ▅████▁▁▅█▅██▅█▅▁█▁▁▁▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅ ▁
  146 ms           Histogram: frequency by time          224 ms <

 Memory estimate: 53.35 MiB, allocs estimate: 250104.

julia> @benchmark unwrap(A; dims=1:3) setup=A=rand(50,50,50)
BenchmarkTools.Trial: 55 samples with 1 evaluation.
 Range (min … max):  80.909 ms … 120.537 ms  ┊ GC (min … max):  0.00% … 31.32%
 Time  (median):     91.762 ms               ┊ GC (median):    10.11%
 Time  (mean ± σ):   91.261 ms ±   9.141 ms  ┊ GC (mean ± σ):   8.99% ±  8.14%

    █▄                   ▄
  ▆▇██▄▄▄▁▁▁▁▁▆▁▁▁▇▆▄▇▆▆▆█▆▆▄▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▄▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▄ ▁
  80.9 ms         Histogram: frequency by time          119 ms <

 Memory estimate: 43.19 MiB, allocs estimate: 125105.```

</details>

codecov · 2024-10-30T16:41:29Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.81%. Comparing base (8326adf) to head (774aedb).
Report is 1 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #576      +/-   ##
==========================================
- Coverage   97.84%   97.81%   -0.04%     
==========================================
  Files          19       19              
  Lines        3249     3243       -6     
==========================================
- Hits         3179     3172       -7     
- Misses         70       71       +1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚨 Try these New Features:

Flaky Tests Detection - Detect and resolve failed and flaky tests

update docs for rng

shorten `merge_groups!` again

martinholters

Looks good to me, but I have a hard time reviewing this PR where cosmetic changes and performance improvements are mixed together.

martinholters · 2024-11-18T14:27:55Z

src/unwrap.jl

-    sort!(edges, alg=MergeSort)
-    gather_pixels!(pixel_image, edges)
+    perm = sortperm(map(x -> x.reliability, edges); alg=MergeSort)
+    edges = edges[perm]


Why is this better? And why the map? Shouldn't the custom isless take care of that? And if not, wouldn't a by= be better by avoiding a temporary array?

I found it faster in empirical testing. I'm not sure why, some cache locality issues when sorting? Because Edges are 4 times the size of floats.

There are also allocations elsewhere that could be reduced, although I feel that would obscure the code.

I'm not sure why, some cache locality issues when sorting? Because Edges are 4 times the size of floats.

Ah yes, that makes sense. Less data movement by doing sortperm and then indexing.

There are also allocations elsewhere that could be reduced, although I feel that would obscure the code.

Right, but

perm = sortperm(edges; by=x -> x.reliability, alg=MergeSort)

isn't exactly more obscure? Or even perm = sortperm(edges; alg=MergeSort), but relying on the isless method might be considered a bit obscure. Anyway, my approval to this PR persists, so feel free to merge if you don't want change this; it's really not important.

Right, but

perm = sortperm(edges; by=x -> x.reliability, alg=MergeSort)

isn't exactly more obscure?

Oh sorry, I was referring to something else. Missed the part about by=.
I tried it out just now, and using the by keyword with sortperm is even slower than the original sort!. I also tried changing the isless method to use < (which may not be correct in case of NaNs and Inf) instead of isless, with good results, so at this point I think it might also be an inlining issue, but investigating that probably involves looking at the sorting machinery.

julia> @btime unwrap(A; dims=1:2) setup=A=rand(500,500); 153.364 ms (250102 allocations: 55.25 MiB) # with by kw 93.429 ms (250105 allocations: 59.06 MiB) # PR 94.014 ms (250096 allocations: 41.93 MiB) # sort!, with <

Ok, whatever... Merge as is, then?

wheeheee · 2024-11-19T07:35:49Z

Outside of the scope of this PR, but I have some doubts about calculate_reliability.
For non-border regions, reliability is calculated over a cube around the pixel (3^N pixels, described in the comments as "nearest neighbours"), although there are only 2N Edges involving one pixel in populate_edges!, which are the nearest neighbours in the context of lattices.
But if there are circular dims / faces are connected, then pixel_shifts_border contains duplicate shifts. This makes more sense if only 2N neighbours should have been included in the reliability calculation, since then there would only be 1 shift per border pixel that wraps around.
Also, the reliability of pixels on the edges, including corners, are just rand(rng) and are not calculated.
I don't have access to the cited publications, so I can't check whether the current implementation is actually correct.

martinholters · 2024-11-19T09:55:10Z

I've never looked at the algorithm/its implementation in any detail and don't have the bandwidth to do so in the near future. Wanna open an issue about it so we don't forget?

wheeheee force-pushed the noib_unwrap branch 4 times, most recently from dcf4eeb to 251b257 Compare November 11, 2024 10:38

wheeheee force-pushed the noib_unwrap branch from 251b257 to 270654c Compare November 16, 2024 05:08

wheeheee added 8 commits November 18, 2024 20:35

remove most inbounds, N=2 calc, use Val

3d6c12f

replace internal GLOBAL_RNG with default_rng()

2f4eff8

update docs for rng

cosmetic changes

571c58d

ugly rewrite to fix 1.11 regression

468fda8

shorten `merge_groups!` again

be more Cartesian

3a81254

use sortperm

f54826d

using in test for convenience

12095b7

just pixel

774aedb

wheeheee force-pushed the noib_unwrap branch from 270654c to 774aedb Compare November 18, 2024 12:35

martinholters approved these changes Nov 18, 2024

View reviewed changes

martinholters merged commit 56773aa into JuliaDSP:master Nov 19, 2024
10 checks passed

wheeheee deleted the noib_unwrap branch November 19, 2024 12:25

wheeheee mentioned this pull request Nov 19, 2024

Doubts about correctness of calculate_reliability! implementation in Unwrap #594

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix `unwrap` 1.11 regression + performance improvements #576

Fix `unwrap` 1.11 regression + performance improvements #576

wheeheee commented Oct 30, 2024 •

edited

Loading

codecov bot commented Oct 30, 2024 •

edited

Loading

martinholters left a comment

martinholters Nov 18, 2024

wheeheee Nov 19, 2024

wheeheee Nov 19, 2024

martinholters Nov 19, 2024

wheeheee Nov 19, 2024

martinholters Nov 19, 2024

wheeheee Nov 19, 2024

wheeheee commented Nov 19, 2024

martinholters commented Nov 19, 2024

Fix unwrap 1.11 regression + performance improvements #576

Fix unwrap 1.11 regression + performance improvements #576

Conversation

wheeheee commented Oct 30, 2024 • edited Loading

codecov bot commented Oct 30, 2024 • edited Loading

Codecov Report

martinholters left a comment

Choose a reason for hiding this comment

martinholters Nov 18, 2024

Choose a reason for hiding this comment

wheeheee Nov 19, 2024

Choose a reason for hiding this comment

wheeheee Nov 19, 2024

Choose a reason for hiding this comment

martinholters Nov 19, 2024

Choose a reason for hiding this comment

wheeheee Nov 19, 2024

Choose a reason for hiding this comment

martinholters Nov 19, 2024

Choose a reason for hiding this comment

wheeheee Nov 19, 2024

Choose a reason for hiding this comment

wheeheee commented Nov 19, 2024

martinholters commented Nov 19, 2024

Fix `unwrap` 1.11 regression + performance improvements #576

Fix `unwrap` 1.11 regression + performance improvements #576

wheeheee commented Oct 30, 2024 •

edited

Loading

codecov bot commented Oct 30, 2024 •

edited

Loading