Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unwrap 1.11 regression + performance improvements #576

Merged
merged 8 commits into from
Nov 19, 2024

Conversation

wheeheee
Copy link
Member

@wheeheee wheeheee commented Oct 30, 2024

The regression affects 1.11 but not 1.10. I think the optimizer fails to simplify merge_groups. Aside, this includes a small performance enhancement (replacing sort! with sortperm) at the cost of ~40% more memory. Maybe it's worth it.

PR
julia> @benchmark unwrap(A; dims=1:2) setup=A=rand(500,500)
BenchmarkTools.Trial: 42 samples with 1 evaluation.
 Range (min  max):  104.560 ms  184.250 ms  ┊ GC (min  max):  0.00%  38.84%
 Time  (median):     118.173 ms               ┊ GC (median):    10.01%
 Time  (mean ± σ):   120.484 ms ±  16.391 ms  ┊ GC (mean ± σ):  10.78% ±  9.32%

  ▃█▁      █   ▃ ▁
  ███▁▁▁▁▄▁█▇▄▇█▄█▇▄▄▁▄▁▁▁▄▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▄ ▁
  105 ms           Histogram: frequency by time          184 ms <

 Memory estimate: 74.29 MiB, allocs estimate: 250113.

julia> @benchmark unwrap(A; dims=1:3) setup=A=rand(50,50,50)
BenchmarkTools.Trial: 64 samples with 1 evaluation.
 Range (min  max):  67.356 ms  108.634 ms  ┊ GC (min  max):  0.00%  36.22%
 Time  (median):     78.171 ms               ┊ GC (median):    12.69%
 Time  (mean ± σ):   79.053 ms ±   8.890 ms  ┊ GC (mean ± σ):  13.00% ±  8.80%

  ▂▄▂            ██    ▂
  ████▁▁▁▁▁▁▄▁▁▁▄████▄██▆▁▄█▁▁▆▁▄▁▄▁▄▄▆▁▁▁▁▁▁▁▁▁▁▄▄▁▁▁▁▁▁▁▁▁▁▄ ▁
  67.4 ms         Histogram: frequency by time          105 ms <

 Memory estimate: 61.42 MiB, allocs estimate: 125114.
Only fix (similar to 1.10)
julia> @benchmark unwrap(A; dims=1:2) setup=A=rand(500,500)
BenchmarkTools.Trial: 40 samples with 1 evaluation.
 Range (min  max):  114.546 ms  157.513 ms  ┊ GC (min  max): 0.00%  26.19%
 Time  (median):     126.192 ms               ┊ GC (median):    8.14%
 Time  (mean ± σ):   126.530 ms ±  10.672 ms  ┊ GC (mean ± σ):  8.23% ±  7.21%

  █ ▃             ▁  ▁        ▁
  █▇█▁▄▁▄▁▄▁▁▁▁▄▄▄█▁▇█▄▄▄▁▁▁▁▇█▇▁▁▁▁▄▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▄ ▁
  115 ms           Histogram: frequency by time          158 ms <

 Memory estimate: 53.35 MiB, allocs estimate: 250104.

julia> @benchmark unwrap(A; dims=1:3) setup=A=rand(50,50,50)
BenchmarkTools.Trial: 57 samples with 1 evaluation.
 Range (min  max):  78.370 ms  119.592 ms  ┊ GC (min  max): 0.00%  32.56%
 Time  (median):     88.701 ms               ┊ GC (median):    9.91%
 Time  (mean ± σ):   87.541 ms ±   8.102 ms  ┊ GC (mean ± σ):  8.33% ±  7.10%

  █▂▂                ▂ ▂▂
  ███▄█▁▆▁▁▁▁▄▁▁▁▁▄█▆█▄██▆▆▆█▆▁▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▄ ▁
  78.4 ms         Histogram: frequency by time          111 ms <

 Memory estimate: 43.19 MiB, allocs estimate: 125105.
Without PR
julia> @benchmark unwrap(A; dims=1:2) setup=A=rand(500,500)
BenchmarkTools.Trial: 32 samples with 1 evaluation.
 Range (min  max):  145.698 ms  224.118 ms  ┊ GC (min  max): 0.00%  32.65%
 Time  (median):     157.550 ms               ┊ GC (median):    5.90%
 Time  (mean ± σ):   158.949 ms ±  13.967 ms  ┊ GC (mean ± σ):  6.32% ±  6.64%

    █     ▂ ▂  ▅
  ▅████▁▁▅█▅██▅█▅▁█▁▁▁▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅ ▁
  146 ms           Histogram: frequency by time          224 ms <

 Memory estimate: 53.35 MiB, allocs estimate: 250104.

julia> @benchmark unwrap(A; dims=1:3) setup=A=rand(50,50,50)
BenchmarkTools.Trial: 55 samples with 1 evaluation.
 Range (min  max):  80.909 ms  120.537 ms  ┊ GC (min  max):  0.00%  31.32%
 Time  (median):     91.762 ms               ┊ GC (median):    10.11%
 Time  (mean ± σ):   91.261 ms ±   9.141 ms  ┊ GC (mean ± σ):   8.99% ±  8.14%

    █▄                   ▄
  ▆▇██▄▄▄▁▁▁▁▁▆▁▁▁▇▆▄▇▆▆▆█▆▆▄▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▄▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▄ ▁
  80.9 ms         Histogram: frequency by time          119 ms <

 Memory estimate: 43.19 MiB, allocs estimate: 125105.```

</details>

Copy link

codecov bot commented Oct 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.81%. Comparing base (8326adf) to head (774aedb).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #576      +/-   ##
==========================================
- Coverage   97.84%   97.81%   -0.04%     
==========================================
  Files          19       19              
  Lines        3249     3243       -6     
==========================================
- Hits         3179     3172       -7     
- Misses         70       71       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@wheeheee wheeheee force-pushed the noib_unwrap branch 4 times, most recently from dcf4eeb to 251b257 Compare November 11, 2024 10:38
Copy link
Member

@martinholters martinholters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but I have a hard time reviewing this PR where cosmetic changes and performance improvements are mixed together.

Comment on lines -130 to +131
sort!(edges, alg=MergeSort)
gather_pixels!(pixel_image, edges)
perm = sortperm(map(x -> x.reliability, edges); alg=MergeSort)
edges = edges[perm]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this better? And why the map? Shouldn't the custom isless take care of that? And if not, wouldn't a by= be better by avoiding a temporary array?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found it faster in empirical testing. I'm not sure why, some cache locality issues when sorting? Because Edges are 4 times the size of floats.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are also allocations elsewhere that could be reduced, although I feel that would obscure the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why, some cache locality issues when sorting? Because Edges are 4 times the size of floats.

Ah yes, that makes sense. Less data movement by doing sortperm and then indexing.

There are also allocations elsewhere that could be reduced, although I feel that would obscure the code.

Right, but

perm = sortperm(edges; by=x -> x.reliability, alg=MergeSort)

isn't exactly more obscure? Or even perm = sortperm(edges; alg=MergeSort), but relying on the isless method might be considered a bit obscure. Anyway, my approval to this PR persists, so feel free to merge if you don't want change this; it's really not important.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but

perm = sortperm(edges; by=x -> x.reliability, alg=MergeSort)

isn't exactly more obscure?

Oh sorry, I was referring to something else. Missed the part about by=.
I tried it out just now, and using the by keyword with sortperm is even slower than the original sort!. I also tried changing the isless method to use < (which may not be correct in case of NaNs and Inf) instead of isless, with good results, so at this point I think it might also be an inlining issue, but investigating that probably involves looking at the sorting machinery.

julia> @btime unwrap(A; dims=1:2) setup=A=rand(500,500);
  153.364 ms (250102 allocations: 55.25 MiB)  # with by kw
  93.429 ms (250105 allocations: 59.06 MiB)   # PR
  94.014 ms (250096 allocations: 41.93 MiB)   # sort!, with <

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, whatever... Merge as is, then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

@wheeheee
Copy link
Member Author

Outside of the scope of this PR, but I have some doubts about calculate_reliability.
For non-border regions, reliability is calculated over a cube around the pixel (3^N pixels, described in the comments as "nearest neighbours"), although there are only 2N Edges involving one pixel in populate_edges!, which are the nearest neighbours in the context of lattices.
But if there are circular dims / faces are connected, then pixel_shifts_border contains duplicate shifts. This makes more sense if only 2N neighbours should have been included in the reliability calculation, since then there would only be 1 shift per border pixel that wraps around.
Also, the reliability of pixels on the edges, including corners, are just rand(rng) and are not calculated.
I don't have access to the cited publications, so I can't check whether the current implementation is actually correct.

@martinholters
Copy link
Member

I've never looked at the algorithm/its implementation in any detail and don't have the bandwidth to do so in the near future. Wanna open an issue about it so we don't forget?

@martinholters martinholters merged commit 56773aa into JuliaDSP:master Nov 19, 2024
10 checks passed
@wheeheee wheeheee deleted the noib_unwrap branch November 19, 2024 12:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants