
Memory error using dask-image ndmeasure.label #391

Open
maxbeegee opened this issue Nov 22, 2024 · 1 comment

Comments

@maxbeegee

I get an out-of-memory (OOM) error when using dask_image.ndmeasure.label with a large array.

Minimal Complete Verifiable Example:

import dask.array as da
import dask_image.ndmeasure

nx = 5120  # things are OK for nx < 2500
arr = da.random.random(size=(nx, nx, nx))
darr_bin = arr > 0.8
# The next line will fail
label_image, num_labels = dask_image.ndmeasure.label(darr_bin)

Note that the problem already occurs at the last line, not when executing the computation via, e.g., num_labels.compute().
This also means I have the same problem when using a (large) cluster, as the OOM always occurs on node 1.

Environment:
[I could reproduce this problem on several machines, below is one particular environment]

  • Dask version: 2024.9.1
  • Python version: Python 3.12.6
  • Operating System: Mac OS 12.2
  • Install method (conda, pip, source): conda / mamba
@m-albert
Collaborator

Hey @maxbeegee,

Thanks a lot for reporting this, and sorry for the late reply.

I could reproduce the issue. Essentially, dask_image.ndmeasure.label applies scipy.ndimage.label to the individual chunks of the input image and then fuses the per-chunk labels after collecting a list of equivalent labels by examining the chunk boundaries. The current implementation doesn't scale well when there is a very large number of boundaries to examine.
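For illustration, here is a minimal 2D sketch of that idea (not dask-image's actual internals): each "chunk" is labeled independently with scipy.ndimage.label, and label pairs that touch across the shared face form the equivalence list that is later used to fuse components.

```python
import numpy as np
from scipy import ndimage

# Illustrative sketch only: label two adjacent "chunks" independently,
# then find label pairs that touch across the shared boundary face.
arr = np.array([[1, 1, 1, 1, 0, 0],
                [0, 0, 0, 0, 0, 1]])
left, right = arr[:, :3], arr[:, 3:]

l_lab, l_n = ndimage.label(left)   # labels 1..l_n in the left chunk
r_lab, r_n = ndimage.label(right)  # labels 1..r_n in the right chunk
# Offset the right chunk's labels so they don't collide with the left's
r_glob = np.where(r_lab > 0, r_lab + l_n, 0)

# Any (left, right) label pair adjacent across the boundary is equivalent
equivalences = {(int(a), int(b))
                for a, b in zip(l_lab[:, -1], r_glob[:, 0])
                if a > 0 and b > 0}
num_labels = l_n + r_n - len(equivalences)  # fused component count
```

In general a union-find over all equivalence pairs gives the global labeling (here a single merge suffices). With many small chunks in 3D, the number of boundary faces, and hence the size of this equivalence structure, grows quickly, which matches the memory blow-up described above.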

Until we improve the implementation, I'd suggest trying to increase the chunk size of your input array, e.g.:

import dask.array as da
import dask_image.ndmeasure

nx = 5120
arr = da.random.random(size=(nx, nx, nx), chunks=(800, 800, 800))
darr_bin = arr > 0.8
# With the larger chunks, this now succeeds
label_image, num_labels = dask_image.ndmeasure.label(darr_bin)

The configuration above works well for me. For existing dask arrays you can use arr.rechunk to change the chunk sizes before labeling.
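As a quick sketch of the rechunk route (chunk shape taken from the suggestion above; the array stays lazy, so nothing is computed or allocated here):

```python
import dask.array as da

# Build a lazy array with dask's default chunking, then rechunk to the
# larger (800, 800, 800) chunks before labeling. Fewer, larger chunks
# mean fewer boundary faces for label fusion to examine.
arr = da.random.random(size=(5120, 5120, 5120))
arr = arr.rechunk((800, 800, 800))
```

Note that 5120 is not a multiple of 800, so the last chunk along each axis ends up smaller (320); that's fine for labeling.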
