
Suspected regression in /vsicurl #9682

Closed
perliedman opened this issue Apr 16, 2024 · 9 comments · Fixed by #9694

@perliedman

What is the bug?

Using the latest GDAL (3.8.5), fetching a COG with Python/ReadAsArray takes about four times longer than the same code using GDAL 3.7.1.

Enabling CPL_CURL_VERBOSE=YES, I can see that 3.8.5 makes a lot more range requests than 3.7.1 (a rough way to count them is sketched after the list):

  • 3.8.5: 2796
  • 3.7.1: 114
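
For reference, a rough way to count those range requests from the verbose output (just a sketch; it assumes the curl verbose log echoes each Range request header to stderr):

CPL_CURL_VERBOSE=YES python -c "from osgeo import gdal; gdal.UseExceptions(); gdal.Open('/vsicurl/https://solvi-marketing-resources.s3.eu-west-1.amazonaws.com/public/ortho.tiff').ReadAsArray()" 2>&1 | grep -c 'Range: bytes'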

Steps to reproduce the issue

I fetch a COG through /vsicurl with this simple command:

time python -c "from osgeo import gdal; gdal.UseExceptions(); src=gdal.Open('/vsicurl/https://solvi-marketing-resources.s3.eu-west-1.amazonaws.com/public/ortho.tiff'); src.ReadAsArray()"

On my connection, this takes about 40 seconds.

Running the exact same code using GDAL 3.7.1, this takes about 10 seconds.

It seems something changed between these versions that caused a pretty major regression.

Versions and provenance

The GDAL 3.8.5 test was run in a Docker container running Debian GNU/Linux 11 (bullseye).

GDAL 3.7.1 was tested on my Ubuntu 23.10 machine.

Additional context

I stumbled upon this in our production environment, where the code is of course a lot more complex, but the same seems to hold even for a simple example like the one above.

@perliedman
Author

I have experimented with these environment variables without seeing any clear difference in timings (set together for a single run as sketched after the list):

  • GDAL_HTTP_MULTIRANGE=YES
  • GDAL_HTTP_MERGE_CONSECUTIVE_RANGES=YES
  • GDAL_HTTP_MULTIPLEX=YES
  • GDAL_HTTP_VERSION=2
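
For a single run they can be combined like this (a sketch, simply prefixing the reproducer from the issue with the values listed above):

GDAL_HTTP_MULTIRANGE=YES GDAL_HTTP_MERGE_CONSECUTIVE_RANGES=YES GDAL_HTTP_MULTIPLEX=YES GDAL_HTTP_VERSION=2 python -c "from osgeo import gdal; gdal.UseExceptions(); src=gdal.Open('/vsicurl/https://solvi-marketing-resources.s3.eu-west-1.amazonaws.com/public/ortho.tiff'); src.ReadAsArray()"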

@rcoup
Member

rcoup commented Apr 16, 2024

FWIW, I'm investigating a performance regression upgrading from 3.6.4 to 3.8.5, but I don't have a concrete reproducer or profile yet. The app serves tiles from VRTs pointing at S3 COGs, and is using roughly 40% more CPU and memory with about 50% lower throughput under 3.8 compared to 3.6. A few other things have changed too, though, so I'm wary of attributing it all to GDAL.

@jratike80
Collaborator

I ran a test with GDAL 3.8.5 from OSGeo4W. I suppose this is basically the same thing:

gdal_translate /vsicurl/https://solvi-marketing-resources.s3.eu-west-1.amazonaws.com/public/ortho.tiff out.tif --debug on

The command generated 58 range requests.

@rouault
Member

rouault commented Apr 16, 2024

I can't reproduce that in my dev environment (Ubuntu 20.04) without any special config option set. I get 114 requests with GDAL master or 3.8.5:

$ CPL_DEBUG=ON python -c "from osgeo import gdal; gdal.UseExceptions(); src=gdal.Open('/vsicurl/http://localhost:8080/ortho.tiff'); src.ReadAsArray()" 2>&1 | grep Downloading | wc -l
114

@perliedman
Author

perliedman commented Apr 17, 2024

Thanks for testing and reporting back! You are of course correct that this is not a regression in later GDAL versions. It helped me pinpoint the issue, which is not the GDAL version, but the fact that I had set GDAL_NUM_THREADS=4 in my Docker container and forgot about it.

So, running the code above with GDAL_NUM_THREADS=4 is what causes the increased number of range requests; without it, I consistently get 114 range requests in both 3.8.5 and 3.7.1. With four threads, the number of requests increases to 2796 and the read takes about four times longer.

After finding this, I also experimented with other values for GDAL_NUM_THREADS, and it seems the number of requests always ends up at 2796 for any value greater than 1, but the time it takes varies. With two threads it took 1 minute 20 seconds, and with 64 threads I could bring it down to about 4 seconds.

Is this the intended way for GDAL_NUM_THREADS to work? My naïve expectation was not that I would need values as high as 16 to match the performance of no threading, nor that it would increase the number of requests so enormously.
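
For reference, the per-setting request counts can be checked with the same counting approach as in the comment above, e.g.:

CPL_DEBUG=ON GDAL_NUM_THREADS=4 python -c "from osgeo import gdal; gdal.UseExceptions(); src=gdal.Open('/vsicurl/https://solvi-marketing-resources.s3.eu-west-1.amazonaws.com/public/ortho.tiff'); src.ReadAsArray()" 2>&1 | grep Downloading | wc -l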

@rouault
Member

rouault commented Apr 18, 2024

and it seems the number of requests always ends up at 2796 for any value greater than 1

This raster has indeed ceil(7255.0 / 128) * ceil(6237.0 / 128) = 2793 tiles. When you enable multi-threading, the driver then decides to split the reading into as many jobs as there are tiles, and each job is responsible for acquiring its own data. There is, however, logic where jobs declare in advance the data they will need, so that the /vsicurl/ layer can try to group consecutive requests together; but that is limited to a maximum of 100 MB read simultaneously, and here your file is bigger than that, hence no effort is made to merge HTTP GET requests. I suspect the GTiff driver implementation could be made smarter to split the RasterIO() request into smaller ones that don't exceed the 100 MB threshold.
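
A small back-of-the-envelope sketch of that job count (pure Python, using the numbers from the paragraph above; the gap between 2793 and the observed 2796 presumably being a few extra header/metadata requests is an assumption):

import math

xsize, ysize = 7255, 6237   # raster dimensions, per the formula above
block = 128                 # internal tile size of the COG

# With multi-threading enabled, the GTiff driver creates one read job per tile
jobs = math.ceil(xsize / block) * math.ceil(ysize / block)
print(jobs)                 # 2793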

@rouault rouault self-assigned this Apr 18, 2024
@vincentsarago
Contributor

vincentsarago commented Apr 18, 2024

Might be related: cogeotiff/rio-tiler#697

I'm seeing an increased number of HEAD and GET requests, especially some GetFileSize and GetFileList VSICURL requests, even when GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR is set.
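
For anyone wanting to check this on their side, something along these lines can surface those requests (a sketch only; the exact debug message names may vary between GDAL versions):

CPL_DEBUG=ON CPL_CURL_VERBOSE=YES GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR python -c "from osgeo import gdal; gdal.UseExceptions(); gdal.Open('/vsicurl/https://solvi-marketing-resources.s3.eu-west-1.amazonaws.com/public/ortho.tiff')" 2>&1 | grep -E 'HEAD|GetFileSize|GetFileList'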

@rouault
Member

rouault commented Apr 18, 2024

I suspect the GTiff driver implementation could be made smarter to split the RasterIO() request into smaller ones that don't exceed the 100 MB threshold

An improvement implementing that is in #9694.

@perliedman
Author

@rouault thanks a lot for explaining the logic behind this, it makes sense now! For our purposes, I think just not setting GDAL_NUM_THREADS will work well, but of course improvements will be welcome when they arrive.
