Suspected regression in /vsicurl (#9682)
Comments
I have tried to experiment with a number of environment variables, without any clear difference in timings.
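The variables actually tried in that comment are not preserved in this capture; as an assumption, these are the /vsicurl-related options one would typically experiment with in this situation (a sketch, not the poster's actual list):

```python
import os

# Assumed list of options that commonly affect /vsicurl request patterns;
# the variables actually tried in the original comment are not preserved.
os.environ["CPL_VSIL_CURL_CHUNK_SIZE"] = "16384"     # read chunk size in bytes
os.environ["CPL_VSIL_CURL_CACHE_SIZE"] = "16777216"  # global curl cache size in bytes
os.environ["VSI_CACHE"] = "TRUE"                     # enable the generic VSI block cache
os.environ["VSI_CACHE_SIZE"] = "26214400"            # ...and its size in bytes
os.environ["GDAL_INGESTED_BYTES_AT_OPEN"] = "32768"  # bytes fetched when opening the file
```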
FWIW, I'm investigating a performance regression when upgrading from 3.6.4 to 3.8.5, but I don't have a concrete reproducer or profile yet. This app serves tiles from VRTs pointing at S3 COGs, and is using roughly 40% more CPU and memory with about 50% lower throughput under 3.8 compared with 3.6. A few other things have changed too, though, so I'm wary of attributing it all to GDAL.
I made a test with GDAL 3.8.5 from OSGeo4W, which I suppose is basically the same thing. The command generated 58 range requests.
I can't reproduce that on my dev environment (Ubuntu 20.04) without any special config option set. I get 114 requests with GDAL master or 3.8.5.
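A sketch of one way to count the range requests from Python, assuming a hypothetical URL. `CPL_CURL_VERBOSE` output goes to the process's C-level stderr, so it is captured at the file-descriptor level here:

```python
import os

os.environ["CPL_CURL_VERBOSE"] = "YES"  # must be set before GDAL issues requests
from osgeo import gdal

# Redirect C-level stderr (where libcurl's verbose trace goes) to a log file.
log_fd = os.open("curl.log", os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
saved_stderr = os.dup(2)
os.dup2(log_fd, 2)

ds = gdal.Open("/vsicurl/https://example.com/data/cog.tif")  # hypothetical URL
_ = ds.ReadAsArray()
ds = None

os.dup2(saved_stderr, 2)  # restore stderr
os.close(log_fd)

with open("curl.log", errors="replace") as f:
    print(sum(1 for line in f if "Range:" in line), "range requests")
```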
Thanks for testing and reporting back! You are of course correct that this is not a regression in later GDAL versions. It helped me pinpoint the issue, which is not the GDAL version but the fact that I had set GDAL_NUM_THREADS. Running the same code with GDAL_NUM_THREADS unset brings the timings back to normal. After finding this, I also experimented with some other settings for GDAL_NUM_THREADS. Is this the intended way for GDAL_NUM_THREADS to affect the request pattern?
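For reference, a minimal sketch of the two configurations being compared, assuming a hypothetical URL (ALL_CPUS is an assumed value; the exact setting the poster used is not preserved):

```python
from osgeo import gdal

url = "/vsicurl/https://example.com/data/cog.tif"  # hypothetical URL

# Multi-threaded decoding: the GTiff driver splits the read into per-tile
# jobs, which (as explained below) can defeat request merging in /vsicurl.
gdal.SetConfigOption("GDAL_NUM_THREADS", "ALL_CPUS")
gdal.VSICurlClearCache()  # start from a cold cache for a fair comparison
_ = gdal.Open(url).ReadAsArray()

# Default single-threaded read: consecutive ranges get merged again.
gdal.SetConfigOption("GDAL_NUM_THREADS", None)
gdal.VSICurlClearCache()
_ = gdal.Open(url).ReadAsArray()
```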
This raster has indeed ceil(7255.0 / 128) * ceil(6237.0 / 128) = 2793 tiles. When you enable multi-threading, the driver decides to split the reading into as many jobs as there are tiles, and each job is responsible for acquiring its data. There is, however, logic by which jobs declare in advance the data they will need, so that the /vsicurl/ layer can try to group consecutive requests together; but that is limited to a maximum of 100 MB read simultaneously, and here your file is bigger than that, hence no effort is made to merge HTTP GET requests. I suspect the GTiff driver implementation could be made smarter and split the RasterIO() request into smaller ones that don't exceed the 100 MB threshold.
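The arithmetic from the comment above, as a quick check:

```python
import math

# 7255 x 6237 raster with 128 x 128 internal tiles:
tiles = math.ceil(7255 / 128) * math.ceil(6237 / 128)
print(tiles)  # 57 * 49 = 2793 tile-reading jobs when multi-threading is on
```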
Might be related: cogeotiff/rio-tiler#697. I'm seeing an increased number of HEAD and GET requests, especially some …
Improvement implementing that in #9694.
@rouault thanks a lot for explaining the logic behind this, it makes sense now! For our purposes, I think just not setting GDAL_NUM_THREADS is the way to go.
What is the bug?
Using the latest GDAL (3.8.5), fetching a COG using Python's ReadAsArray takes about four times longer than the same code using GDAL 3.7.1. Enabling CPL_CURL_VERBOSE=YES, I can see that 3.8.5 makes a lot more range requests than 3.7.1.

Steps to reproduce the issue
I fetch a COG using /vsicurl with a simple ReadAsArray script (sketched below). On my connection, this takes about 40 seconds.
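The original snippet was lost in this capture; a minimal sketch of the kind of code described, with a hypothetical URL standing in for the real COG:

```python
from osgeo import gdal

# Hypothetical URL; the COG actually used in the report is not preserved.
url = "/vsicurl/https://example.com/data/cog.tif"

ds = gdal.Open(url)
data = ds.ReadAsArray()  # reads all bands into a NumPy array
ds = None
```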
Running the exact same code using GDAL 3.7.1, this takes about 10 seconds.
It seems something changed between these versions that caused a pretty major regression.
Versions and provenance
GDAL 3.8.5 test was run in a Docker container running Debian GNU/Linux 11 (bullseye)
GDAL 3.7.1 was tested on my Ubuntu 23.10 machine
Additional context
I stumbled upon this in our production environment, where the code is of course a lot more complex, but the same seems to hold even for a simple example like the one above.