-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update ogrct.cpp to remove shared shared proj context (m_psLastContex… #11846
base: master
Are you sure you want to change the base?
Conversation
…t), which can lead to scenarios where one thread destroys the context while another is still using it Undo OSGeo@94ede75, which causes segfaults when using gdal in a multi-threaded environment such as a thread worker pool. It might be the case that one thread "destroys" the context (m_psLastContext) while another thread is still using it, leading to a segfault. As you can see with the following line: m_psLastContext = OSRGetProjTLSContext();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! Thank you!
This trouble is happening in the google3 mono repo with GDAL at head and a 5 year old version of proj - OSGeo/PROJ@cbf2eeb (I know... I'm working on the upgrade, but it's challenging). LLVM within a couple weeks of head; python 3.11; tried both swig 4.2.1 and 4.4 (from head) Stack trace from asan mode:
|
If you say it fixes things, I guess I must trust you, but I feel stupid not understanding what's wrong exactly in the code that is reverted. Someone should withdraw my license to do C++ multi-threaded code... sigh |
|
We have some more investigation to do on our side. I haven't fully figured out what's up yet. |
@samiermerchant @fredzyda @schwehr This ticket initially appeared suspicious to me for following reasons:
|
Hey Even, Sorry for the surprise and thanks for your note. Mostly hoping you might think of something I missed about the threading issue as I feel over my head on this one. I triggered the CI and have been working with them on the issue. I am also working with Owen who just posted to the list. My getting the google3 copy of GDAL to head has triggered issues in a number of teams. And am hoping to get more involvement from Google engineers with GDAL et al. on GitHub. Never hurts to call out unusual events! I will work on making things more obvious / transparent as we go as we work through an enormous amount of tech dept. I need to make sure I check that any new people follow the Google GitHub process and I need to make an internal doc for how best to document PRs. We use perforce and a very different change control UI inside, so it's jarring for folks who are new to GitHub or haven't done it for a long time. I also haven't been good about sharing our local ports of autotest checks and the extra tests we have written. I will try to work my gdal-autotest2 repo back into my workflow. I know the tests are not the same code style and can't directly work in autotest, but being able to compare existing or forward port changes we have could be useful. I felt for a while that the amgdal-autotest2 repo had done its job as best as possible (except Geos is still on tut and I couldn't convince the Kakadu folk to do any unit tests), but times are always changing. |
@rouault: apologies for the surprises. @samiermerchant and I are working together at Google to fix the bug we found, so he pinged me directly as soon as he put this PR up. I am in the Google open source and Googlers organizations on github, so I think I'm properly listed here as affiliated with Google. Samier and I work on various forms of weather forecasting at Google and make extensive use of gdal in our data ingestion pipelines. |
Dec 9th from before Kurt started to make the significant changes we see issues with. Cut down the logs to make it more obvious:
with the fix cl with tsan still get an error
The code: typedef GDALDataset *(*OpenCallback)(GDALOpenInfo *);
OpenCallback pfnOpen = nullptr;
virtual OpenCallback GetOpenCallback()
{
return pfnOpen;
} |
…t), which can lead to scenarios where one thread destroys the context while another is still using it
Undo 94ede75, which causes segfaults when using gdal in a multi-threaded environment such as a thread worker pool.
It might be the case that one thread "destroys" the context (m_psLastContext) while another thread is still using it, leading to a segfault. As you can see with the following line: m_psLastContext = OSRGetProjTLSContext();
What does this PR do?
Update ogrct.cpp to remove shared shared proj context (m_psLastContext), which can lead to scenarios where one thread destroys the context while another is still using it.
What are related issues/pull requests?
Tasklist
Environment
Provide environment details, if relevant: