Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switching between free-threaded and standard ABI does not invalidate build cache #4882

Open
ngoldbaum opened this issue Jan 29, 2025 · 5 comments
Labels

Comments

@ngoldbaum
Copy link
Contributor

ngoldbaum commented Jan 29, 2025

Bug Description

See this bcrypt PR, which fixed a caching issue seen on github actions that led to a segfault. IMO this caching issue should have just led to a rebuild rather than a silently broken build.

Steps to Reproduce

I've only been able to trigger this on bcrypt's CI, but now that I understand what's happening I suspect that if you build PyO3 (or a project that depends on PyO3) with a gil-enabled interpreter and then try to rebuild the project with a free-threaded interpreter but keeping the build cache, you will get a successful build but the python extension will be broken.

Your operating system and version

MacOS

Your Python version (python --version)

3.13.1

Your Rust version (rustc --version)

1.84.0

Your PyO3 version

0.23.4

How did you install python? Did you use a virtualenv?

github actions setup-python action

Additional Info

No response

@ngoldbaum ngoldbaum added the bug label Jan 29, 2025
@ngoldbaum
Copy link
Contributor Author

ngoldbaum commented Jan 29, 2025

Managed to trigger this locally. Here are steps to reproduce the bug. This assumes you have a clone of the bcrypt repo and that you have 3.13.1 and 3.13.1t installed via pyenv and nox is installed into both interpreter's global environments.

pyenv global 3.13.1
cd bcrypt
nox -fb virtualenv -s tests
rm -r .nox
pyenv global 3.13.1t
nox -fb virtualenv -s tests

The second invocation uses a free-threaded interpreter, but it re-uses the target directory from the GIL-enabled build produced by the previous nox run

You should see a segmentation fault inside free-threaded Python trying to import bcrypt. This doesn't work because the bcrypt module was compiled with the GIL-enabled ABI.

collecting ... Fatal Python error: Segmentation fault

Current thread 0x00000001f6dfc240 (most recent call first):
  File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1320 in create_module
  File "<frozen importlib._bootstrap>", line 813 in module_from_spec
  File "<frozen importlib._bootstrap>", line 921 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
  File "/Users/goldbaum/Documents/bcrypt/.nox/tests/lib/python3.13t/site-packages/bcrypt/__init__.py", line 13 in <module>
  File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1026 in exec_module
  File "<frozen importlib._bootstrap>", line 935 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
  File "/Users/goldbaum/Documents/bcrypt/tests/test_bcrypt.py", line 6 in <module>
  File "/Users/goldbaum/Documents/bcrypt/.nox/tests/lib/python3.13t/site-packages/_pytest/assertion/rewrite.py", line 184 in exec_module
  File "<frozen importlib._bootstrap>", line 935 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
  File "<frozen importlib._bootstrap>", line 1387 in _gcd_import
  File "/Users/goldbaum/.pyenv/versions/3.13.1t/lib/python3.13t/importlib/__init__.py", line 88 in import_module

@ngoldbaum
Copy link
Contributor Author

I tried and failed to fix this today following a suggestion from @davidhewitt: try adding cargo:rerun-if-changed for the path to the python executable. That way if python in a virtualenv gets replaced, the build gets triggered again.

For some reason I don't understand, that doesn't work. See the fix-python-build-cache branch on my fork. I tried adding it to the build.rs for both pyo3-build-config and pyo3-ffi, but for whatever reason cargo seems to think both crates are fresh when I try to compile after modifying the symlink in the virtualenv.

One odd thing is that if I use touch to modify the mtime of the python executable it does work!

Take a look at this copy/paste from a terminal session where I demonstrate this behavior: https://gist.github.com/ngoldbaum/10ec27e807e6b1668890e31afaa8da77

I had to hack the noxfile.py for bcrypt like this to be able to see the debug output from pip:

diff --git a/noxfile.py b/noxfile.py
index 4b1cd65..f83204a 100644
--- a/noxfile.py
+++ b/noxfile.py
@@ -7,7 +7,7 @@ nox.options.default_venv_backend = "uv|virtualenv"
 @nox.session
 def tests(session: nox.Session) -> None:
     session.install("coverage")
-    session.install(".[tests]")
+    session.install(".[tests]", "-v")

     session.run(
         "coverage", "run", "-m", "pytest", "--strict-markers", *session.posargs

Maybe this is a cargo caching bug? Deleting the file and recreating it the file with different content somehow doesn't trigger the rerun-if-changed check. It looks like checking for dirty builds based on hashes is in cargo but not yet stabilized, so we can't use that as an escape hatch.

@ngoldbaum
Copy link
Contributor Author

ngoldbaum commented Feb 3, 2025

I can trigger the same behavior in a test Rust project, so I think this is a cargo bug or a shortcoming of the rerun-if-changed feature. I filed an issue: rust-lang/cargo#15134

@ngoldbaum
Copy link
Contributor Author

Until the upstream issue is fixed I think I have at least a way to generate a nicer error message.

If a host config file is found, instead of just blindly trusting it, we could call get_host_interpreter again and make sure the interpreter matches what's in the config file. If it doesn't, print an error suggesting setting e.g. PYO3_ENVIRONMENT_SIGNATURE and refuse to continue.

We could restrict the check for things we know definitely won't work, like switching from a GIL-disabled to GIL-enabled build or vice verse.

@ngoldbaum
Copy link
Contributor Author

Argh, no, that doesn't work because the build script would need to run a second time, which of course it can't unless cargo tells it to.

Maybe this is just something we need to document for now as a possible issue people might run into?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant