Provide way to check whether an upload would fail without uploading #17520

Open
mauvilsa opened this issue Jan 30, 2025 · 4 comments

Comments

@mauvilsa

What's the problem this feature will solve?
In many CI pipelines, Python packages are built and uploaded to PyPI after a release and git tag have been created. There are reasons why people would want to do it this way, and PyPI should not impose an alternative. If PyPI rejects the upload, then the just-created release is invalid. Also, this commonly happens on a main branch, where it is not good practice to change the history.

My proposal is to provide an official way to check whether an upload would fail without uploading. With this, CI pipelines can validate built packages against the actual PyPI server before a release is created, thus avoiding these invalid releases in most cases.

Because this feature is lacking, there are projects which actually upload packages just to find out whether the upload would succeed. This seems to me like an undesirable practice.

Describe the solution you'd like
I propose extending the API in Warehouse to allow validating a package without uploading. I think this should trigger the exact checks that an actual upload would, requiring the same authentication and rejecting for any of the reasons an actual upload would be rejected. Surely there are many details I haven't considered, and the solution would need to be refined. But I think the general idea is there.
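
To make the idea concrete, here is a minimal sketch of what "same checks, nothing persisted" could look like. It is purely illustrative: the `Index` and `UploadRequest` classes, their fields, and the `dry_run` flag are all hypothetical, and none of this is Warehouse's actual upload code.

```python
from dataclasses import dataclass, field


@dataclass
class UploadRequest:
    # Hypothetical stand-ins for what a real upload request carries.
    token: str
    filename: str
    size_bytes: int


@dataclass
class Index:
    # Hypothetical, in-memory stand-in for a package index.
    valid_tokens: set[str]
    max_file_size: int
    stored_files: set[str] = field(default_factory=set)

    def validate(self, req: UploadRequest) -> list[str]:
        """Run every check and return the reasons the upload would be rejected."""
        errors = []
        if req.token not in self.valid_tokens:
            errors.append("invalid or missing authentication")
        if req.filename in self.stored_files:
            errors.append("file already exists")
        if req.size_bytes > self.max_file_size:
            errors.append("file exceeds the size limit")
        return errors

    def upload(self, req: UploadRequest, dry_run: bool = False) -> list[str]:
        """Real uploads and dry runs share validate(); only persistence differs."""
        errors = self.validate(req)
        if not errors and not dry_run:
            self.stored_files.add(req.filename)
        return errors


index = Index(valid_tokens={"good-token"}, max_file_size=100)
request = UploadRequest(token="good-token", filename="pkg-1.0.tar.gz", size_bytes=10)

# A dry run reports no problems and leaves the index untouched.
dry_run_errors = index.upload(request, dry_run=True)
files_after_dry_run = set(index.stored_files)
```

In this sketch a CI job would call the dry run before tagging a release, and the structured list of errors is what it would report on failure.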

Additional context
I created this issue as suggested here pypa/twine#1152 (comment) by @woodruffw. I copy his thoughts here for reference, but looking at that issue can give more context.

You could open an issue on Warehouse to discuss this, but there are a handful of nontrivial dimensions to this: do "dry-run" uploads require the same auth as normal uploads? How does Warehouse notify people when a dry-run fails (not just a generic error code, but a structured, detailed message)? What guarantees do users have that their packages are not retained during the dry-run? And so forth -- I think these are all surmountable, but they need to be considered in sum.

I have seen #17261, which is about a server to test uploads. I don't much like the idea of a separate server, since there could be a version discrepancy with respect to the real PyPI. Also, if there is an official way to check, then dummy uploads by people can be avoided.

@mauvilsa mauvilsa added the "feature request" and "requires triaging" labels Jan 30, 2025
@mauvilsa mauvilsa changed the title from "Provide way to check whether an upload would be fails without uploading." to "Provide way to check whether an upload would fail without uploading" Jan 30, 2025
@woodruffw
Member

Thanks @mauvilsa!

Some other previous similar requests/discussion points:

In general, I think it'll be difficult for Warehouse to provide a service check for this, although in the long term it's possible for the client and server to mostly converge on the checks they do by adopting the same validation APIs/libraries. For example, twine previously used pkginfo for metadata parsing but now uses packaging, which is the same (official) library Warehouse uses.

@woodruffw woodruffw removed the "requires triaging" label Jan 30, 2025
@mauvilsa
Author

mauvilsa commented Feb 3, 2025

@woodruffw could you please explain why it would be difficult for warehouse to provide a check? Seems to me rather simple, since it would be reusing the existing upload logic. Only implement not persisting the upload.

I had seen the issues related to splitting the checks into a library, but to me that seems like a less optimal solution. Even with a library, there could be version discrepancies between twine and Warehouse. Also, some reasons why an upload could fail could never be checked locally, for example if the upload would exceed a quota.

Also, if Warehouse were to implement this as an informal standard, it would be useful for people sooner. It could maybe also become part of the standardization process of the upload endpoint itself, which, from what I understood you said, is not yet done. This way other Python package indexes could follow, and the discrepancies in checks between the implementations wouldn't matter.

@woodruffw
Member

could you please explain why it would be difficult for warehouse to provide a check? Seems to me rather simple, since it would be reusing the existing upload logic. Only implement not persisting the upload.

I think the original comment in pypa/twine#1152 (comment) covered some of the reasons. A "dry-run" endpoint could probably cover 95% of the checks Warehouse does, but full generality would be difficult because of statefulness: there are global checks that happen on package, index, and connection state.

The most basic example of this is ratelimiting, but quotas are also nontrivial (PyPI could tell you whether a single file would fit within the existing quota, but a dry-run for all files in a release would require additional state/communication between the index and client that doesn't yet exist).
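
A toy sketch of the quota case, with made-up numbers and hypothetical function names (this is not PyPI's real quota logic): each file passes a per-file dry run on its own, yet the release as a whole would not fit, which is why the index would need to know about the full set of files.

```python
def file_fits(used: int, quota: int, file_size: int) -> bool:
    """Per-file check: the only state needed is the current usage."""
    return used + file_size <= quota


def release_fits(used: int, quota: int, file_sizes: list[int]) -> bool:
    """Release-wide check: needs every file size in the release up front."""
    return used + sum(file_sizes) <= quota


# Hypothetical project: 60 units already used out of a quota of 100,
# and a new release consisting of two files of 30 units each.
used, quota = 60, 100
release = [30, 30]

each_file_passes = all(file_fits(used, quota, size) for size in release)
whole_release_passes = release_fits(used, quota, release)
```

Here `each_file_passes` is true while `whole_release_passes` is false, so independent per-file dry runs would give a misleading green light.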

@mauvilsa
Author

mauvilsa commented Feb 3, 2025

I didn't mean that a "dry-run" endpoint must cover 100% of the checks. Covering 95% would already be amazing. My point was that having it as an endpoint has some advantages that a local twine check would never be able to achieve. I don't know what checks PyPI or other index implementations do, but just as a hypothetical: individual users might have size limits for uploaded files, and a "dry-run" endpoint could catch this. Or a different example: when setting up a repo or refactoring, someone might misconfigure authentication. With a dry-run endpoint this could be caught early in a CI run. Without it, people find out the first time they release.


2 participants