Replace deprecated XML-RPC Pypi APIs #1897

Open
pombredanne opened this issue Mar 5, 2025 · 12 comments
Labels
dependencies Pull requests that update a dependency file enhancement New feature or request needs_external_pr Will rely on non maintainer PR in order to close

Comments

@pombredanne
Contributor

See https://mail.python.org/archives/list/[email protected]/thread/5VOX33ARFQUYKIMKM5NS7PM7Z6ZNCSJY/ and:

The following PyPI XMLRPC methods are being permanently deprecated:

list_packages
package_releases
release_urls
release_data

Technically, https://warehouse.pypa.io/api-reference/xml-rpc.html#mirroring-support did not deprecate the list_packages_with_serial RPC, but in practice it times out and should likely be replaced with the new simple JSON API, which provides a list of packages with serials.
See also https://docs.pypi.org/api/index-api/

@ewdurbin I assume that list_packages_with_serial is effectively deprecated even if not documented yet, correct?
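
For reference, here is a minimal sketch of reading package names and serials from the JSON variant of the simple index. It assumes the PEP 691 response shape with the PyPI-specific `_last-serial` key on each project entry. The helper names `parse_simple_index` and `fetch_simple_index` are made up for illustration; this is not bandersnatch's code.

```python
import json
import urllib.request

SIMPLE_INDEX_URL = "https://pypi.org/simple/"
# Content type that requests the JSON form of the simple index (PEP 691).
JSON_ACCEPT = "application/vnd.pypi.simple.v1+json"


def parse_simple_index(payload: bytes) -> dict[str, int]:
    """Map each project name to its last serial from a simple index payload."""
    data = json.loads(payload)
    return {
        project["name"]: project["_last-serial"]
        for project in data["projects"]
        # Assumption: "_last-serial" is a PyPI extension; skip entries without it.
        if "_last-serial" in project
    }


def fetch_simple_index(url: str = SIMPLE_INDEX_URL, timeout: float = 30.0) -> dict[str, int]:
    """Download the JSON simple index and return {project name: last serial}."""
    request = urllib.request.Request(url, headers={"Accept": JSON_ACCEPT})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_simple_index(response.read())
```

Parsing is kept separate from fetching so the mapping logic can be exercised without network access.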

@ewdurbin
Member

ewdurbin commented Mar 5, 2025

No, that endpoint is still supported and the documentation is up to date.

The intention is to support the three methods changelog_last_serial, changelog_since_serial, and list_packages_with_serial until a better solution for mirroring API is developed, deployed, and adopted.

In this instance the JSON simple API is a great candidate for adoption to replace list_packages_with_serial, except for the caching aspect. XMLRPC list_packages_with_serial will give you an honest un-cached answer every time (which is why it takes 10-15 seconds). The JSON simple API endpoint will remain cached for 24 hours, which is why it is often nearly immediate.

Our metrics for that endpoint show that it has been generally stable, with responses ranging from 20-30 seconds from the backends, and generally well below the 60 second internal timeout. Is there a timeout being set by bandersnatch's client libraries?

@ewdurbin
Member

ewdurbin commented Mar 5, 2025

Looking into this, I did find a bit of a glaring performance issue in the list_packages_with_serial method (and others!).

Once deployed, that endpoint should get a bit faster. Turns out internally we were executing the DB query and dictionary build for response... twice 🙃

@cooperlees
Contributor

O how I'd love to remove all XMLRPC from bandersnatch. I've tried. Long long ago. PRs welcome using other "approved" and preferably PEP'd APIs to get the same data if it's:

  • At least as efficient (client side, though I'm open to adding a little complexity/compute client side)
  • Decently tested

Thanks @ewdurbin for finding some inefficiencies in the server tho! I feel mirrors don't really need "realtime" responses and that we could probably cache these responses for 1 minute on the CDN if we wanted, but I don't know how friendly the service is to that. And also there probably aren't enough mirrors to really save a huge amount of load ...

@cooperlees cooperlees added enhancement New feature or request needs_external_pr Will rely on non maintainer PR in order to close dependencies Pull requests that update a dependency file labels Mar 5, 2025
@ewdurbin
Member

ewdurbin commented Mar 5, 2025

Thanks for flagging regardless. We have shipped a few optimizations that should at least make the projects with serial endpoint live to fight another day:

Image

pombredanne added a commit to pombredanne/bandersnatch that referenced this issue Mar 5, 2025
The XMLRPC is now obsolete and deprecated and even the parts that may
not be deprecated no longer work. Instead, let's use the simple Index
API using JSON to collect all packages and their changes.

Reference: pypa#1897
Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne added a commit to pombredanne/bandersnatch that referenced this issue Mar 5, 2025
The XMLRPC is now obsolete and deprecated and even the parts that may
not be deprecated no longer work. Instead, let's use the simple Index
API with JSON to collect all packages and their changes.

Reference: pypa#1897
Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne added a commit to pombredanne/bandersnatch that referenced this issue Mar 5, 2025
The XMLRPC is now obsolete and deprecated and even the parts that may
not be deprecated no longer work. Instead, let's use the simple Index
API with JSON to collect all packages and their changes.

Reference: pypa#1897
Philippe Ombredanne <[email protected]>
pombredanne added a commit to pombredanne/bandersnatch that referenced this issue Mar 6, 2025
Add typing where missing.
Remove extra underscore from method.

Reference: pypa#1897
Philippe Ombredanne <[email protected]>

Co-authored-by: Cooper Lees <[email protected]>
@pombredanne
Contributor Author

pombredanne commented Mar 6, 2025

I pushed a PR dropping support for XMLRPC, but the API works again with @ewdurbin's patches.

We could:

  • keep XML-RPC alive but mark it somehow deprecated and available only behind an extra and an option, and not used by default?
  • or just drop XML-RPC entirely as proposed?

The only concern is that the once-a-day update of the JSON simple index may be an issue for some use cases. One day of wait feels like a long time for new releases.

@pombredanne
Contributor Author

Our metrics for that endpoint show that it has been generally stable, with responses ranging from 20-30 seconds from the backends, and generally well below the 60 second internal timeout. Is there a timeout being set by bandersnatch's client libraries?

@ewdurbin the default conf is timeout = 10 indeed

@ewdurbin
Member

ewdurbin commented Mar 6, 2025

@ewdurbin the default conf is timeout = 10 indeed

I would suggest overriding that timeout for the list_packages_with_serial call. PyPI's infrastructure has an effective timeout of 60s, and since I assume this call is rather infrequent it is probably wise to bump it at least to 20-30s.
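
As an illustration of overriding the timeout for this one call, here is a minimal sketch using only the standard library's `xmlrpc.client`. The `TimeoutTransport` class and the 30 second default are assumptions made for the example; bandersnatch's own XML-RPC client may be wired differently.

```python
import xmlrpc.client

PYPI_XMLRPC_URL = "https://pypi.org/pypi"


class TimeoutTransport(xmlrpc.client.SafeTransport):
    """HTTPS transport that applies a socket timeout to its connection."""

    def __init__(self, timeout: float, **kwargs):
        super().__init__(**kwargs)
        self.timeout = timeout

    def make_connection(self, host):
        connection = super().make_connection(host)
        # http.client reads this attribute when it opens the socket.
        connection.timeout = self.timeout
        return connection


def list_packages_with_serial(timeout: float = 30.0) -> dict[str, int]:
    """Call the XML-RPC method with a timeout longer than a 10 second default."""
    client = xmlrpc.client.ServerProxy(
        PYPI_XMLRPC_URL, transport=TimeoutTransport(timeout)
    )
    return client.list_packages_with_serial()
```

Scoping the longer timeout to a dedicated transport leaves the timeout used for other requests unchanged.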

@pombredanne
Contributor Author

Here is my suggestion:

  • reinstate XML-RPC support, but not as the default. Add a note on the XML-RPC timeout suggesting bumping it to 30 seconds.
  • keep the proposed simple API as default

@ewdurbin
Member

ewdurbin commented Mar 6, 2025

Honestly, supporting the deprecation of XMLRPC by optimizing /simple/ JSON variant is quite appealing. I'm going to discuss with PyPI admins to see if we can support lowering the amount of caching necessary for /simple/ a bit.

What maximum cached duration would feel acceptable? An hour? 30 minutes? 15 minutes?

@pombredanne
Contributor Author

@ewdurbin an hour should be plenty for all the use cases I can fathom.
15 minutes would be awesome.
The new code is simple enough and we are not even caching on the client side (which is IMHO not needed since the listing is a single, one-time use for a run).
We now have three methods:

All of these are fast enough and plenty good enough. We do not get all the details that the changelog_since_serial RPC was providing, but there is little code that I can find using this anyway:
See https://github.com/search?q="changelog_since_serial"&type=code

Just looking at the changes since serial and computing if needed if these demand special treatment (add, yank, etc.) is plenty good enough IMHO
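
A rough sketch of what "looking at the changes since serial" could mean on the client side, assuming the index has already been reduced to a `{name: serial}` mapping. The function names are hypothetical and this is not the code from the PR.

```python
def changed_since(projects: dict[str, int], last_serial: int) -> dict[str, int]:
    """Return the projects whose serial is newer than the last synced serial."""
    return {
        name: serial for name, serial in projects.items() if serial > last_serial
    }


def removed_since(previous: dict[str, int], current: dict[str, int]) -> set[str]:
    """Return projects present in the previous listing but absent from the current one.

    A serial comparison alone cannot show removals, so this needs the
    previous listing to have been kept around.
    """
    return set(previous) - set(current)


def highest_serial(projects: dict[str, int], default: int = 0) -> int:
    """Return the serial to record as the sync point for the next run."""
    return max(projects.values(), default=default)
```

For example, `changed_since({"a": 10, "b": 12}, 10)` returns `{"b": 12}`, which is the set of projects a mirror run would need to revisit.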

@cooperlees
Contributor

supporting the deprecation of XMLRPC by optimizing /simple/ JSON variant is quite appealing

Awesome. I feel it would be awesome to optimize the simple API (even if JSON only) for the mirroring use case. I agree with 1h being reasonable and 15 mins being awesome / more than enough for most use cases. We can just document this. I can just see people "releasing to pypi" and wanting it in their internal mirrors within the hour ....

I also love the appeal that mirrors will be able to mirror from mirrors with this setup too ... That's a big scale win potentially and we've had issues requesting it.

Again, not that I think there is much we can do here, happy to make our client cooperate with any decision PyPI makes as best as we can.

3 participants