Drop XMLRPC API to use PyPI Index API #1898

Open
wants to merge 2 commits into
base: main
1 change: 1 addition & 0 deletions CHANGES.md
@@ -7,6 +7,7 @@
## Big Fixes

- Support reading HTTP proxy URLs from environment variables, and SOCKS proxy URLs from the 'mirror.proxy' config option `PR #1861`
- Drop support for the PyPI XML-RPC API and use the new Index API instead to get all packages `PR #1898`

# 6.6.0

2 changes: 0 additions & 2 deletions README.md
@@ -164,8 +164,6 @@ parts of PyPI that are needed to support package installation. It does not
support more dynamic APIs of PyPI that may be used by various clients for
other purposes.

An example of an unsupported API is [PyPI's XML-RPC interface](https://warehouse.readthedocs.io/api-reference/xml-rpc/), which is used when running `pip search`.

### Bandersnatch Mission

The bandersnatch project strives to:
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -8,7 +8,7 @@ Bandersnatch documentation
bandersnatch is a PyPI mirror client according to `PEP 381`
https://www.python.org/dev/peps/pep-0381/.

Bandersnatch hits the XMLRPC API of pypi.org to get all packages with serial
Bandersnatch hits the Index JSON API of pypi.org to get all packages with serial
or packages since the last run's serial. bandersnatch then uses the JSON API
of PyPI to get shasums and release file paths to download and work out where
to lay out the package files on a POSIX file system.
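
As a rough illustration of the flow described above, the sketch below prepares a request for the JSON project index and turns a response body into a `{name: last_serial}` mapping. It is a minimal sketch, not the code in this PR: `build_index_request`, `serials_by_name`, the `example-client 0.0` user agent, and the sample payload are hypothetical names and data, and `urllib.request` stands in for the aiohttp session that bandersnatch uses. The request is built but never sent.

```python
import json
import urllib.request

# Media type that asks the Simple API for its JSON representation.
SIMPLE_JSON = "application/vnd.pypi.simple.v1+json"


def build_index_request(base_url: str, user_agent: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for the JSON project index."""
    return urllib.request.Request(
        f"{base_url}/simple/",
        headers={"User-Agent": user_agent, "Accept": SIMPLE_JSON},
    )


def serials_by_name(index: dict) -> dict[str, int]:
    """Map each project name to its last serial; tolerate a missing project list."""
    return {p["name"]: p["_last-serial"] for p in index.get("projects", [])}


# Hypothetical payload shaped like the index used in this PR's tests.
sample = json.loads(
    """
    {
      "meta": {"_last-serial": 22, "api-version": "1.1"},
      "projects": [
        {"_last-serial": 20, "name": "foobar"},
        {"_last-serial": 18, "name": "baz"}
      ]
    }
    """
)

request = build_index_request("https://pypi.example.com", "example-client 0.0")
print(request.full_url)
print(serials_by_name(sample))
```

Without the `Accept` header the same URL may be served as HTML, which is why the header is set explicitly on the request.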
1 change: 0 additions & 1 deletion requirements.txt
@@ -1,6 +1,5 @@
aiohttp==3.11.12
aiohttp-socks==0.10.1
aiohttp-xmlrpc==1.5.0
async-timeout==5.0.1
attrs==25.1.0
chardet==5.2.0
1 change: 0 additions & 1 deletion requirements_docs.txt
@@ -5,7 +5,6 @@ packaging==24.2
requests==2.32.3
sphinx==8.2.1
MyST-Parser==4.0.1
xmlrpc2==0.3.1
Contributor

Oh, nice catch. I need to go through and see if we can slim down our requirements file.

Or maybe we move to uv to auto manage this stuff. Wonder if PyPA has any policy on using uv yet ...

Contributor Author

Actually, I just did a brute-force search for "rpc" and this looked unused. Does uv check for imports?
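
For anyone wanting something slightly more precise than a text search, one option is to collect the modules a file actually imports and compare them against the requirements list. The sketch below is only that, a sketch using the standard library `ast` module; it says nothing about what uv does or does not check. `imported_top_level_modules` and the example source are hypothetical.

```python
import ast


def imported_top_level_modules(source: str) -> set[str]:
    """Return the top-level module names imported by a piece of Python source."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b.c" contributes "a"
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from a.b import c" contributes "a"; relative imports are skipped
            modules.add(node.module.split(".")[0])
    return modules


# Hypothetical source resembling the imports this PR removes from master.py.
example_source = """
import aiohttp
from aiohttp_xmlrpc.client import ServerProxy
from . import helpers
"""

found = imported_top_level_modules(example_source)
print(sorted(found))
```

Keep in mind that a distribution name and its import name can differ (here `aiohttp-xmlrpc` is imported as `aiohttp_xmlrpc`), so a comparison against `requirements.txt` needs a mapping between the two.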

sphinx-argparse-cli==1.19.0

git+https://github.com/pypa/pypa-docs-theme.git#egg=pypa-docs-theme
3 changes: 1 addition & 2 deletions setup.cfg
@@ -22,7 +22,6 @@ version = 6.7.0.dev0
install_requires =
aiohttp
aiohttp-socks
aiohttp-xmlrpc
filelock
humanfriendly
importlib_metadata
@@ -93,5 +92,5 @@ s3 =
[isort]
atomic = true
profile = black
known_third_party = _pytest,aiohttp,aiohttp_socks,aiohttp_xmlrpc,filelock,freezegun,keystoneauth1,mock_config,packaging,pkg_resources,pytest,setuptools,swiftclient
known_third_party = _pytest,aiohttp,aiohttp_socks,filelock,freezegun,keystoneauth1,mock_config,packaging,pkg_resources,pytest,setuptools,swiftclient
known_first_party = bandersnatch,bandersnatch_filter_plugins,bandersnatch_storage_plugins
90 changes: 37 additions & 53 deletions src/bandersnatch/master.py
@@ -8,7 +8,6 @@
from typing import Any

import aiohttp
from aiohttp_xmlrpc.client import ServerProxy

import bandersnatch
from bandersnatch.config.proxy import get_aiohttp_proxy_kwargs, proxy_address_from_env
@@ -28,10 +27,6 @@
"""We got a page back from PyPI that doesn't meet our expected serial."""


class XmlRpcError(aiohttp.ClientError):
"""Issue getting package listing from PyPI Repository"""


class Master:
def __init__(
self,
@@ -141,58 +136,47 @@
fd.write(chunk)

@property
def xmlrpc_url(self) -> str:
return f"{self.url}/pypi"

# TODO: Potentially make USER_AGENT more accessible from aiohttp-xmlrpc
async def _gen_custom_headers(self) -> dict[str, str]:
# Create dummy client so we can copy the USER_AGENT + prepend bandersnatch info
dummy_client = ServerProxy(self.xmlrpc_url, loop=self.loop)
custom_headers = {
"User-Agent": (
f"bandersnatch {bandersnatch.__version__} {dummy_client.USER_AGENT}"
)
def simple_url(self) -> str:
return f"{self.url}/simple/"

# TODO: Potentially make USER_AGENT more accessible from aiohttp
@property
def _custom_headers(self) -> dict[str, str]:
return {
"User-Agent": f"bandersnatch {bandersnatch.__version__}",
# The Simple API uses the Accept header to select the JSON response
"Accept": "application/vnd.pypi.simple.v1+json",
}
await dummy_client.close()
return custom_headers

async def _gen_xmlrpc_client(self) -> ServerProxy:
custom_headers = await self._gen_custom_headers()
client = ServerProxy(
self.xmlrpc_url,
client=self.session,
loop=self.loop,
headers=custom_headers,
)
return client

# TODO: Add an async context manager to aiohttp-xmlrpc to replace this function
async def rpc(self, method_name: str, serial: int = 0) -> Any:
try:
client = await self._gen_xmlrpc_client()
method = getattr(client, method_name)
if serial:
return await method(serial)
return await method()
except TimeoutError as te:
logger.error(f"Call to {method_name} @ {self.xmlrpc_url} timed out: {te}")

async def all_packages(self) -> Any:
all_packages_with_serial = await self.rpc("list_packages_with_serial")
if not all_packages_with_serial:
raise XmlRpcError("Unable to get full list of packages")
return all_packages_with_serial
async def fetch_simple_index(self) -> Any:
"""Return a mapping of all project data from the PyPI Index API"""
logger.debug(f"Fetching simple JSON index from {self.simple_url}")
async with self.session.get(

self.simple_url, headers=self._custom_headers
) as response:
simple_index = await response.json()
return simple_index


async def all_packages(self) -> dict[str, int]:
"""Return a mapping of all project names as {name: last_serial}"""
simple_index = await self.fetch_simple_index()
if not simple_index:
return {}
all_packages = {
project["name"]: project["_last-serial"]
for project in simple_index["projects"]
}
logger.debug(f"Fetched #{len(all_packages)} from simple JSON index")
return all_packages

async def changed_packages(self, last_serial: int) -> dict[str, int]:
changelog = await self.rpc("changelog_since_serial", last_serial)
if changelog is None:
changelog = []

packages: dict[str, int] = {}
for package, _version, _time, _action, serial in changelog:
if serial > packages.get(package, 0):
packages[package] = serial
return packages
"""Return a mapping of all project names changed since last serial as {name: last_serial}"""
all_packages = await self.all_packages()
changed_packages = {
pkg: ser for pkg, ser in all_packages.items() if ser > last_serial
}
logger.debug(f"Fetched #{len(changed_packages)} changed packages")
return changed_packages
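
To make the new `changed_packages` behaviour concrete, here is a small standalone sketch of the same filter applied to a plain dictionary. `changed_since` and the sample data are hypothetical and exist only for illustration; the real method first fetches the full index on every call.

```python
def changed_since(all_packages: dict[str, int], last_serial: int) -> dict[str, int]:
    """Keep only the projects whose last serial is newer than last_serial."""
    return {
        name: serial for name, serial in all_packages.items() if serial > last_serial
    }


# Hypothetical index contents as {name: last_serial}.
all_packages = {"foobar": 20, "baz": 18, "old": 3}

print(changed_since(all_packages, 4))   # → {'foobar': 20, 'baz': 18}
print(changed_since(all_packages, 18))  # → {'foobar': 20}
print(changed_since(all_packages, 22))  # → {}
```

Because the filter works on each project's latest serial only, it does not need the per-event deduplication the changelog-based version performed, but it can only report projects that are still present in the index.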

async def get_package_metadata(self, package_name: str, serial: int = 0) -> Any:
try:
1 change: 0 additions & 1 deletion src/bandersnatch/mirror.py
@@ -337,7 +337,6 @@ async def process_package(self, package: Package) -> None:
await loop.run_in_executor(
self.storage_backend.executor, self.sync_simple_pages, package
)
# XMLRPC PyPI Endpoint stores raw_name so we need to provide it
await loop.run_in_executor(
self.storage_backend.executor,
self.record_finished_package,
1 change: 0 additions & 1 deletion src/bandersnatch/tests/conftest.py
@@ -150,7 +150,6 @@ def session_side_effect(*args: Any, **kwargs: Any) -> Any:
return FakeAiohttpClient()

master = Master("https://pypi.example.com")
master.rpc = mock.Mock() # type: ignore
master.session = mock.MagicMock()
master.session.get.side_effect = session_side_effect
master.session.request.side_effect = session_side_effect
53 changes: 26 additions & 27 deletions src/bandersnatch/tests/test_master.py
@@ -6,7 +6,7 @@
import pytest

import bandersnatch
from bandersnatch.master import Master, StalePage, XmlRpcError
from bandersnatch.master import Master, StalePage


@pytest.mark.asyncio
@@ -16,45 +16,44 @@ async def test_disallow_http() -> None:


@pytest.mark.asyncio
async def test_rpc_url(master: Master) -> None:
assert master.xmlrpc_url == "https://pypi.example.com/pypi"
async def test_self_simple_url(master: Master) -> None:
assert master.simple_url == "https://pypi.example.com/simple/"


@pytest.mark.asyncio
async def test_all_packages(master: Master) -> None:
expected = [["aiohttp", "", "", "", "69"]]
master.rpc = AsyncMock(return_value=expected) # type: ignore
simple_index = {
"meta": {"_last-serial": 22, "api-version": "1.1"},
"projects": [
{"_last-serial": 20, "name": "foobar"},
{"_last-serial": 18, "name": "baz"},
],
}

master.fetch_simple_index = AsyncMock(return_value=simple_index) # type: ignore
packages = await master.all_packages()
assert expected == packages


@pytest.mark.asyncio
async def test_all_packages_raises(master: Master) -> None:
master.rpc = AsyncMock(return_value=[]) # type: ignore
with pytest.raises(XmlRpcError):
await master.all_packages()
assert packages == {"foobar": 20, "baz": 18}


@pytest.mark.asyncio
async def test_changed_packages_no_changes(master: Master) -> None:
master.rpc = AsyncMock(return_value=None) # type: ignore
master.fetch_simple_index = AsyncMock(return_value=None) # type: ignore
changes = await master.changed_packages(4)
assert changes == {}


@pytest.mark.asyncio
async def test_changed_packages_with_changes(master: Master) -> None:
list_of_package_changes = [
("foobar", "1", 0, "added", 17),
("baz", "2", 1, "updated", 18),
("foobar", "1", 0, "changed", 20),
# The server usually just hands out monotonous serials in the
# changelog. This verifies that we don't fail even with garbage input.
("foobar", "1", 0, "changed", 19),
]
master.rpc = AsyncMock(return_value=list_of_package_changes) # type: ignore
simple_index = {
"meta": {"_last-serial": 22, "api-version": "1.1"},
"projects": [
{"_last-serial": 20, "name": "foobar"},
{"_last-serial": 18, "name": "baz"},
],
}
master.fetch_simple_index = AsyncMock(return_value=simple_index) # type: ignore
changes = await master.changed_packages(4)
assert changes == {"baz": 18, "foobar": 20}
assert changes == {"foobar": 20, "baz": 18}


@pytest.mark.asyncio
@@ -79,9 +78,9 @@ async def test_master_url_fetch(master: Master) -> None:


@pytest.mark.asyncio
async def test_xmlrpc_user_agent(master: Master) -> None:
client = await master._gen_xmlrpc_client()
assert f"bandersnatch {bandersnatch.__version__}" in client.headers["User-Agent"]
async def test_simple_index_user_agent(master: Master) -> None:
headers = master._custom_headers
assert f"bandersnatch {bandersnatch.__version__}" in headers["User-Agent"]


@pytest.mark.asyncio
@@ -1,7 +1,7 @@
#!/usr/bin/env python3

"""
Quick tool to test xmlrpc queries from bandersnatch
Quick tool to test PyPI Index API queries from bandersnatch
"""

import asyncio
@@ -12,7 +12,7 @@
async def main() -> int:
async with Master("https://pypi.org") as master:
all_packages = await master.all_packages()
print(f"PyPI returned {len(all_packages)} PyPI packages via xmlrpc")
print(f"PyPI returned {len(all_packages)} PyPI packages via Index API")
return 0

