Fix syncing total count when using signed_only=True #1609

Merged
merged 1 commit into from
Dec 11, 2023
1 change: 1 addition & 0 deletions CHANGES/1608.bugfix
@@ -0,0 +1 @@
Fixed syncing progress report, `Parsing CollectionVersion Metadata`, total count when using `signed_only=True`.
11 changes: 8 additions & 3 deletions pulp_ansible/app/tasks/collections.py
@@ -585,17 +585,23 @@ async def _add_collection_version(self, api_version, collection_version_url, met
        )
        cv_unique = attrgetter("namespace", "name", "version")(collection_version)
        fullname, version = f"{cv_unique[0]}.{cv_unique[1]}", cv_unique[2]
        if fullname in self.exclude_info and Version(version) in self.exclude_info[fullname]:
            return
        if cv_unique in self.already_synced:
            return

        # Mark the collection version as being processed
        self.already_synced.add(cv_unique)
Member

What is this? Apart from eating a lot of memory?

Contributor Author

It is for the dependency sync. We don't want to re-process a collection version that we have already synced. There are only about 50,000 collection versions, so this set grows to a few megabytes at most.
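The deduplication described in this reply can be sketched roughly as follows. This is a minimal, standalone illustration and not the pulp_ansible implementation: `should_process` and the module-level `already_synced` set are hypothetical stand-ins, and the keys mirror the `(namespace, name, version)` tuple that `cv_unique` holds in the diff.

```python
import sys

# Hypothetical stand-in for the syncer's `already_synced` set: each entry is a
# (namespace, name, version) tuple, mirroring `cv_unique` in the diff.
already_synced = set()


def should_process(cv_unique):
    """Return True the first time a collection version is seen, False afterwards."""
    if cv_unique in already_synced:
        return False
    already_synced.add(cv_unique)
    return True


first = should_process(("community", "general", "8.0.0"))
# The same version reached a second time, e.g. through a dependency.
second = should_process(("community", "general", "8.0.0"))
other = should_process(("community", "general", "8.1.0"))

print(first, second, other)  # True False True

# Size of the set's own hash table for 50,000 synthetic entries. The tuples
# and strings it references take additional memory on top of this figure.
synthetic = {("ns", f"name{i}", "1.0.0") for i in range(50_000)}
print(sys.getsizeof(synthetic))
```

The printed size varies by Python version and platform, so it is only a rough check on the "few megabytes" estimate.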

Member

So it's for depsolving?

Member

Do we want to skip this when doing a full mirror sync anyway?
What if we used a Bloom filter instead?

(Just a few thoughts...)
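For context on the Bloom filter idea, here is a minimal sketch of what such a structure could look like. It is an illustration only, not code from this PR or from pulp_ansible: the `BloomFilter` class, its default sizes, and the SHA-256 salting scheme are all assumptions chosen for brevity.

```python
import hashlib


class BloomFilter:
    """Fixed-size probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1 << 20, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a SHA-256 digest of the item.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


seen = BloomFilter()
seen.add(("community", "general", "8.0.0"))
print(("community", "general", "8.0.0") in seen)  # True: added items are always found
```

The trade-off is that memory stays fixed (about 128 KiB with these defaults), but a false positive would cause a collection version that was never processed to be treated as already synced. Whether that is acceptable for a sync is a design decision the thread does not settle.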

Contributor Author

Ok, I added an if check for adding a CV to the already-synced set.

Member

Sorry, I was asking on a tangent. Would you mind making that a separate change?

        await self.parsing_metadata_progress_bar.aincrement()

        if fullname in self.exclude_info and Version(version) in self.exclude_info[fullname]:
            log.debug(_("{}-{} is in excludes list, skipping").format(fullname, version))
            return

        info = metadata["metadata"]
        signatures = metadata.get("signatures", [])
        marks = metadata.get("marks", [])

        if self.signed_only and not signatures:
            log.debug(_("{}-{} does not have any signatures, skipping").format(fullname, version))
            return

        if self.add_dependents:
@@ -635,7 +641,6 @@ async def _add_collection_version(self, api_version, collection_version_url, met
            d_artifacts=[d_artifact],
            extra_data=extra_data,
        )
        await self.parsing_metadata_progress_bar.aincrement()
        await self.put(d_content)

        if signatures or marks:
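One plausible reading of the change is that moving the `aincrement()` call ahead of the early returns lets the progress report's done count reach its total even when versions are skipped. The sketch below illustrates that effect under stated assumptions: `FakeProgressBar` and `process` are hypothetical stand-ins rather than the pulpcore `ProgressReport` API, and the total is assumed to equal the number of collection versions discovered.

```python
import asyncio


class FakeProgressBar:
    """Hypothetical stand-in for a progress report with `total` and `done` counters."""

    def __init__(self, total):
        self.total = total
        self.done = 0

    async def aincrement(self):
        self.done += 1


async def process(bar, versions, signed_only, increment_first):
    for version in versions:
        if increment_first:
            # New placement: counted as parsed even if skipped below.
            await bar.aincrement()
        if signed_only and not version["signatures"]:
            continue  # unsigned version is skipped
        if not increment_first:
            # Old placement: skipped versions are never counted.
            await bar.aincrement()


versions = [
    {"name": "a", "signatures": ["sig"]},
    {"name": "b", "signatures": []},
    {"name": "c", "signatures": []},
]

old_bar = FakeProgressBar(total=len(versions))
new_bar = FakeProgressBar(total=len(versions))
asyncio.run(process(old_bar, versions, signed_only=True, increment_first=False))
asyncio.run(process(new_bar, versions, signed_only=True, increment_first=True))

print(old_bar.done, old_bar.total)  # 1 3
print(new_bar.done, new_bar.total)  # 3 3
```

With the old placement, the two unsigned versions never increment the counter, so the report finishes at 1 of 3. With the new placement, every parsed version is counted regardless of whether it is kept.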