Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

copy/multiple: instanceCopyCopy honor UpdateCompressionAlgorithms #2047

Merged
merged 5 commits into from
Jul 21, 2023

Conversation

flouthoc
Copy link
Contributor

@flouthoc flouthoc commented Jul 18, 2023

Needed By: containers/skopeo#2047

  • Users can set compression formats while performing copy in such cases
    instanceCopyCopy must honor UpdateCompressionAlgorithms.

  • copy/single: set requiredCompression if compressionFormat is changed

  • oci_index: init Annotations if nil and required
  • If UpdateCompressionAlgorithms is set then Annotations map must be
    initialized if its not set.

Integration tests for e2e feature added here: containers/skopeo#2047 (comment)

@flouthoc
Copy link
Contributor Author

flouthoc commented Jul 18, 2023

@mtrmac PTAL once merged I will vendor this into containers/skopeo#2047

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider a simple skopeo copy --format oci of an existing image: In that case we should preserve the original blobs (or reuse blobs at destination, if possible), and add the OCI annotations.

So while copy --dest-compress-format zstd is the most salient use case, I think this should work by collecting compression algorithms from all copies.

  • IIRC the BlobInfo returned from copyLayer always contains an algorithm, or Unknown/Uncompressed.
  • Then, copyLayers can deduplicate the values into a set; and return that value.
  • copySingleImage can return the relevant algorithms (optionally add a copySingleImageResult struct so that we don’t have that many unnamed return values)
  • and that can then be used for ListEdit

copy/single.go Outdated Show resolved Hide resolved
internal/manifest/oci_index.go Outdated Show resolved Hide resolved
@mtrmac
Copy link
Collaborator

mtrmac commented Jul 18, 2023

Consider a simple skopeo copy --format oci of an existing image: In that case we should preserve the original blobs (or reuse blobs at destination, if possible), and add the OCI annotations.

(If the source is not already OCI, it is probably v2s1 or v2s2, which only support gzip, so that seems like a non-issue. But the copy can reuse zstd blobs at the destination, creating an OCI image that is mixed gzip + Zstd, or possibly Zstd-only.)

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think the shortcut in OptimizeDestinationImageAlreadyExists must not be taken if require…Match.

@mtrmac
Copy link
Collaborator

mtrmac commented Jul 19, 2023

Ugh, the OptimizeDestinationImageAlreadyExists path really is a pain… When we do a full copy, we do get algorithm data from all blobs as a side-effect. When we skip that, all of that is missing and needs to be obtained elsewhere. Options I can see:

  1. Teach internal/image.SourcedImage (and manifest implementations) to return a list of algorithms from a manifest. That’s quite possible (we have compressionMIMETypeSet tables already), probably cleanest, but it’s another new API addition, and the data we would get that way comes from a different source than from ordinary copies.
  2. Take the list.Instance().ReadOnly.CompressionAlgorithms data, which was recently computed, pass it from the outside down to copySingleImage, and have copySingleImage return that to the caller on the Optimize… path. Technically that would work, except that the value would not be available in the non-multi-image callers of copySingleImage, i.e. that would introduce an inconsistency in behavior.
  3. Take the conceptually-misplaced heuristic determining CompressionAlgorithm at the start of copyLayer; extract it to a separate helper function; and call that in compareImageDestinationManifestEqual to build the data. That way the algorithm values we return is determined exactly the same way on the Optimize… path and on the other one, and always available.

I think 3. is the cleanest way to do this right now; and if we ever replace the “conceptually-misplaced heuristic” (which serves as a nice example that technical debt never goes away on its own) with a proper manifest API addition, that would naturally handle both code paths.

@flouthoc flouthoc force-pushed the instancecopycopy-compression branch from 6988b8e to 6571477 Compare July 19, 2023 09:57
@flouthoc
Copy link
Contributor Author

@mtrmac I have not updated tests yet but this is what I am thinking. WDYT ?

@flouthoc flouthoc requested a review from mtrmac July 19, 2023 10:07
Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick pass, not a careful review — so far all the primary design points look correct.

copy/multiple.go Outdated Show resolved Hide resolved
copy/single.go Outdated
}

func removeDuplicateCompression(algos []compressiontypes.Algorithm) []compressiontypes.Algorithm {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be built on internal/set.Set instead of manually.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… and probably inlined in the two loops that collect the data.

Copy link
Contributor Author

@flouthoc flouthoc Jul 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set.Add( ensures that no duplicates are there ( I have added a test to verify that ) however compression.Algorithm is not a comparable which is required by Set hence I have created a Set using string i.e compression.Algorithm.Name() and its working fine.

Later I'm using AlgorithmsByNames to retrieve a slice of Algorithms.

copy/single.go Show resolved Hide resolved
copy/single.go Outdated Show resolved Hide resolved
copy/single.go Outdated Show resolved Hide resolved
copy/single.go Outdated Show resolved Hide resolved
copy/single.go Outdated Show resolved Hide resolved
copy/single.go Outdated Show resolved Hide resolved
copy/single.go Outdated Show resolved Hide resolved
copy/single.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A full review now — ~nothing new.

copy/single.go Outdated Show resolved Hide resolved
@flouthoc flouthoc force-pushed the instancecopycopy-compression branch 5 times, most recently from f9f8cc5 to 0b7e0b5 Compare July 20, 2023 05:47
@flouthoc flouthoc requested a review from mtrmac July 20, 2023 05:51
Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Last comments on a debug log and a misplaced comment.

internal/set/set_test.go Outdated Show resolved Hide resolved
copy/single.go Show resolved Hide resolved
copy/single.go Outdated Show resolved Hide resolved
copy/single.go Show resolved Hide resolved
pkg/compression/compression.go Outdated Show resolved Hide resolved
flouthoc added 3 commits July 21, 2023 07:10
If `UpdateCompressionAlgorithms` is set then `Annotations` map must be
initialized if its not set.

Signed-off-by: Aditya R <[email protected]>
Modifies `copy/single` to correctly use the feature from
containers#2023, that is if compression is
changed blob must be resued only if it matches required compression.

Signed-off-by: Aditya R <[email protected]>
imagecopier implementes feature for requireCompressionFormatMatch
refactor `copySingleImage` to accept that as an argument.

Signed-off-by: Aditya R <[email protected]>
@flouthoc flouthoc force-pushed the instancecopycopy-compression branch from 0b7e0b5 to a40cb55 Compare July 21, 2023 01:40
@flouthoc flouthoc requested a review from mtrmac July 21, 2023 01:41
Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise, feel free to merge without another round.

copy/single.go Show resolved Hide resolved
flouthoc added 2 commits July 21, 2023 08:10
Users can set compression formats while performing `copy` in such cases
`instanceCopyCopy` must honor UpdateCompressionAlgorithms.

Signed-off-by: Aditya R <[email protected]>
Verifies if no duplicates are kept when `.Values()` is used to return
slice.

Signed-off-by: Aditya R <[email protected]>
@flouthoc flouthoc force-pushed the instancecopycopy-compression branch from a40cb55 to bb006b3 Compare July 21, 2023 02:44
@flouthoc flouthoc requested a review from mtrmac July 21, 2023 02:47
@flouthoc
Copy link
Contributor Author

I don't have merge access, @containers/image-maintainers PTAL

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants