
bats tests - parallelize #5552

Merged: 5 commits, Jan 27, 2025

Conversation

edsantiago (Member)

All bats tests run with custom root/runroot, so it should be
possible to parallelize them.

(As of this initial commit, tests fail on my laptop, and I expect them to fail here. I just want to get a sense for how things go.)

Signed-off-by: Ed Santiago [email protected]


Ephemeral COPR build failed. @containers/packit-build please check.

@edsantiago edsantiago force-pushed the bats-parallel branch 2 times, most recently from 762e878 to a195865 on June 10, 2024 12:30
@edsantiago edsantiago force-pushed the bats-parallel branch 2 times, most recently from 4d4e9c9 to e653a16 on June 27, 2024 14:00
@Luap99 (Member) left a comment

I assume once we enable parallel runs here we can port that over to the bud tests in podman? I'd like to get the speedup there too.

I haven't looked deeply into the prefetch-logic changes, but this looks much easier than podman, so that is good.

@@ -19,4 +19,4 @@ function execute() {
 TESTS=${@:-.}
 
 # Run the tests.
-execute time bats --tap $TESTS
+execute time bats -j 4 --tap $TESTS

I would use $(nproc) here instead of hard-coding any count; that should make it much faster when run locally.
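
A minimal sketch of that suggestion applied to the hunk above, assuming nproc is available on both the CI VMs and local machines (note that bats needs GNU parallel installed for -j to work):

# Scale bats parallelism to the number of available CPUs instead of a fixed count.
execute time bats -j $(nproc) --tap $TESTS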

A friendly reminder that this PR had no activity for 30 days.

@mtrmac (Contributor) commented Oct 10, 2024

(A brief look after a link from containers/skopeo#2437.)

zstd found even though push was without --force-compression: my first guess is that the image built from the FROM alpine … _EOF heredoc (via build caching?) matches an image created in some other concurrently running test, and that causes BlobInfoCache to have a record of the zstd-compressed version of the layer.

For this purpose, it would be better to have the image’s layers be clearly independent from any other test — maybe a FROM scratch adding a file that contains the test name + timestamp.
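
A minimal bats-style sketch of that idea; run_buildah and $TEST_SCRATCH_DIR are stand-ins for whatever helpers and temp-dir variables the suite actually provides:

# Build an image whose only layer is unique to this test invocation, so the
# blob-info cache can never match it against a layer produced by a concurrent test.
mkdir -p "$TEST_SCRATCH_DIR/unique-ctx"
# Layer content: test name + timestamp, which no other test can reproduce.
echo "${BATS_TEST_NAME} $(date +%s%N)" > "$TEST_SCRATCH_DIR/unique-ctx/unique.txt"
cat > "$TEST_SCRATCH_DIR/unique-ctx/Containerfile" <<EOF
FROM scratch
COPY unique.txt /unique.txt
EOF
run_buildah build -t "img-t${BATS_TEST_NUMBER}-unique" "$TEST_SCRATCH_DIR/unique-ctx"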

@mtrmac (Contributor) commented Oct 10, 2024

… oh, and: debug-level logs could help.

@edsantiago (Member, Author)

For this purpose, it would be better to have the image’s layers be clearly independent from any other test — maybe a FROM scratch adding a file that contains the test name + timestamp.

Thank you. I thought I had checked for conflicts, but must've missed something. I'll look into this again when time allows.

@edsantiago edsantiago force-pushed the bats-parallel branch 5 times, most recently from d17cb1b to 68722ca on November 4, 2024 13:35

@edsantiago (Member, Author)

push test still flaking, and I'm giving up for the day. It is stumping me:

not ok 50 bud: build push with --force-compression

# # buildah build ...
...
# f0e6fc41f58b93d990cb24331c648bc84b066822d06782aec493d97bc2b7b263
# # buildah push [...]--compression-format gzip img-t50-nesrbrf5 docker://localhost:35311/img-t50-nesrbrf5
...
# # buildah push [...] --compression-format zstd --force-compression=false img-t50-nesrbrf5 docker://localhost:35311/img-t50-nesrbrf5
...
# # skopeo inspect img-t50-nesrbrf5
# {"schemaVersion":2,"mediaType":"application/vnd.oci.image.manifest.v1+json","config":{"mediaType":"application/vnd.oci.image.config.v1+json","digest":"sha256:f0e6fc41f58b93d990cb24331c648bc84b066822d06782aec493d97bc2b7b263","size":524},"layers":[{"mediaType":"application/vnd.oci.image.layer.v1.tar+zstd","digest":"sha256:05d36d22e1ad1534254c6965a3b43cf39f4dca9d5c95551eccf40108f076da2b","size":146}],"annotations":{"org.opencontainers.image.base.digest":"","org.opencontainers.image.base.name":""}}
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #|     FAIL: zstd found even though push was without --force-compression
# #| expected: !~ 'zstd'

Where is that 05d36 layer coming from?

@mtrmac (Contributor) commented Nov 5, 2024

I can’t see any obvious reason. I’d suggest doing the pushes with --log-level=debug; that should include traces like “Trying to reuse blob”.
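
A hedged sketch of that, reusing the image and registry names from the log above (the flags the log elides as [...] are omitted here too, and the exact placement of the global --log-level option may differ):

# Re-run the second push with debug logging and surface the blob-reuse decisions.
buildah --log-level=debug push --compression-format zstd --force-compression=false \
    img-t50-nesrbrf5 docker://localhost:35311/img-t50-nesrbrf5 2>&1 |
  grep -i 'reuse blob'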

@flouthoc flouthoc force-pushed the bats-parallel branch 2 times, most recently from e20800d to 42d2266 on January 24, 2025 23:13

@flouthoc (Collaborator)

Looks like the last few commits fix the flake caused by parallelization; I am going to push a few more times.

@flouthoc flouthoc force-pushed the bats-parallel branch 2 times, most recently from 058e0e4 to ebad7b2 on January 25, 2025 05:27
All bats tests run with custom root/runroot, so it should be
possible to parallelize them.

Signed-off-by: Ed Santiago <[email protected]>
Signed-off-by: flouthoc <[email protected]>
flouthoc added a commit to edsantiago/buildah that referenced this pull request Jan 25, 2025
Address comment here containers#5552 (comment)

Signed-off-by: flouthoc <[email protected]>
@flouthoc flouthoc marked this pull request as ready for review January 25, 2025 16:11
flouthoc added a commit to edsantiago/buildah that referenced this pull request Jan 25, 2025
Address comment here containers#5552 (comment)

Signed-off-by: flouthoc <[email protected]>
Address comment here for cleanup containers#5552 (comment)

Signed-off-by: flouthoc <[email protected]>

@flouthoc (Collaborator) commented Jan 25, 2025

Okay, the PR is ready for review. The flakes which Ed found are fixed in the above commits.

I found one more flake, but it is unrelated to the changes in this PR and exists upstream ( https://cirrus-ci.com/task/6143954845958144 ); sample PR upstream: #5943

@containers/buildah-maintainers PTAL

@flouthoc flouthoc removed the stale-pr label Jan 25, 2025

@Luap99 (Member) left a comment

Containerized Integration: successful in 53m

Looks like this does not bump the VM for the Containerized test, so that run is still slow.
The in_podman_task in cirrus.yml would need to get the same gce_instance fix as the other integration tasks.

LGTM otherwise

@flouthoc (Collaborator)

Containerized Integration: successful in 53m

Looks like this does not bump the VM for the Containerized test, so that run is still slow. The in_podman_task in cirrus.yml would need to get the same gce_instance fix as the other integration tasks.

LGTM otherwise

@Luap99 Yes, I will open a separate PR for containerized_integration; I just wanted to get Ed's PR in for this one.

@flouthoc (Collaborator)

@rhatdan @nalind @TomSweeneyRedHat PTAL

@rhatdan (Member) commented Jan 27, 2025

/approve
/lgtm

openshift-ci bot (Contributor) commented Jan 27, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 042414a into containers:main Jan 27, 2025
32 checks passed

@flouthoc (Collaborator)

Created a followup PR: #5947
