Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use tmpfs for integration tests #5959

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

flouthoc
Copy link
Collaborator

What type of PR is this?

/kind api-change
/kind bug
/kind cleanup
/kind deprecation
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake
/kind other

What this PR does / why we need it:

How to verify it

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

None 

@flouthoc flouthoc force-pushed the integrate-experiment branch from 0f7b48c to 1c2b97f Compare January 30, 2025 20:20
@flouthoc flouthoc marked this pull request as draft January 30, 2025 21:16
@flouthoc flouthoc force-pushed the integrate-experiment branch 2 times, most recently from 7ad3de4 to df76c15 Compare January 31, 2025 16:50
@flouthoc
Copy link
Collaborator Author

@Luap99 It seems increasing CPU for vfs has some improvement.

@flouthoc
Copy link
Collaborator Author

Also containerized_integration came from 27 something to 19m

@Luap99
Copy link
Member

Luap99 commented Feb 3, 2025

image

cirrus graph for the container task seems broken as this usage makes no sense.

image

Looking at another int test we do not seem max out the cpu much so I am not sure if 8 cores make much sense. Also memory usage seems quite low so I think we can reduce that as well.

Overall the first question is what is the goal here? 30 mins total CI like podman? What is the target time we are aiming for?
Once we know that we can see how much we want to bump the cores, because we also need to keep in mind of costs. If double the cores means 2/3 of the time then the costs will most likely be higher.

@flouthoc
Copy link
Collaborator Author

flouthoc commented Feb 3, 2025

Overall the first question is what is the goal here? 30 mins total CI like podman

I think we are already under 30mins, this PR was just an experiment. Increasing cores reduces time by a lot but yeah I don't think it's a good idea to keep throwing money at the problem if its not needed.

@Luap99
Copy link
Member

Luap99 commented Feb 3, 2025

Overall the first question is what is the goal here? 30 mins total CI like podman

I think we are already under 30mins, this PR was just an experiment. Increasing cores reduces time by a lot but yeah I don't think it's a good idea to keep throwing money at the problem if its not needed.

I am talking about total time for CI to finish. Not just a single task time. There is a bit of overhead in scheduling tasks.

You need to look at the cirrus build page at the top it shows you Finished in 57:37 on this PR (because it was not rebased on the unit test speedup)

@flouthoc
Copy link
Collaborator Author

flouthoc commented Feb 3, 2025

Overall the first question is what is the goal here? 30 mins total CI like podman

I think we are already under 30mins, this PR was just an experiment. Increasing cores reduces time by a lot but yeah I don't think it's a good idea to keep throwing money at the problem if its not needed.

I am talking about total time for CI to finish. Not just a single task time. There is a bit of overhead in scheduling tasks.

You need to look at the cirrus build page at the top it shows you Finished in 57:37 on this PR (because it was not rebased on the unit test speedup)

Yes I am only talking about integration tests, waiting for this to merge for unit tests here #5954

@flouthoc flouthoc force-pushed the integrate-experiment branch 5 times, most recently from 5020a4b to c7dbcd8 Compare February 3, 2025 18:23
@flouthoc
Copy link
Collaborator Author

flouthoc commented Feb 3, 2025

@Luap99 I see minor improvement in other tests but not in containerized_integration.

@flouthoc flouthoc force-pushed the integrate-experiment branch 4 times, most recently from e548e62 to fe0750f Compare February 6, 2025 14:56
@flouthoc flouthoc marked this pull request as ready for review February 6, 2025 14:56
@flouthoc flouthoc changed the title time integration tests Use tmpfs for integration tests and bump resources Feb 6, 2025
@flouthoc
Copy link
Collaborator Author

flouthoc commented Feb 6, 2025

@Luap99 @nalind PTAL

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the tmpfs change actually do anything? AFAICT we set TMPDIR=/var/tmp so nothing ends up on tmpfs?

In general a PR like this benefits from precise numbers before/after. A commit like "bump cpu and memory" is not helping any future reader. It is missing the why we do this and the numbers on much we safe.

contrib/cirrus/lib.sh Outdated Show resolved Hide resolved
contrib/cirrus/lib.sh Outdated Show resolved Hide resolved
contrib/cirrus/lib.sh Outdated Show resolved Hide resolved
contrib/cirrus/lib.sh Outdated Show resolved Hide resolved
contrib/cirrus/setup.sh Outdated Show resolved Hide resolved
.cirrus.yml Outdated Show resolved Hide resolved
@Luap99
Copy link
Member

Luap99 commented Feb 7, 2025

Actually never mind we only set TMPDIR: '/var/tmp' on the conformance vfs task per 8b0ecd7

So I think the tests were already using tmpfs

@flouthoc flouthoc force-pushed the integrate-experiment branch from fe0750f to 4547c96 Compare February 7, 2025 20:01
@flouthoc flouthoc force-pushed the integrate-experiment branch 28 times, most recently from 5d1e8f1 to e4b5009 Compare February 15, 2025 04:46
Add `run_with_log` to mkcw tests.

Add `sleep 1` during cleanup between attempting `luksClose`
and unmounting the filesystem mounted on the device /dev/mapper/"$uuid".
Without this somehow we end up in a state where mount is still being
used by the kernel because when we do `lsof /dev/mapper/"$uuid"` it
shows nothing but `dmsetup info -c $uuid` shows the device is still
under use. Adding `sleep 1` in between somehow fixes this.

Also this problem with `cryptsetup` is pretty common for reference
one thread which I found https://lore.kernel.org/all/[email protected]/T/

Signed-off-by: flouthoc <[email protected]>
@flouthoc flouthoc force-pushed the integrate-experiment branch from e4b5009 to e1563d0 Compare February 15, 2025 04:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants