Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filesystem: try umount first during stop-action, and avoid potential "Argument list too long" for force_unmount=safe #1977

Merged
merged 2 commits into from
Sep 19, 2024

Conversation

lge
Copy link
Member

@lge lge commented Sep 18, 2024

See individual commit messages.

  • fix stop: try umount first, and only if that fails try to find users of that file system
  • in the "safe" mode of get_pids(), avoid a potential "Argument list too long" problem

lge added 2 commits September 18, 2024 11:53
48ed6e6 (Filesystem: improve stop-action and allow setting term/kill signals and signal_delay for large filesystems, 2023-07-04)
changed the logic from
"try umount; if that fails, find and kill users; repeat" to
"try to find and kill users; then try umount; repeat"

But even just walking /proc may take "a long time" on busy systems,
and may still turn up with "no users found".

It will take even longer for "force_umount=safe"
(observed 8 to 10 seconds just for "get_pids() with "safe" to return nothing)
than for "force_umount=yes" (still ~ 2 to 3 seconds),
but it will take "a long time" in any case.
(BTW, that may be longer than the hardcoded default of 6 seconds for "fast_stop",
which is also the default on many systems now)

If the dependencies are properly configured,
there should be no users left,
and the umount should just work.

Revert back to "try umount first", and only then try to find "rogue" users.
The "safe" way to get process ids that may be using a particular filesystem
currently uses shell globs ("find /proc/[0-9]*").
With a million processes (and/or a less capable shell),
that may result in "Argument list too long".

Replace with find /proc -path "/proc/[0-9]*" instead.
While at it, also fix the non-posix -or to be -o,
and add explicit grouping parentheses \( \) and explicit -print.

Add a comment to not include "interesting" characters in mount point names.
@lge lge requested a review from oalbrigt September 18, 2024 12:06
Copy link
Contributor

@oalbrigt oalbrigt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@SatomiOSAWA
Copy link
Contributor

Hi, @lge @oalbrigt

Impressive work!
Thank you for your consideration.
I understand that the issue with RHBZ:2207567 is serious.
However, as Mr. Lars noted, if the dependencies are properly configured,
it seems unlikely that unmounting first would cause any problems.
I have come to recognize the importance of adding proper comments once again.
I believe many users will appreciate this improvement.

@oalbrigt oalbrigt changed the title Filesystem fix stop try umount first Filesystem: try umount first during stop-action, and avoid potential "Argument list too long" for force_unmount=safe Sep 19, 2024
@oalbrigt oalbrigt merged commit a2045f8 into ClusterLabs:main Sep 19, 2024
1 check passed
@oalbrigt
Copy link
Contributor

Thanks.

@lge
Copy link
Member Author

lge commented Sep 19, 2024

if scanning /proc twice should become a problem, we could either use bash and its more advance redirections, or use fifos

procs=$(
    exec 2>/dev/null
    D=$(mktemp -d) # (preferably somwhere on a tmpfs, so set TMPDIR or add -p or an explicit template)
    mkfifo "$D/maps";
    (
    < "$D/maps" xargs -r grep -l " $dir/"  | cut -d/ -f3 | uniq &
    exec 3> "$D/maps";
    rm -rf "$D";
    find /proc -path '/proc/[0-9]*' \
        \( -name maps -fprint "/dev/fd/3" \
        -o -type l \( -lname "$dir/*" -o -lname "$dir" \) -print \
        \) | cut -d/ -f 3 | uniq
    ) | sort -u
)

😜

@oalbrigt
Copy link
Contributor

Feel free to make a patch when you got the time.

We should keep it non-bash specific, as the agent can also be used on *BSD.

@lge lge deleted the Filesystem-fix-stop-try-umount-first branch September 19, 2024 09:20
@lge
Copy link
Member Author

lge commented Sep 19, 2024

Feel free to make a patch when you got the time.

We should keep it non-bash specific, as the agent can also be used on *BSD.

#1978 (hope that is sufficiently *BSD compatible ...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants