Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: don't crash in cache refresh/update with nil fsnotify watcher. #254

Merged
merged 3 commits into from
Feb 24, 2025

Conversation

klihub
Copy link
Contributor

@klihub klihub commented Feb 22, 2025

Don't crash if fsnotify.Watcher creation fails, for instance due to being out of file descriptors. See moby/buildkit#5767 for an example.

Fixes #253.

// (but with autoRefresh left on). One known case when this can happen is
// if we have too many open files. In that case we always return true and
// force a refresh.
if w.watcher == nil {
Copy link
Contributor

@bart0sh bart0sh Feb 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to check it earlier, e.g. when w.watcher is about to be assigned to nil? Or we want to keep it nil in a hope that it will be changed at some point?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall correctly we guard setting the watchers in a Mutex. Does that help to ensure this does not happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to check it earlier, e.g. when w.watcher is about to be assigned to nil? Or we want to keep it nil in a hope that it will be changed at some point?

Yes, I wanted to leave the door open for a better than the current recovery once we can again create fds (and that would probably be retrying a setup here).

In this very PR, I wanted to roll a minimum-footprint fix that we can review and merge quickly, and be fairly confident that it is safe to cherry pick and tag a v0.8.1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall correctly we guard setting the watchers in a Mutex. Does that help to ensure this does not happen?

This happens if we hit EMFILE while trying to create the watch.

Copy link
Contributor

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @klihub.

I think the changes look good in principle.

@bart0sh feel free to merge once you're happy so that we can create a branch and cherry-pick this change and possibly the mount ordering one.

@klihub klihub force-pushed the fixes/refresh-sigsegv-with-nil-watcher branch from 73b86d0 to ada455e Compare February 24, 2025 09:03
Don't crash in update() if we fail to create an fsnotify watch.
This can happen if we have too many open files. In this case we
now record a failure for all configured spec directories and in
update we always trigger a refresh. If the process if ever able
to create new file descriptors the cache becomes functional but
in a 'always implicitly fully refreshed' mode instead of auto-
refreshed.

It's not entirely clear what is the best option to deal with a
failed watch creation. Being out of file descriptors typically
results in a cascading chain of errors which the process does
not usually survive.

This fix aims for minimal footprint. On failed watch creation
it does not render the cache fully unusable. If the process is
ever able to create new file descriptors again the cache also
becomes functional, but instead of autorefreshed mode it will
be in an 'always implicitly fully refreshed' mode.

Signed-off-by: Krisztian Litkey <[email protected]>
@klihub klihub force-pushed the fixes/refresh-sigsegv-with-nil-watcher branch from ada455e to f674bc0 Compare February 24, 2025 09:51
@klihub klihub requested a review from bart0sh February 24, 2025 09:54
Test that the cache indeed recovers functionally if the process
ever recovers from not being able to create new file descriptors.

Signed-off-by: Krisztian Litkey <[email protected]>
@klihub klihub force-pushed the fixes/refresh-sigsegv-with-nil-watcher branch from f674bc0 to 1de2560 Compare February 24, 2025 09:57
@bart0sh bart0sh merged commit a582593 into cncf-tags:main Feb 24, 2025
8 checks passed
@klihub klihub deleted the fixes/refresh-sigsegv-with-nil-watcher branch February 24, 2025 10:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants