Skip to content

clustermesh: fix config watcher missed events with fsnotify 1.8.0 #35770

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

giorio94
Copy link
Member

@giorio94 giorio94 commented Nov 5, 2024

The clustermesh configuration watcher needs to explicitly watch the config files to receive a notification when the underlying file gets updated, if the path points to a symbolic link (e.g., when the folder is mounted from a Kubernetes ConfigMap/Secret). However, the fsnotify library slightly changed its behavior in the latest release [1], and no longer propagates a remove event in case the parent directory is watched as well. The reason being to prevent duplicate events, as it would be generated both by the specific watcher and the parent directory watcher. Still, the parent watcher doesn't emit the remove event in this context, given that the symbolic link is not actually removed. In turn, opening the possibility for race conditions in the current event processing logic, which depends on always correctly re-establishing the watcher.

Let's address this by using two separate watchers, one for the config directory itself and one for the individual config files, to ensure that the remove event is correctly emitted when the symbolic link starts pointing to a different file (hence breaking the existing watcher), so that we can re-establish it.

[1]: fsnotify/fsnotify#620

The clustermesh configuration watcher needs to explicitly watch the
config files to receive a notification when the underlying file gets
updated, if the path points to a symbolic link (e.g., when the folder
is mounted from a Kubernetes ConfigMap/Secret). However, the fsnotify
library slightly changed its behavior in the latest release [1], and no
longer propagates a remove event in case the parent directory is watched
as well. The reason being to prevent duplicate events, as it would be
generated both by the specific watcher and the parent directory watcher.
Still, the parent watcher doesn't emit the remove event in this context,
given that the symbolic link is not actually removed. In turn, opening
the possibility for race conditions in the current event processing
logic, which depends on always correctly re-establishing the watcher.

Let's address this by using two separate watchers, one for the config
directory itself and one for the individual config files, to ensure that
the remove event is correctly emitted when the symbolic link starts
pointing to a different file (hence breaking the existing watcher), so
that we can re-establish it.

[1]: fsnotify/fsnotify#620

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 added area/clustermesh Relates to multi-cluster routing functionality in Cilium. release-note/misc This PR makes changes that have no direct user impact. labels Nov 5, 2024
@giorio94
Copy link
Member Author

giorio94 commented Nov 5, 2024

/test

@giorio94
Copy link
Member Author

giorio94 commented Nov 5, 2024

Ci is reasonably green (failures are unrelated flakes). Dropping the extra commit.

@giorio94 giorio94 force-pushed the mio/fix-config-watcher-for-fsnotify-update branch from c2c6c38 to 5792b46 Compare November 5, 2024 12:42
@giorio94
Copy link
Member Author

giorio94 commented Nov 5, 2024

/test

1 similar comment
@giorio94
Copy link
Member Author

giorio94 commented Nov 5, 2024

/test

@giorio94 giorio94 marked this pull request as ready for review November 5, 2024 12:43
@giorio94 giorio94 requested a review from a team as a code owner November 5, 2024 12:43
@giorio94 giorio94 requested a review from jrajahalme November 5, 2024 12:43
@giorio94
Copy link
Member Author

giorio94 commented Nov 5, 2024

/test

@aanm aanm enabled auto-merge November 6, 2024 16:44
@giorio94
Copy link
Member Author

giorio94 commented Nov 8, 2024

@jrajahalme Gentle ping 🙏

@aanm aanm disabled auto-merge November 14, 2024 10:31
@aanm aanm merged commit bedd779 into cilium:main Nov 14, 2024
68 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants