Skip to content

pkg/fswatcher: Rewrite without underlying use of fsnotify #38537

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 4, 2025

Conversation

glibsm
Copy link
Member

@glibsm glibsm commented Mar 26, 2025

Remove reliance on fsnotify by replacing with a polling mechanism to check for modifications to tracked files.

There is a default interval that is configurable depending on the usage, and it is also significantly shortened when running tests.

Superseeds #37909
Fixes #25666


The original test cases were kept pretty much as-is, except they are now deterministic (at least so far on the mac)

❯ go test -count=100 -race ./pkg/fswatcher
ok      github.com/cilium/cilium/pkg/fswatcher  1.893s

@glibsm glibsm added kind/enhancement This would improve or streamline existing functionality. area/CI Continuous Integration testing issue or flake release-note/misc This PR makes changes that have no direct user impact. area/hubble Impacts hubble server or relay labels Mar 26, 2025
@glibsm glibsm requested review from a team as code owners March 26, 2025 20:42
@glibsm glibsm requested review from joamaki, glrf, ldelossa, gandro and kaworu and removed request for glrf March 26, 2025 20:42
@glibsm glibsm force-pushed the pr/glibsm/rewrite-fswatcher branch 3 times, most recently from 8ee23e6 to ea63121 Compare March 27, 2025 18:04
kaworu
kaworu previously requested changes Mar 28, 2025
@glibsm glibsm force-pushed the pr/glibsm/rewrite-fswatcher branch 2 times, most recently from 7f7d4a2 to c6cdbde Compare April 16, 2025 03:07
Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. Given there are still a few unknowns w.r.t. symlinks in the logic, have you tested this in a life K8s cluster and checked if it correctly detected updates to a ConfigMap/Secret mount?

Copy link
Member

@kaworu kaworu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have you tested this in a life K8s cluster and checked if it correctly detected updates to a ConfigMap/Secret mount?

IIRC you need to project the ConfigMap/Secret for k8s to do the weird symlink dance, like we do for e.g. the Hubble server TLS cert.

@glibsm
Copy link
Member Author

glibsm commented May 2, 2025

I did some local testing to see what happens when a secret is projected to filesystem and is then subsequently updated.

I deployed a basic pod which mounts a secret:

...
    spec:
      containers:
        - name: sleeper
          image: busybox
          command: ["sleep", "3600"]
          volumeMounts:
            - name: projected-secrets
              mountPath: "/tmp/projected-secrets"
              readOnly: true
      volumes:
        - name: projected-secrets
          projected:
            sources:
              - secret:
                  name: my-secret

And the resulting structure looks like this with the original key apikey123

$ /tmp/projected-secrets # ls -al
total 4
drwxrwxrwt    3 root     root           140 May  2 16:57 .
drwxrwxrwt    1 root     root          4096 May  2 16:57 ..
drwxr-xr-x    2 root     root           100 May  2 16:57 ..2025_05_02_16_57_16.1478340719
lrwxrwxrwx    1 root     root            32 May  2 16:57 ..data -> ..2025_05_02_16_57_16.1478340719
lrwxrwxrwx    1 root     root            14 May  2 16:57 api-key -> ..data/api-key
lrwxrwxrwx    1 root     root            15 May  2 16:57 password -> ..data/password
lrwxrwxrwx    1 root     root            15 May  2 16:57 username -> ..data/username
$ /tmp/projected-secrets # cat -n api-key
     1  apikey123

After I edit the key and a few seconds have passed

$ k edit secret
secret/my-secret edited

The directory structure looks as follows:

$ /tmp/projected-secrets # ls -al
total 4
drwxrwxrwt    3 root     root           140 May  2 17:02 .
drwxrwxrwt    1 root     root          4096 May  2 16:57 ..
drwxr-xr-x    2 root     root           100 May  2 17:02 ..2025_05_02_17_02_00.1481948809
lrwxrwxrwx    1 root     root            32 May  2 17:02 ..data -> ..2025_05_02_17_02_00.1481948809
lrwxrwxrwx    1 root     root            14 May  2 16:57 api-key -> ..data/api-key
lrwxrwxrwx    1 root     root            15 May  2 16:57 password -> ..data/password
lrwxrwxrwx    1 root     root            15 May  2 16:57 username -> ..data/username
$ /tmp/projected-secrets # cat -n api-key
     1  apikey456

To the best of my knowledge you can't edit symlinks, so it looks like when a new directory ..2025_05_02_17_02_00.1481948809 is created, the old symlink ..data is deleted a new one created in it's place.

I will try to add exactly this test case into the fswatcher_test to see what we get.

My gut tells me that new fswatcher should be able to handle this very well since we do os.Stat on the api-key and check the contents of the target (size and modtime) and should be able to detect the change very easily even though there are two symlinks in the way.

@glibsm glibsm force-pushed the pr/glibsm/rewrite-fswatcher branch from c6cdbde to 6ab81bf Compare May 5, 2025 18:28
@glibsm glibsm requested review from kaworu and gandro May 5, 2025 18:28
@glibsm
Copy link
Member Author

glibsm commented May 5, 2025

Added a new test that mimics the kubernetes secret projection and a subsequent value update -- the fswatcher works for it without any additional changes 🥳

@glibsm glibsm requested a review from kaworu May 23, 2025 18:22
@glibsm
Copy link
Member Author

glibsm commented May 28, 2025

CI error seems like it could be related.

time=2025-05-19T21:17:42Z level=error source=/go/src/github.com/cilium/cilium/pkg/datapath/linux/ipsec/ipsec_linux.go:1186 msg="Failed to load IPsec keyfile" module=agent.datapath.ipsec-key-custodian error="invalid SPI: changing IPSec keys requires incrementing the key id" (1 occurrences)

That code is triggered via fswatcher notifications. Need to look at the test setup, maybe something about the new timing in tests is not working as expected.

@kaworu kaworu added the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label May 28, 2025
@glibsm glibsm force-pushed the pr/glibsm/rewrite-fswatcher branch from 4aa1566 to 30b7475 Compare May 28, 2025 16:23
@glibsm glibsm removed the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label May 28, 2025
@glibsm glibsm force-pushed the pr/glibsm/rewrite-fswatcher branch from 30b7475 to d23c96f Compare May 28, 2025 21:15
Remove reliance on fsnotify by replacing with a polling
mechanism to check for modifications to tracked files.

There is a default interval that is configurable depending
on the usage, and it is also significantly shortened when
running tests.

Signed-off-by: Glib Smaga <code@gsmaga.com>
@glibsm glibsm force-pushed the pr/glibsm/rewrite-fswatcher branch from d23c96f to f4c1205 Compare May 29, 2025 15:52
@glibsm
Copy link
Member Author

glibsm commented May 29, 2025

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 4, 2025
@rolinh rolinh added this pull request to the merge queue Jun 4, 2025
Merged via the queue into main with commit dd79107 Jun 4, 2025
354 of 357 checks passed
@rolinh rolinh deleted the pr/glibsm/rewrite-fswatcher branch June 4, 2025 13:44
@gandro
Copy link
Member

gandro commented Jun 4, 2025

Should we backport this?

@glibsm
Copy link
Member Author

glibsm commented Jun 4, 2025

Should we backport this?

I think it's worth backporting, yeah. It should help CI health among other things. Backport may not be clean depending on the status of the slog migration PR which caused a rebase conflict halfway through this PR

@glibsm glibsm added needs-backport/1.17 This PR / issue needs backporting to the v1.17 branch needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch labels Jun 4, 2025
@rastislavs rastislavs mentioned this pull request Jun 9, 2025
2 tasks
@rastislavs rastislavs added backport-pending/1.17 The backport for Cilium 1.17.x for this PR is in progress. and removed needs-backport/1.17 This PR / issue needs backporting to the v1.17 branch labels Jun 9, 2025
@rastislavs rastislavs mentioned this pull request Jun 9, 2025
1 task
@rastislavs rastislavs added backport-pending/1.16 The backport for Cilium 1.16.x for this PR is in progress. and removed needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch labels Jun 9, 2025
@github-actions github-actions bot added backport-done/1.17 The backport for Cilium 1.17.x for this PR is done. backport-done/1.16 The backport for Cilium 1.16.x for this PR is done. and removed backport-pending/1.17 The backport for Cilium 1.17.x for this PR is in progress. backport-pending/1.16 The backport for Cilium 1.16.x for this PR is in progress. labels Jun 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/CI Continuous Integration testing issue or flake area/hubble Impacts hubble server or relay backport-done/1.16 The backport for Cilium 1.16.x for this PR is done. backport-done/1.17 The backport for Cilium 1.17.x for this PR is done. kind/enhancement This would improve or streamline existing functionality. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CI: github.com/cilium/cilium/pkg/fswatcher timeout panic
7 participants