Skip to content

Conversation

gandro
Copy link
Member

@gandro gandro commented Nov 11, 2024

  • datapath,endpoint: explicitly remove TC filters during endpoint teardown #32167 (@ti-mo) ⚠️ resolved conflicts
    • No removing of TCX hooks in Loader.Unload, as v1.14 does not need to support downgrading from v1.16. This basically means that this part here is not backported:
      // If Cilium and the kernel support tcx to attach TC programs to the
      // endpoint's veth device, its bpf_link object is pinned to a per-endpoint
      // bpffs directory. When the endpoint gets deleted, we can remove the whole
      // directory to clean up any leftover pinned links and maps.
      // Remove the links directory first to avoid removing program arrays before
      // the entrypoints are detached.
      if err := bpf.Remove(bpffsEndpointLinksDir(bpf.CiliumPath(), ep)); err != nil {
      log.WithError(err)
      }
      // Finally, remove the endpoint's top-level directory.
      if err := bpf.Remove(bpffsEndpointDir(bpf.CiliumPath(), ep)); err != nil {
      log.WithError(err)
      }
    • Fixed conflicts in pkg/endpoint/endpoint.go due to the fact that v1.14 uses !option.Config.DryMode rather than
      !e.isProperty(PropertyFakeEndpoint)
    • Minor conflicts in Endpoint.deleteMaps due to BW manager map removal logic being slightly different.
    • Minor conflicts due to bpf.RemoveTCFilters being an exported function in v1.14 (where as it it private in v1.15+).

Once this PR is merged, a GitHub action will update the labels of these PRs:

 32167

@gandro gandro added kind/backports This PR provides functionality previously merged into master. backport/1.14 This PR represents a backport for Cilium 1.14.x of a PR that was merged to main. labels Nov 11, 2024
@gandro gandro requested a review from ti-mo November 11, 2024 12:49
[ upstream commit 6633ca8 ]
[ backporter notes:

  - No removing of TCX hooks in `Loader.Unload`, as v1.14 does not
    need to support downgrading from v1.16.
  - Fixed conflicts in pkg/endpoint/endpoint.go due to the fact that
    v1.14 uses `!option.Config.DryMode` rather than
    `!e.isProperty(PropertyFakeEndpoint)`
  - Minor conflicts in `Endpoint.deleteMaps` due to BW manager map
    removal being slightly different.
  - Minor conflicts due to `bpf.RemoveTCFilters` being an exported
    function in v1.14 (where as it it private in v1.15+).

]

Prior to this commit, we left it up to the kernel to clean up tc attachments
when the CNI finally removes the veth when a Pod goes away. This leaves a window
of time where an endpoint's tc programs can potentially be invoked after
the endpoint's internal tail call maps have already been cleared and the
endpoint has been removed from the endpoint map and ipcache, resulting in
undefined behaviour.

This patch clearly defines the endpoint teardown sequence as follows:
- remove (endpoint) routes
- set the interface down
- detach tc(x) hooks
- remove endpoint from endpoint map
- remove endpoint policy program(s)
- delete conntrack map pins
- remove policy prog array map pin
- remove internal tail call map pin
- remove custom calls map pin

This puts the agent more in control of the teardown sequence and will allow us
to reason better about failures related to missed tail calls and other flakes.

Signed-off-by: Timo Beckers <timo@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro force-pushed the pr/v1.14-backport-2024-11-11-01-36 branch from 17766b6 to b0402ac Compare November 11, 2024 12:50
@gandro gandro changed the title v1.14 Backports 2024-11-11 [v1.14] datapath,endpoint: explicitly remove TC filters during endpoint teardown Nov 11, 2024
@gandro gandro marked this pull request as ready for review November 11, 2024 13:04
@gandro gandro requested a review from a team as a code owner November 11, 2024 13:04
@gandro
Copy link
Member Author

gandro commented Nov 11, 2024

/test-backport-1.14

@gandro
Copy link
Member Author

gandro commented Nov 11, 2024

Marking this as a release blocker since this is fixing a CI issue in the v1.15 downgrade tests

Copy link
Member

@dylandreimerink dylandreimerink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conflict resolution changes makes sense. Looks good to me

@gandro gandro merged commit 9bdf99c into cilium:v1.14 Nov 11, 2024
62 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.14 This PR represents a backport for Cilium 1.14.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants