Skip to content

Conversation

jrajahalme
Copy link
Member

@jrajahalme jrajahalme commented Oct 10, 2024

Rationalize sequence of events for policy update and applying incremental policy map updates:

  • update proxy network policy right after ConsumeMapChanges in applyPolicyMapChanges so that it is more obvious that the selector cache version for bpf map updates and proxy network policy is the same on (as updated by ConsumeMapChanges)
  • defer applying the incremental changes for the full policy map sync when a new policy has been computed, so that the proxy has had time to configure the new listeners and to apply the new network policy before getting traffic for them.
  • create l7 redirects before incremental updates, so that the proxy ports are available for them

Keep realized redirects in EndpointPolicy:

  • can create redirects before holding endpoint lock, no coordination needed between redirects of different policy instances (old vs. new)
  • Create MapState after redirects so that proxy ports are available during MapState generation removing the need for a second pass to patch up proxy ports
  • Move old redirect removal to happen after proxy ACKs the changes, removing the need for the 1st maps sync, as now the main map sync happens before redirects are removed

@jrajahalme jrajahalme added kind/cleanup This includes no functional changes. area/proxy Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. release-note/misc This PR makes changes that have no direct user impact. area/agent Cilium agent related. labels Oct 10, 2024
@jrajahalme jrajahalme requested review from a team as code owners October 10, 2024 19:01
@jrajahalme jrajahalme marked this pull request as draft October 10, 2024 19:01
@jrajahalme
Copy link
Member Author

/test

@jrajahalme jrajahalme force-pushed the endpoint-redirect-cleanup branch 3 times, most recently from 8ca9f3e to af40d87 Compare October 10, 2024 21:54
@jrajahalme
Copy link
Member Author

/test

@jrajahalme jrajahalme force-pushed the endpoint-redirect-cleanup branch from af40d87 to 8dbfcf2 Compare October 11, 2024 07:34
@jrajahalme
Copy link
Member Author

/test

@jrajahalme jrajahalme force-pushed the endpoint-redirect-cleanup branch from 8dbfcf2 to c1efc56 Compare October 11, 2024 13:10
@jrajahalme jrajahalme marked this pull request as ready for review October 11, 2024 13:10
Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall change LGTM, I think it simplifies things. I have one comment below as these things tend to be very subtle and just want to ensure we get it right. Typically, I'd expect cleanup label to indicate changes without a functional impact, but this PR does have functional impact, right?

@jrajahalme jrajahalme removed the kind/cleanup This includes no functional changes. label Oct 12, 2024
@jrajahalme jrajahalme force-pushed the endpoint-redirect-cleanup branch from c1efc56 to 620666b Compare October 12, 2024 12:23
@jrajahalme
Copy link
Member Author

/test

@jrajahalme jrajahalme force-pushed the endpoint-redirect-cleanup branch from fe5fc40 to 638aba4 Compare October 21, 2024 10:37
@jrajahalme jrajahalme requested a review from squeed October 21, 2024 10:37
@jrajahalme
Copy link
Member Author

/test

@squeed
Copy link
Contributor

squeed commented Oct 21, 2024

@jrajahalme amazing work! Some stylistic suggestions but this seems pretty correct.

@squeed squeed removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 21, 2024
@squeed squeed disabled auto-merge October 21, 2024 15:44
Move the set of desired redirects to the EndpointPolicy, as it is stable
for any computed EndpointPolicy. If some redirects are missing, e.g., due
to missing named ports or Envoy listeners, the policy will be recomputed
when those are added, and the new policy will then contain also those
redirects.

This also allows each EndpointPolicy instance to have it's own set of
redirects, depending on the selectorPolicy. If we kept a single set of
realized redirects for the endpoint, we'd need to combine the current and
new sets for the redirects to be available when incremental updates are
added both to the current and new EndpointPolicy.

Move adding redirects to happen before the MapState for the new
EndpointPolicy is computed. This way the redirects are available when
needed for MapState generation, removing the need for the second pass to
add the redirects to the MapState. Redirect creation is simplified with
new SelectorPolicy.RedirectFilters() iterator.

Eliminate the (first) bpf policy map sync that was done just before
removing old redirect.  Old redirects are now removed in a finalizer
function, that is run after we get acknowledgements for the newly added
redirects (if any). If there is a NACK instead, then the finalizer
function is not called at all, so there is no need for reverting the
removals.

The first sync could have sent traffic to Envoy while the Listeners or
the NetworkPolicy were not fully updated yet. Now we rely solely on the
bpf policy map sync that is done in regenerateBPF() after the wait for
Envoy configuration updates is completed. At that point Envoy is ready to
take traffic for the new listeners. Also after this point the old
redirects would not receive any traffic any more and they can safely be
removed by the finalizer function called via defer at the end of
regenerateBPF().

Finally, remove old redirects through FinalizeList *after* the new
redirects have been acknowledged by Envoy. Since references to the
redirect ports of those redirects have already been removed from the
policy map, there is no need to wait for an ack for redirect
removal. This allows simplification of the related code as FinalizedList
nor RevertStack are no longer needed for redirect removal.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Now that redirects are added before taking the endpoint lock we can
update DNS restore rules with the new policy rules after the policy
update, rather than from the old rules before the policy rules updates.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
@jrajahalme jrajahalme force-pushed the endpoint-redirect-cleanup branch from 638aba4 to 1813047 Compare October 21, 2024 19:22
@jrajahalme jrajahalme enabled auto-merge October 21, 2024 19:23
@jrajahalme
Copy link
Member Author

/test

@jrajahalme jrajahalme requested a review from squeed October 22, 2024 07:54
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 22, 2024
@squeed
Copy link
Contributor

squeed commented Oct 22, 2024

Approved, thanks!

@jrajahalme jrajahalme added this pull request to the merge queue Oct 22, 2024
Merged via the queue into cilium:main with commit 810f382 Oct 22, 2024
65 checks passed
@jrajahalme jrajahalme deleted the endpoint-redirect-cleanup branch October 22, 2024 10:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/agent Cilium agent related. area/proxy Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants