envoy/xds: Await until endpoint restoration is done #32824
/test
Put back in draft: I will have to test restarts when endpoints have L7 policies. Regeneration may now hang in that case, since Envoy updates are postponed.
Do we need to update cilium-envoy to the latest version to pick up cilium/proxy#785?
I know the context this comes from (wanting to ensure Envoy reloads ipcache and doesn't replace policies with an empty set), but it would be good to identify which sorts of churn this does and does not prevent. What does endpoint restoration provide that Envoy consumes? As endpoints are regenerated, are we going to see policy drops for endpoints that have not been regenerated yet? I assume so, since their policy has not been generated yet and thus is not in the xDS server. But I'm not sure what we can do about that yet.
/test
Endpoint restoration waits for
We now delay sending any new resources to Envoy, so all existing (and most new) connections should keep going based on the previous policy, both in the BPF datapath and in Envoy. There is a caveat for new DNS proxy-dependent connections. The risk lies in the window between an endpoint's BPF program being regenerated and ALL endpoints having been regenerated, because we wait for the latter before sending updated network policies to Envoy. The updated policy would include newly allocated FQDN IDs.
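The delay can be pictured as a gate in front of the xDS sender. The following is a hypothetical Go sketch, not Cilium's implementation. `GatedSender`, `Resource`, and the coalescing policy (while the gate is closed, keep only the newest version of each named resource) are all assumptions made for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

// Resource is a hypothetical stand-in for an xDS resource (e.g. a network policy).
type Resource struct {
	Name    string
	Version uint64
}

// GatedSender holds back updates until Open is called (hypothetical type).
type GatedSender struct {
	mu      sync.Mutex
	open    bool
	pending map[string]Resource // newest held-back version per resource name
	send    func(Resource)      // forwards a resource to Envoy
}

func NewGatedSender(send func(Resource)) *GatedSender {
	return &GatedSender{pending: make(map[string]Resource), send: send}
}

// Update forwards r immediately once the gate is open; before that it only
// remembers the newest version of r, so Envoy keeps its previous policy.
func (g *GatedSender) Update(r Resource) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.open {
		g.send(r)
		return
	}
	if cur, ok := g.pending[r.Name]; !ok || r.Version > cur.Version {
		g.pending[r.Name] = r
	}
}

// Open is called once all endpoints are regenerated: it sends each held-back
// resource once and lets later updates through directly.
func (g *GatedSender) Open() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.open = true
	for name, r := range g.pending {
		g.send(r)
		delete(g.pending, name)
	}
}

func main() {
	var sent []Resource
	g := NewGatedSender(func(r Resource) { sent = append(sent, r) })
	g.Update(Resource{"ep-1", 1})
	g.Update(Resource{"ep-1", 2})
	g.Update(Resource{"ep-1", 3})
	g.Open()
	fmt.Println(sent) // [{ep-1 3}]
}
```

With coalescing, Envoy sees at most one update per resource when the gate opens, instead of every intermediate version produced during regeneration. The sketch does not claim that the real xDS cache works this way. Note that the FQDN caveat still applies: anything that needs the new policy before `Open` has to wait for it.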
@jrajahalme No, that's endpoint regeneration. Restoration happens much earlier, even before the k8s clients have started. Regeneration is what re-creates policies, and it proceeds only after the daemon has initialized. Network policies are indeed ingested before regeneration, but regeneration is what converts them to Envoy configuration.
My read of the following is that initial k8s state is fetched before endpoints are restored:
Endpoints are regenerated during restoration, so whatever happens during endpoint regeneration also happens during endpoint restoration.
/test
/test
/test
Thanks
LGTM, thanks for the changes, Jarno! Only one non-blocking input/question.
Wait until endpoint restoration is done before serving any xDS resources to the external (DaemonSet) Envoy when Cilium restarts. This reduces Envoy resource churn during agent restarts.
Add a Cell to the xds package so that the xDS server can depend on the promise directly.
With this we only get one no-op policy update in Envoy: