-
Notifications
You must be signed in to change notification settings - Fork 3.4k
v1.10 backports 2022-06-08 #20110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.10 backports 2022-06-08 #20110
Conversation
[ upstream commit cb6c554 ] Previously we used envoy proxy container to convert grpc-web traffic to true grpc traffic, so ui backend accepts real grpc traffic. There is an alternative approach to use special wrapper for grpc service on backend: https://github.com/improbable-eng/grpc-web/tree/master/go/grpcweb Related code changes on hubble-ui itself was implemented in this PR: cilium/hubble-ui#226 Signed-off-by: Dmitry Kharitonov <dmitry@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 6a1f175 ] Currently, when the cilium-operator attaches new ENIs to a node, we update the corresponding CiliumNode in two steps: first the .Status, then the .Spec [1]. That can result in an inconsistent state, where the CiliumNode .Spec.IPAM.Pool contains new IP addresses associated with the new ENI, while .Status.ENI.ENIs is still missing the ENI. This inconsistency manisfests as a fatal: level=fatal msg="Error while creating daemon" error="Unable to allocate router IP for family ipv4: failed to associate IP 10.12.14.5 inside CiliumNode: unable to find ENI eni-9ab538c64feb9f59e" subsys=daemon This inconsistency occurs because the following can happen: 1. cilium-operator attaches a new ENI to the CiliumNode. 2. Still at cilium-operator, .Spec is synced with kube-apiserver. The IP pool is updated with a new set of IP addresses and the new ENI. 3. The agent receives this half-updated CiliumNode. 4. It allocates an IP address for the router from the pool of IPs attached to the new ENI, using .Spec.IPAM.Pool. 5. It fails because the new ENI is not listed in the .Status.ENI.ENIs of the CiliumNode object. 6. At cilium-operator, .Status is updated with the new ENI. But wait, you said .Status is updated before .Spec in the function you linked? Yes, but we read the state to populate CiliumNode from two separate places (n.ops.manager.instances and n.available) in the syncToAPIServer function and we don't have anything to prevent having a half updated (one place only) state in the middle of the update function. We lock twice, once for each place, instead of once for the while CiliumNode update. So having a half updated state in the middle of the function would technically be the same as updating .Spec first and .Status second. We can fix this by first creating a snapshot of the pool first, then write the .Status metadata (which may be more recent than the pool snapshot, which is safe, see comment in the source code of this patch), and then write the pool to .Spec. This ensures that the .Status is always updated before .Spec, but at the same time also ensures that .Status is still more recent than .Spec. 1 - https://github.com/cilium/cilium/blob/v1.12.0-rc2/pkg/ipam/node.go#L966-L1012 Co-authored-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit eac0dee ] The `node.Spec.IPAM.Pool` value is always overwritten after the removed `if` statement, so there is no need to initialize it. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
a491984
to
572e21f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
/test-backport-1.10 Job 'Cilium-PR-K8s-1.19-kernel-5.4' hit: #17069 (95.55% similarity) Job 'Cilium-PR-K8s-1.19-kernel-4.9' failed: Click to show.Test Name
Failure Output
If it is a flake and a GitHub issue doesn't already exist to track it, comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving the commit for @pchaigno which was taken over by me.
/test-1.19-4.9 |
/test-1.19-5.4 |
Tests are passing, reviews are in, merging. |
Once this PR is merged, you can update the PR labels via:
or with