Skip to content

v1.10 backports 2022-06-08 #20110

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 9, 2022

Conversation

joamaki
Copy link
Contributor

@joamaki joamaki commented Jun 8, 2022

Once this PR is merged, you can update the PR labels via:

$ for pr in 19565 19923; do contrib/backporting/set-labels.py $pr done 1.10; done

or with

$ make add-label BRANCH=v1.10 ISSUES=19565,19923

@joamaki joamaki requested a review from a team as a code owner June 8, 2022 12:34
@joamaki joamaki requested review from geakstr and pchaigno June 8, 2022 12:34
@maintainer-s-little-helper maintainer-s-little-helper bot added backport/1.10 kind/backports This PR provides functionality previously merged into master. labels Jun 8, 2022
geakstr and others added 3 commits June 8, 2022 14:36
[ upstream commit cb6c554 ]

Previously we used envoy proxy container to convert grpc-web traffic to
true grpc traffic, so ui backend accepts real grpc traffic. There is
an alternative approach to use special wrapper
for grpc service on backend:
https://github.com/improbable-eng/grpc-web/tree/master/go/grpcweb

Related code changes on hubble-ui itself was implemented in this PR:
cilium/hubble-ui#226

Signed-off-by: Dmitry Kharitonov <dmitry@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 6a1f175 ]

Currently, when the cilium-operator attaches new ENIs to a node, we
update the corresponding CiliumNode in two steps: first the .Status,
then the .Spec [1]. That can result in an inconsistent state, where the
CiliumNode .Spec.IPAM.Pool contains new IP addresses associated with the
new ENI, while .Status.ENI.ENIs is still missing the ENI. This
inconsistency manisfests as a fatal:

    level=fatal msg="Error while creating daemon" error="Unable to allocate router IP for family ipv4: failed to associate IP 10.12.14.5 inside CiliumNode: unable to find ENI eni-9ab538c64feb9f59e" subsys=daemon

This inconsistency occurs because the following can happen:
1. cilium-operator attaches a new ENI to the CiliumNode.
2. Still at cilium-operator, .Spec is synced with kube-apiserver. The IP
   pool is updated with a new set of IP addresses and the new ENI.
3. The agent receives this half-updated CiliumNode.
4. It allocates an IP address for the router from the pool of IPs
   attached to the new ENI, using .Spec.IPAM.Pool.
5. It fails because the new ENI is not listed in the .Status.ENI.ENIs of
   the CiliumNode object.
6. At cilium-operator, .Status is updated with the new ENI.

But wait, you said .Status is updated before .Spec in the function you
linked? Yes, but we read the state to populate CiliumNode from two
separate places (n.ops.manager.instances and n.available) in the
syncToAPIServer function and we don't have anything to prevent having a
half updated (one place only) state in the middle of the update function.
We lock twice, once for each place, instead of once for the while
CiliumNode update. So having a half updated state in the middle of the
function would technically be the same as updating .Spec first and
.Status second.

We can fix this by first creating a snapshot of the pool first, then
write the .Status metadata (which may be more recent than the pool
snapshot, which is safe, see comment in the source code of this patch),
and then write the pool to .Spec. This ensures that the .Status is
always updated before .Spec, but at the same time also ensures that
.Status is still more recent than .Spec.

1 - https://github.com/cilium/cilium/blob/v1.12.0-rc2/pkg/ipam/node.go#L966-L1012

Co-authored-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit eac0dee ]

The `node.Spec.IPAM.Pool` value is always overwritten after the removed
`if` statement, so there is no need to initialize it.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
@joamaki joamaki force-pushed the pr/v1.10-backport-2022-06-08 branch from a491984 to 572e21f Compare June 8, 2022 12:36
Copy link
Contributor

@geakstr geakstr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@joamaki
Copy link
Contributor Author

joamaki commented Jun 8, 2022

/test-backport-1.10

Job 'Cilium-PR-K8s-1.19-kernel-5.4' hit: #17069 (95.55% similarity)

Job 'Cilium-PR-K8s-1.19-kernel-4.9' failed:

Click to show.

Test Name

K8sServicesTest Checks service across nodes with L7 policy Tests NodePort with L7 Policy

Failure Output

FAIL: Request from k8s1 to service http://[fd03::32e9]:10080 failed

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.19-kernel-4.9 so I can create one.

Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving the commit for @pchaigno which was taken over by me.

@gandro gandro removed the request for review from pchaigno June 8, 2022 15:16
@nebril
Copy link
Member

nebril commented Jun 9, 2022

/test-1.19-4.9

@nebril
Copy link
Member

nebril commented Jun 9, 2022

/test-1.19-5.4

@joestringer
Copy link
Member

Tests are passing, reviews are in, merging.

@joestringer joestringer merged commit 73f2028 into cilium:v1.10 Jun 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/backports This PR provides functionality previously merged into master.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants