Skip to content

Conversation

aanm
Copy link
Member

@aanm aanm commented Nov 12, 2022

When there is an annotation in the k8s node object, the annotation io.cilium.network.ipv4-cilium-host is used as the CiliumInternal IP address of the CiliumNode object in [1]. Whenever Cilium is updating any state into the CiliumNode it retrieves all IP address from k8s node, including the ones from annotations, and appends the local node's IP addresses, including the newly correct internal / router IP address, in [2]. Since this is a list, the annotation's IP address is always used first and all other Cilium agents will wrongly use it for any operation.

[1]

// Get the addresses from k8s node and add them as part of Cilium Node.
// Cilium Node should contain all addresses from k8s.
nodeInterface := k8s.ConvertToNode(k8sNode)
typesNode := nodeInterface.(*k8sTypes.Node)
k8sNodeParsed := k8s.ParseNode(typesNode, source.Unspec)
k8sNodeAddresses = k8sNodeParsed.IPAddresses

[2]
ciliumNodeAddress := address.IP.String()
var found bool
for _, nodeResourceAddress := range nodeResource.Spec.Addresses {
if nodeResourceAddress.IP == ciliumNodeAddress {
found = true
break
}
}
if !found {
nodeResource.Spec.Addresses = append(nodeResource.Spec.Addresses, ciliumv2.NodeAddress{
Type: address.Type,
IP: ciliumNodeAddress,
})
}
}

Fixes: 73d6cae ("install: default AnnotateK8sNode to false")
Signed-off-by: André Martins andre@cilium.io

Fix bug that could lead to inconsistent pod IP information between agents, sometimes leading to a failure to decrypt IPsec traffic.

Fixes #21735

@aanm aanm added area/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers. dont-merge/needs-release-note release-note/bug This PR fixes an issue in a previous release of Cilium. labels Nov 12, 2022
@aanm aanm requested review from pchaigno and christarazi November 12, 2022 01:15
@aanm aanm requested a review from a team as a code owner November 12, 2022 01:15
@aanm
Copy link
Member Author

aanm commented Nov 12, 2022

/test

@aanm
Copy link
Member Author

aanm commented Nov 14, 2022

/test

Copy link
Member

@bimmlerd bimmlerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline: this pull request doesn't fix the bug for when annotate-k8s-node=true.

@aanm
Copy link
Member Author

aanm commented Nov 14, 2022

@pchaigno although it doesn't fix for the annotate-k8s-node=true, it does fix for the annotate-k8s-node=false. Regarding the case annotate-k8s-node=true it should be done in a different PR.

aanm added 2 commits November 18, 2022 11:01
When there is an annotation in the k8s node object, the annotation
`io.cilium.network.ipv4-cilium-host` is used as the CiliumInternal IP
address of the CiliumNode object in [1]. Whenever Cilium is updating any
state into the CiliumNode it retrieves all IP address from k8s node,
including the ones from annotations, and appends the local node's IP
addresses, including the newly correct internal / router IP
address, in [2]. Since this is a list, the annotation's IP address is
always used first and all other Cilium agents will wrongly use it for
any operation.

[1] https://github.com/cilium/cilium/blob/927bd8c26904ff92e42c61cec6d00ea8ac062c05/pkg/nodediscovery/nodediscovery.go#L453-L459
[2] https://github.com/cilium/cilium/blob/927bd8c26904ff92e42c61cec6d00ea8ac062c05/pkg/nodediscovery/nodediscovery.go#L474-L489

Fixes: 73d6cae ("install: default AnnotateK8sNode to false")
Signed-off-by: André Martins <andre@cilium.io>
When using CiliumNode, the agent's source of truth should be the agent
itself and not k8s node annotations. Thus we will not use the
annotations for the CiliumInternalIP address when generating a
CiliumNode from the k8s Node resource.

Signed-off-by: André Martins <andre@cilium.io>
@aanm aanm requested a review from a team as a code owner November 18, 2022 10:11
@aanm aanm requested a review from pchaigno November 18, 2022 10:11
@aanm
Copy link
Member Author

aanm commented Nov 18, 2022

/test

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My two comments above are still relevant I think. Other than that, looks good to me.

We try to restore the router IP both from the filesystem (first) and
from Kubernetes objects (as a fallback). If the two IP addresses don't
match, we emit a warning.

There is no good reason for this to happen in CI so we should fail the
test if that warning ever shows up. Doing so would have prevented the
flake fixed by the previous commit.

Signed-off-by: Paul Chaignon <paul@cilium.io>
@aanm aanm requested a review from a team as a code owner November 22, 2022 12:24
@aanm aanm requested a review from pchaigno November 22, 2022 12:25
@aanm
Copy link
Member Author

aanm commented Nov 22, 2022

/test

@pchaigno pchaigno added kind/bug This is a bug in the Cilium logic. and removed dont-merge/needs-release-note labels Nov 22, 2022
Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've rerun my local reproduction 8 times with only the second commit and it didn't fail. Given my reproduction works with a rate 1/2, note that there's a probably of 1/256 that it doesn't fail even if it's not fixed.

The changes look good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers. backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CI: K8sDatapathConfig Transparent encryption DirectRouting Check connectivity with transparent encryption and direct routing
6 participants