Skip to content

Conversation

LynneD
Copy link

@LynneD LynneD commented May 15, 2023

Currently we discard CiliumNode updates based on DeepEqual and labels. However DeepEqual is set to ignore Annotations, and the wg-pub-key annotation is used to exchange rotated Wireguard keys.

@LynneD LynneD requested a review from a team as a code owner May 15, 2023 21:21
@LynneD LynneD requested a review from tommyp1ckles May 15, 2023 21:21
@maintainer-s-little-helper
Copy link

Commit 0fc487f01be34f75bf5ae6ad24e2f4b8d6753821 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels May 15, 2023
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label May 15, 2023
@LynneD
Copy link
Author

LynneD commented May 15, 2023

Hi @aojea, this is the fix for Wiregurad tunnel drop issue.

The wireguard tunnel is broken when any node restart happens. After restart, the restarted node's public key got refreshed but not propagated to other nodes.

The issue is caused by cilium dropping CiliumNode update events when the spec, status and labels between the old and new nodes are the same. Wireguard public key updates are transmitted through annotations.

We want to confirm if this is a known issue since it can be reproduced by restarting any node easily.

Thanks!

@@ -69,7 +69,8 @@ func (k *K8sWatcher) ciliumNodeInit(ciliumNPClient client.Clientset, asyncContro
valid = true
isLocal := k8s.IsLocalCiliumNode(ciliumNode)
if oldCN.DeepEqual(ciliumNode) &&
comparator.MapStringEquals(oldCN.ObjectMeta.Labels, ciliumNode.ObjectMeta.Labels) {
comparator.MapStringEquals(oldCN.ObjectMeta.Labels, ciliumNode.ObjectMeta.Labels) &&
comparator.MapStringEquals(oldCN.ObjectMeta.Annotations, ciliumNode.ObjectMeta.Annotations) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes issue in case the wireguard key changes but the rest of the node object not

if pk := n.localNode.WireguardPubKey; pk != "" {
if nodeResource.ObjectMeta.Annotations == nil {
nodeResource.ObjectMeta.Annotations = make(map[string]string)
}
nodeResource.ObjectMeta.Annotations[annotation.WireguardPubKey] = pk
}

since, we don't compare Annotations the event is dropped and not processed

@aojea
Copy link
Contributor

aojea commented May 15, 2023

/assign @aanm @squeed

@LynneD LynneD force-pushed the pr/wireguard-publickey-propagation branch from 0fc487f to d7db202 Compare May 15, 2023 22:56
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label May 15, 2023
@LynneD LynneD force-pushed the pr/wireguard-publickey-propagation branch from d7db202 to b495ea8 Compare May 15, 2023 23:10
@tommyp1ckles
Copy link
Contributor

Thanks for the PR 😄

I wonder if instead of checking all annotation changes, it might make more sense to target the specific wireguard key annotation and just see if that's changed.

It seems like the intention is to avoid unnecessary node updates.

@aojea
Copy link
Contributor

aojea commented May 15, 2023

I wonder if instead of checking all annotation changes, it might make more sense to target the specific wireguard key annotation and just see if that's changed.

in the long term my experience is that new annotations got added but you never remember to update this comparison, it is also coherent with existing comparison with labels

@LynneD LynneD force-pushed the pr/wireguard-publickey-propagation branch from b495ea8 to 52942f6 Compare May 16, 2023 01:04
@LynneD
Copy link
Author

LynneD commented May 16, 2023

I wonder if instead of checking all annotation changes, it might make more sense to target the specific wireguard key annotation and just see if that's changed.

in the long term my experience is that new annotations got added but you never remember to update this comparison, it is also coherent with existing comparison with labels

Hi @tommyp1ckles, I agree with @aojea's point here. Is there a specific concern about checking all annotation changes?

@christarazi
Copy link
Member

/test

@aanm aanm requested a review from gandro May 16, 2023 07:31
@aanm aanm added area/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers. area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. area/agent Cilium agent related. release-note/bug This PR fixes an issue in a previous release of Cilium. needs-backport/1.12 labels May 16, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 16, 2023
@aanm aanm added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. affects/v1.11 This issue affects v1.11 branch affects/v1.12 This issue affects v1.12 branch and removed needs-backport/1.12 labels May 16, 2023
@LynneD LynneD force-pushed the pr/wireguard-publickey-propagation branch from 81ff7f1 to 1d614f3 Compare May 17, 2023 15:27
@LynneD
Copy link
Author

LynneD commented May 17, 2023

Good finds! Could you add #25465 (comment) to the commit message?

Added the comment. Is the the PR ready to merge? We are waiting for the CL to be merged to cherry-pick it to our internal branch.

@christarazi christarazi requested a review from brb May 17, 2023 18:56
@christarazi
Copy link
Member

Good finds! Could you add #25465 (comment) to the commit message?

Added the comment. Is the the PR ready to merge? We are waiting for the CL to be merged to cherry-pick it to our internal branch.

A PR is ready to merge when all codeowners have approved the PR and the CI is all green. Right now, Martynas is required to approve (I've re-requested him) and I will trigger the CI.

@christarazi
Copy link
Member

christarazi commented May 17, 2023

/test

Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed:

Click to show.

Test Name

K8sDatapathServicesTest Checks N/S loadbalancing With host policy Tests NodePort

Failure Output

FAIL: Can not connect to service "tftp://192.168.56.12:31104/hello" from outside cluster (1/10)

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/2380/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.26-kernel-net-next so I can create one.

Then please upload the Jenkins artifacts to that issue.

@LynneD
Copy link
Author

LynneD commented May 17, 2023

Good finds! Could you add #25465 (comment) to the commit message?

Added the comment. Is the the PR ready to merge? We are waiting for the CL to be merged to cherry-pick it to our internal branch.

A PR is ready to merge when all codeowners have approved the PR and the CI is all green. Right now, Martynas is required to approve (I've re-requested him) and I will trigger the CI.

Thanks @christarazi

Copy link
Contributor

@tommyp1ckles tommyp1ckles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @aojea's point here. Is there a specific concern about checking all annotation changes?

Nothing specific - mostly just wondering if it would lead to unnecessary updates. This seems reasonable to me.

@LynneD
Copy link
Author

LynneD commented May 18, 2023

Good finds! Could you add #25465 (comment) to the commit message?

Hi @brb, I've updated the commit message, can you please take another look?

Copy link
Member

@brb brb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@LynneD
Copy link
Author

LynneD commented May 18, 2023

/mlh new-flake Cilium-PR-K8s-1.26-kernel-net-next

@LynneD
Copy link
Author

LynneD commented May 18, 2023

Thanks @christarazi @brb @tommyp1ckles @aojea, I've got enough approvals, and the "k8s-1.26-kernel-net-next" test seems not related to this PR. Can you please take another look and see how I can unblock the merging?

@LynneD LynneD force-pushed the pr/wireguard-publickey-propagation branch from 1d614f3 to d38b6b6 Compare May 18, 2023 06:35
Currently we discard CiliumNode updates based on DeepEqual and labels. However DeepEqual is set to ignore Annotations, and the wg-pub-key annotation is used to exchange rotated Wireguard keys.

The wireguard tunnel is broken when any node restart happens. After restart, the restarted node's public key got refreshed but not propagated to other nodes.

The issue is caused by cilium dropping CiliumNode update events when the spec, status and labels between the old and new nodes are the same. Wireguard public key updates are transmitted through annotations.

Signed-off-by: Lin Dong <lindongchn@gmail.com>
@christarazi christarazi force-pushed the pr/wireguard-publickey-propagation branch from d38b6b6 to 975a0d4 Compare May 18, 2023 18:54
@christarazi
Copy link
Member

/test

@christarazi
Copy link
Member

I've rebased to re-run the tests. Please don't push unless the CI points out a failure and you need to push to address it.

@LynneD
Copy link
Author

LynneD commented May 18, 2023

I've rebased to re-run the tests. Please don't push unless the CI points out a failure and you need to push to address it.

Got it. Thanks @christarazi!

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 18, 2023
@christarazi christarazi merged commit 76eca78 into cilium:main May 18, 2023
@christarazi
Copy link
Member

Thanks for the PR and your patience!

@LynneD
Copy link
Author

LynneD commented May 18, 2023

Thanks so much @christarazi!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects/v1.10 This issue affects v1.10 branch affects/v1.11 This issue affects v1.11 branch affects/v1.12 This issue affects v1.12 branch affects/v1.13 This issue affects v1.13 branch area/agent Cilium agent related. area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. area/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. kind/community-contribution This was a contribution made by a community member. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
Status: Released
Development

Successfully merging this pull request may close these issues.

9 participants