Skip to content

Conversation

ldelossa
Copy link
Contributor

@ldelossa ldelossa commented Feb 10, 2025

ci,wireguard: introduce pod to pod v2 connectivity tests for Wireguard

This commit introduces a new connectivity test for Wireguard pod-to-pod
traffic.
The test is ran on post v1.18 clusters only.

These connectivity tests act as leak detection tests.
encrypted overlay traffic performs encryption just before the packet is going to
leave the host.
Therefore these tests ensure we do not see client->server traffic (or
its return traffic), unencrypted at the ultimate egress device before the
traffic leaves the host toward the destination.

The test works as follows:

  • Find a client pod and a server pod, each on separate nodes.
  • Find the client side's egress device toward the server.
  • Find the server side's egress device toward the client (return
    traffic).
  • Create the appropriate TCPDump filter for client->server traffic
  • Create the appropriate TCPDump filter for server->client return
    traffic
  • Start TCPDump sniffers on both nodes using the resolved TCPDump
    filters and egress devices.
  • Ensure that we never see plain-text client->server traffic on the
    client side and we never see plain-text server->client return traffic
    on the server side.

To expand on the above

  • Resolving the correct TCPDump filters involves understanding what
    routing mode we are. When tunnel mode is use the test will create
    TCPDump filters which seek into the tunnel packets to find plain-text
    inner IP headers.
  • We filter for client->server traffic on the client node is encrypted and
    server->client return traffic on the server node is as well. We do not
    check if server->client return traffic on the CLIENT node is
    encrypted. This is because XFRM will recirculate decrypted packets on
    the interface they arrived on. If we filter for the return traffic
    we'd get false positives since TCPDump will confuse the decrypted
    traffic for a leak. This is why we test the return traffic is
    encrypted, server side. While the initial introduction to these tests
    target wireguard, we keep this invariant to easily support IPsec as
    well.
  • These tests also run when encryption is disabled on the cluster,
    switching the TCPDump instance to report an error if NO packets are
    detected. This is a sanity check ensuring the TCPDump filters would
    in-fact catch leaked traffic when encryption is enabled.

There is more subtlety to this test that are drawn attention to by the
comments and it is useful to read through these.

This test will also be used for IPsec pod-to-pod traffic once VinE is
properly implemented, unifying the testing approach for both Wireguard
and IPsec.

wireguard: introduce v2 pod-to-pod connectivity tests 

@ldelossa ldelossa requested review from a team as code owners February 10, 2025 17:33
@ldelossa ldelossa changed the title ci,ipsec: introduce VinE connectivity tests for IPsec ci,ipsec: introduce VinE connectivity tests for Wireguard Feb 10, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 10, 2025
@github-actions github-actions bot added cilium-cli This PR contains changes related with cilium-cli cilium-cli-exclusive This PR only impacts cilium-cli binary labels Feb 10, 2025
@ldelossa ldelossa force-pushed the ldelossa/encrypted-overlay-connectivity-tests branch from a4fd55e to dffa5fd Compare February 10, 2025 17:33
@ldelossa ldelossa added the release-note/misc This PR makes changes that have no direct user impact. label Feb 10, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 10, 2025
@ldelossa ldelossa changed the title ci,ipsec: introduce VinE connectivity tests for Wireguard ci,wireguard: introduce VinE connectivity tests for Wireguard Feb 10, 2025
@ldelossa ldelossa force-pushed the ldelossa/encrypted-overlay-connectivity-tests branch from dffa5fd to ff674fe Compare February 10, 2025 17:34
@ldelossa ldelossa changed the title ci,wireguard: introduce VinE connectivity tests for Wireguard ci,wireguard: introduce pod to pod v2 connectivity tests for Wireguard Feb 10, 2025
@ldelossa ldelossa force-pushed the ldelossa/encrypted-overlay-connectivity-tests branch from ff674fe to 7aeea30 Compare February 10, 2025 17:40
@jschwinger233 jschwinger233 added area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. feature/wireguard Relates to Cilium's Wireguard feature labels Feb 11, 2025
@jschwinger233
Copy link
Member

/ci-e2e-upgrade

@ldelossa ldelossa force-pushed the ldelossa/encrypted-overlay-connectivity-tests branch from 7aeea30 to 06d9908 Compare February 11, 2025 14:12
@ldelossa ldelossa requested a review from a team as a code owner February 11, 2025 14:12
@ldelossa ldelossa requested a review from aditighag February 11, 2025 14:12
@ldelossa ldelossa force-pushed the ldelossa/encrypted-overlay-connectivity-tests branch from 06d9908 to 8709fc2 Compare February 11, 2025 14:45
@jschwinger233
Copy link
Member

/test

@ldelossa
Copy link
Contributor Author

/ci-e2e-upgrade

@ldelossa
Copy link
Contributor Author

/test

@ldelossa
Copy link
Contributor Author

ldelossa commented Feb 11, 2025

Cluster mesh conformance tests seem to be failing legitimately.

  🟥 Failed to resolve egress device for server: Failed to resolve egress device for client: command failed (pod=cilium-test-3/host-netns-k7fqt, container=): pods "host-netns-k7fqt" not found

A bit odd, we resolve the hostNS container for the client like this:

	if clientHostNS, ok := s.ct.HostNetNSPodsByNode()[s.client.Pod.Spec.NodeName]; !ok {
		t.Fatalf("Fail to acquire host namespace pod on %s\n (client's node)", s.client.Pod.Spec.NodeName)
	} else {
		s.clientHostNS = &clientHostNS
	}

In sysdumps I see the pod:

cilium-sysdump-1-20250211-161414/k8s-pods-20250211-161414.yaml
3164:    name: host-netns-k7fqt

cilium-sysdump-1-20250211-161414/k8s-pods-20250211-161414.txt
13:cilium-test-3        host-netns-k7fqt                                             1/1     Running   0          4m8s    172.18.0.7     cluster2-13267146001-worker          <none>           <none>

Is cilium-cli getting confused about what cluster to access the pod on?

🤔

edit:

@giorio94 was quick in finding this issue: https://github.com/cilium/cilium/blob/ldelossa%2Fencrypted-overlay-connectivity-tests/cilium-cli/connectivity/tests/encryption_v2.go#L128
Always utilizing the same client, will fix.

@ldelossa ldelossa force-pushed the ldelossa/encrypted-overlay-connectivity-tests branch from 8709fc2 to 6c12fd2 Compare February 11, 2025 18:29
@ldelossa
Copy link
Contributor Author

/test

@ldelossa ldelossa force-pushed the ldelossa/encrypted-overlay-connectivity-tests branch from 6c12fd2 to 657bfba Compare February 11, 2025 21:18
@ldelossa
Copy link
Contributor Author

/test

Copy link
Member

@aditighag aditighag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for my codeowners

Copy link
Contributor

@smagnani96 smagnani96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Louis, great work, thanks. As also discussed, the tests looks conceptually good for me, and the TCPDUMP filters seems to work smoothly also from local testing.
I just left a couple of nits and minor comments, let me know if anything's not clear 😃

@maintainer-s-little-helper maintainer-s-little-helper bot added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Feb 12, 2025
@ldelossa ldelossa force-pushed the ldelossa/encrypted-overlay-connectivity-tests branch 2 times, most recently from b20e6c5 to a5c0915 Compare February 12, 2025 18:35
This commit introduces a new connectivity test for Wireguard pod-to-pod
traffic.
The test is ran on post v1.18 clusters only.

These connectivity tests act as leak detection tests.
encrypted overlay traffic performs encryption just before the packet is going to
leave the host.
Therefore these tests ensure we do not see client->server traffic (or
its return traffic), unencrypted at the ultimate egress device before the
traffic leaves the host toward the destination.

The test works as follows:
- Find a client pod and a server pod, each on separate nodes.
- Find the client side's egress device toward the server.
- Find the server side's egress device toward the client (return
  traffic).
- Create the appropriate TCPDump filter for client->server traffic
- Create the appropriate TCPDump filter for server->client return
  traffic
- Start TCPDump sniffers on both nodes using the resolved TCPDump
  filters and egress devices.
- Ensure that we never see plain-text client->server traffic on the
  client side and we never see plain-text server->client return traffic
  on the server side.

To expand on the above
- Resolving the correct TCPDump filters involves understanding what
  routing mode we are. When tunnel mode is use the test will create
  TCPDump filters which seek into the tunnel packets to find plain-text
  inner IP headers.
- We filter for client->server traffic on the client node is encrypted and
  server->client return traffic on the server node is as well. We do not
  check if server->client return traffic on the CLIENT node is
  encrypted. This is because XFRM will recirculate decrypted packets on
  the interface they arrived on. If we filter for the return traffic
  we'd get false positives since TCPDump will confuse the decrypted
  traffic for a leak. This is why we test the return traffic is
  encrypted, server side. While the initial introduction to these tests
  target wireguard, we keep this invariant to easily support IPsec as
  well.
- These tests also run when encryption is disabled on the cluster,
  switching the TCPDump instance to report an error if NO packets are
  detected. This is a sanity check ensuring the TCPDump filters would
  in-fact catch leaked traffic when encryption is enabled.

There is more subtlety to this test that are drawn attention to by the
comments and it is useful to read through these.

This test will also be used for IPsec pod-to-pod traffic once VinE is
properly implemented, unifying the testing approach for both Wireguard
and IPsec.

Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
@ldelossa ldelossa force-pushed the ldelossa/encrypted-overlay-connectivity-tests branch from a5c0915 to 26bd50d Compare February 12, 2025 19:07
@ldelossa
Copy link
Contributor Author

/test

Copy link
Contributor

@smagnani96 smagnani96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the comments 💯

@ldelossa ldelossa added this pull request to the merge queue Feb 13, 2025
Merged via the queue into main with commit d3b7bc6 Feb 13, 2025
210 of 211 checks passed
@ldelossa ldelossa deleted the ldelossa/encrypted-overlay-connectivity-tests branch February 13, 2025 01:11
jschwinger233 added a commit that referenced this pull request Feb 14, 2025
jschwinger233 added a commit that referenced this pull request Feb 14, 2025
jschwinger233 added a commit that referenced this pull request Feb 14, 2025
jschwinger233 added a commit that referenced this pull request Feb 14, 2025
jschwinger233 added a commit that referenced this pull request Feb 18, 2025
…aining

Pod-to-pod-encryption-v2 was introduced by #37533.
One the major changes of v2 is enabling sanity checks even for
non-encryption cluster. This caused troubles because v2 tests didn't take
cni-chaining mode, which is not compatible with encryption, into consideration.

This patch disabled v2 test on aws-cni chaining to avoid ci breakage.

Fixed: #37682

Signed-off-by: gray <gray.liang@isovalent.com>
jschwinger233 added a commit that referenced this pull request Feb 18, 2025
…aining

Pod-to-pod-encryption-v2 was introduced by #37533.
One of the major v2 changes is enabling sanity checks even for
non-encryption cluster. This caused troubles because v2 tests didn't take
cni-chaining mode, which is not compatible with encryption, into consideration.

This patch disabled v2 test on aws-cni chaining to avoid ci breakage.

Fixed: #37682

Signed-off-by: gray <gray.liang@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. cilium-cli This PR contains changes related with cilium-cli cilium-cli-exclusive This PR only impacts cilium-cli binary feature/wireguard Relates to Cilium's Wireguard feature ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants