-
Notifications
You must be signed in to change notification settings - Fork 3.4k
datapath/iptables: Masquerade hairpin traffic that traversed the stack #10928
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
test-me-please
|
test-me-please Edit: provisioning failure
|
test-me-please
|
test-me-please Hit #10118 |
test-me-please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given we haven't seen this in non-chaining scenarios so far, should we gate this to avoid adding the rule for native Cilium setups?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, with minor nits.
This can happen in direct routing mode as well if |
6986f4b
to
5483585
Compare
test-me-please IP frag test failed:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change looks good, do we have a small repro we can integrate into the CI? This type of bug seems easy to break by just rearranging rules a bit.
Packets to a host IP are currently redirected via cilium_host/cilium_net. The reason for this is mostly historic. For other packets where routing by the kernel routing tables is desired, packets are already passed on via TC_ACT_OK to the stack directly. The two cases where this redirection is needed are: * For proxy redirection due to a kernel limitation on passing the routing tables multiple times. This case is left untouched. * For the HOST_REDIRECT_TO_INGRESS case, e.g. flannel integration. This case is left untouched. The IPv4 and IPv6 case is brought in line to not accidently lose this logic later on. A side effect of this is that the skb gets scrubbed including the skb->mark. The presence of the identity in the skb->mark is being relied on in a follow-up fix however. Therfore, pass packets via the stack using TC_ACT_OK. This is faster, simpler, and allows for the identity to be carried in the mark. Fixes: #9784 Signed-off-by: Thomas Graf <thomas@cilium.io>
The traffic is sent to the stack and hairpin'ed back into a local pod after a component on the stack has applied a DNAT rule, the traffic must be SNATed to ensure the reverse NAT can take place. This can happen if portmap or kiam is being used and redirection happens to a local destination. The masquerade filter must be limited as not all DNAT traffic may be affected. NodePort traffic from a non-local source must remain unmasqueraded in order for trafficPolicy=local to continue working. Also, when EnableEndpointRoutes is enabled, traffic always traverses the stack and must not be masqueraded either. Fixes: #9784 Signed-off-by: Thomas Graf <thomas@cilium.io>
Add checks for multi-node and intra-node connectivity via hostport. These tests will remain unready if hostport/portmap is not enabled. Signed-off-by: Thomas Graf <thomas@cilium.io>
Requires the following tests to pass: ``` NAME READY STATUS RESTARTS AGE echo-a-5f555bbc8b-9cxv9 1/1 Running 0 41s echo-b-659766fb56-zw2wl 1/1 Running 0 41s echo-b-host-65d7db76d8-5wmhm 1/1 Running 0 41s host-to-b-multi-node-clusterip-c7557d4f8-gv6ws 1/1 Running 0 41s host-to-b-multi-node-headless-5dfcdf9b76-9hcqn 1/1 Running 0 41s pod-to-a-6cf58894b7-mqg67 1/1 Running 0 40s pod-to-a-allowed-cnp-5898f7d8c9-bdfxz 1/1 Running 0 41s pod-to-a-external-1111-5779fb7cb9-tgdlh 1/1 Running 0 39s pod-to-a-l3-denied-cnp-74b9566cc7-zjhhh 1/1 Running 0 41s pod-to-b-intra-node-77b485d996-xfv45 1/1 Running 0 40s pod-to-b-intra-node-hostport-6c55bf8459-vddt2 1/1 Running 0 40s pod-to-b-multi-node-clusterip-75f5c78f68-2lk8x 1/1 Running 0 40s pod-to-b-multi-node-headless-5df88f9bd4-f5jlt 1/1 Running 0 40s pod-to-b-multi-node-hostport-d7c8d659f-4xqpt 1/1 Running 0 39s pod-to-external-fqdn-allow-google-cnp-74466b4c6f-dxvlq 1/1 Running 0 39s ``` Signed-off-by: Thomas Graf <thomas@cilium.io>
5483585
to
02115cf
Compare
Good point. I've extended the connectivity-check YAML to include hostport and added it to the CI. This uncovered one more case that needed fixing which required an additional commit. |
test-me-please One test failed with:
GKE and -with-kernel tests all passed |
@@ -73,10 +73,62 @@ spec: | |||
- name: echo-container | |||
image: docker.io/cilium/json-mock:1.0 | |||
imagePullPolicy: IfNotPresent | |||
ports: | |||
- containerPort: 80 | |||
hostPort: 40000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also added one here for services (ebd7392) but only for the BPF case so far. I wonder if we somewhat could consolidate both, just a thought; not needed for this series in here.
Tests were interrupted, retrying: |
restart-ginkgo |
1 similar comment
restart-ginkgo |
This PR builds on #10926
The traffic is sent to the stack and hairpin'ed back into a local pod
after a component on the stack has applied a DNAT rule, the traffic must
be SNATed to ensure the reverse NAT can take place. This can happen if
portmap or kiam is being used and redirection happens to a local
destination.
The masquerade filter must be limited as not all DNAT traffic may be
affected. NodePort traffic from a non-local source must remain
unmasqueraded in order for trafficPolicy=local to continue working.
Also, when EnableEndpointRoutes is enabled, traffic always traverses the
stack and must not be masqueraded either.
Fixes: #9784
Requires: #10926