Skip to content

Conversation

tgraf
Copy link
Member

@tgraf tgraf commented Apr 10, 2020

This PR builds on #10926

The traffic is sent to the stack and hairpin'ed back into a local pod
after a component on the stack has applied a DNAT rule, the traffic must
be SNATed to ensure the reverse NAT can take place. This can happen if
portmap or kiam is being used and redirection happens to a local
destination.

The masquerade filter must be limited as not all DNAT traffic may be
affected. NodePort traffic from a non-local source must remain
unmasqueraded in order for trafficPolicy=local to continue working.

Also, when EnableEndpointRoutes is enabled, traffic always traverses the
stack and must not be masqueraded either.

Fixes: #9784
Requires: #10926

@tgraf tgraf added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. labels Apr 10, 2020
@tgraf tgraf requested a review from a team April 10, 2020 00:45
@coveralls
Copy link

coveralls commented Apr 10, 2020

Coverage Status

Coverage decreased (-0.02%) to 46.812% when pulling 02115cf on pr/tgraf/fix-hairpin-via-stack into 9fbd820 on master.

@tgraf
Copy link
Member Author

tgraf commented Apr 10, 2020

test-me-please

00:19:58.206  There was an error while executing `VBoxManage`, a CLI used by Vagrant
00:19:58.206  for controlling VirtualBox. The command and stderr is shown below.
00:19:58.206  
00:19:58.206  Command: ["list", "hostonlyifs"]
00:19:58.206  
00:19:58.206  Stderr: VBoxManage: error: Failed to create the VirtualBox object!

@tgraf
Copy link
Member Author

tgraf commented Apr 10, 2020

test-me-please

Edit: provisioning failure

19:06:36  Checking for older boxes...
19:06:36  Removing box 'cilium/ubuntu-next' (v50) with provider 'virtualbox'...
19:06:37  VBoxManage: error: Failed to create the VirtualBox object!
19:06:37  VBoxManage: error: Document is empty.
19:06:37  VBoxManage: error: Location: '/root/.config/VirtualBox/VirtualBox.xml', line 1 (0), column 1.
19:06:37  VBoxManage: error: /home/vbox/vbox-6.0.18/src/VBox/Main/src-server/VirtualBoxImpl.cpp[624] (nsresult VirtualBox::init())
19:06:37  VBoxManage: error: Details: code NS_ERROR_FAILURE (0x80004005), component VirtualBoxWrap, interface IVirtualBox

@christarazi
Copy link
Member

christarazi commented Apr 10, 2020

test-me-please

00:24:46.741  + cd /home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-Validated/runtime-gopath/src/github.com/cilium/cilium/test
00:24:46.741  + vagrant destroy runtime --force
00:24:52.567  There was an error while executing `VBoxManage`, a CLI used by Vagrant
00:24:52.567  for controlling VirtualBox. The command and stderr is shown below.
00:24:52.567  
00:24:52.567  Command: ["list", "hostonlyifs"]
00:24:52.567  
00:24:52.567  Stderr: VBoxManage: error: Failed to create the VirtualBox object!
00:24:52.567  VBoxManage: error: Document is empty.
00:24:52.567  VBoxManage: error: Location: '/root/.config/VirtualBox/VirtualBox.xml', line 1 (0), column 1.
00:24:52.567  VBoxManage: error: /home/vbox/vbox-6.0.18/src/VBox/Main/src-server/VirtualBoxImpl.cpp[624] (nsresult VirtualBox::init())
00:24:52.567  VBoxManage: error: Details: code NS_ERROR_FAILURE (0x80004005), component VirtualBoxWrap, interface IVirtualBox

@tgraf
Copy link
Member Author

tgraf commented Apr 10, 2020

test-me-please

Hit #10118

@tgraf
Copy link
Member Author

tgraf commented Apr 11, 2020

test-me-please

Copy link
Member

@borkmann borkmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given we haven't seen this in non-chaining scenarios so far, should we gate this to avoid adding the rule for native Cilium setups?

Copy link
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, with minor nits.

@tgraf
Copy link
Member Author

tgraf commented Apr 15, 2020

Given we haven't seen this in non-chaining scenarios so far, should we gate this to avoid adding the rule for native Cilium setups?

This can happen in direct routing mode as well if ENABLE_ROUTING is enabled.

@tgraf tgraf force-pushed the pr/tgraf/fix-hairpin-via-stack branch from 6986f4b to 5483585 Compare April 16, 2020 10:08
@tgraf
Copy link
Member Author

tgraf commented Apr 16, 2020

test-me-please

IP frag test failed:

/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-Validated/k8s-1.11-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:430
Failed to account for IPv4 fragments (in)
Expected
    <[]int | len:2, cap:2>: [16, 17]
To satisfy at least one of these matchers: [%!s(*matchers.EqualMatcher=&{[16 16]}) %!s(*matchers.EqualMatcher=&{[20 12]})]
/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-Validated/k8s-1.11-gopath/src/github.com/cilium/cilium/test/k8sT/Services.go:559

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks good, do we have a small repro we can integrate into the CI? This type of bug seems easy to break by just rearranging rules a bit.

tgraf added 4 commits April 17, 2020 15:41
Packets to a host IP are currently redirected via
cilium_host/cilium_net. The reason for this is mostly historic. For
other packets where routing by the kernel routing tables is desired,
packets are already passed on via TC_ACT_OK to the stack directly.

The two cases where this redirection is needed are:
* For proxy redirection due to a kernel limitation on passing the
  routing tables multiple times. This case is left untouched.
* For the HOST_REDIRECT_TO_INGRESS case, e.g. flannel integration. This
  case is left untouched. The IPv4 and IPv6 case is brought in line to
  not accidently lose this logic later on.

A side effect of this is that the skb gets scrubbed including the
skb->mark. The presence of the identity in the skb->mark is being relied
on in a follow-up fix however.

Therfore, pass packets via the stack using TC_ACT_OK. This is faster,
simpler, and allows for the identity to be carried in the mark.

Fixes: #9784

Signed-off-by: Thomas Graf <thomas@cilium.io>
The traffic is sent to the stack and hairpin'ed back into a local pod
after a component on the stack has applied a DNAT rule, the traffic must
be SNATed to ensure the reverse NAT can take place. This can happen if
portmap or kiam is being used and redirection happens to a local
destination.

The masquerade filter must be limited as not all DNAT traffic may be
affected. NodePort traffic from a non-local source must remain
unmasqueraded in order for trafficPolicy=local to continue working.

Also, when EnableEndpointRoutes is enabled, traffic always traverses the
stack and must not be masqueraded either.

Fixes: #9784

Signed-off-by: Thomas Graf <thomas@cilium.io>
Add checks for multi-node and intra-node connectivity via hostport.
These tests will remain unready if hostport/portmap is not enabled.

Signed-off-by: Thomas Graf <thomas@cilium.io>
Requires the following tests to pass:
```
NAME                                                     READY   STATUS    RESTARTS   AGE
echo-a-5f555bbc8b-9cxv9                                  1/1     Running   0          41s
echo-b-659766fb56-zw2wl                                  1/1     Running   0          41s
echo-b-host-65d7db76d8-5wmhm                             1/1     Running   0          41s
host-to-b-multi-node-clusterip-c7557d4f8-gv6ws           1/1     Running   0          41s
host-to-b-multi-node-headless-5dfcdf9b76-9hcqn           1/1     Running   0          41s
pod-to-a-6cf58894b7-mqg67                                1/1     Running   0          40s
pod-to-a-allowed-cnp-5898f7d8c9-bdfxz                    1/1     Running   0          41s
pod-to-a-external-1111-5779fb7cb9-tgdlh                  1/1     Running   0          39s
pod-to-a-l3-denied-cnp-74b9566cc7-zjhhh                  1/1     Running   0          41s
pod-to-b-intra-node-77b485d996-xfv45                     1/1     Running   0          40s
pod-to-b-intra-node-hostport-6c55bf8459-vddt2            1/1     Running   0          40s
pod-to-b-multi-node-clusterip-75f5c78f68-2lk8x           1/1     Running   0          40s
pod-to-b-multi-node-headless-5df88f9bd4-f5jlt            1/1     Running   0          40s
pod-to-b-multi-node-hostport-d7c8d659f-4xqpt             1/1     Running   0          39s
pod-to-external-fqdn-allow-google-cnp-74466b4c6f-dxvlq   1/1     Running   0          39s
```

Signed-off-by: Thomas Graf <thomas@cilium.io>
@tgraf tgraf force-pushed the pr/tgraf/fix-hairpin-via-stack branch from 5483585 to 02115cf Compare April 17, 2020 13:41
@tgraf tgraf requested a review from a team as a code owner April 17, 2020 13:41
@tgraf tgraf requested a review from a team April 17, 2020 13:41
@tgraf
Copy link
Member Author

tgraf commented Apr 17, 2020

Change looks good, do we have a small repro we can integrate into the CI? This type of bug seems easy to break by just rearranging rules a bit.

Good point. I've extended the connectivity-check YAML to include hostport and added it to the CI. This uncovered one more case that needed fixing which required an additional commit.

@tgraf tgraf requested a review from borkmann April 17, 2020 13:42
@tgraf
Copy link
Member Author

tgraf commented Apr 17, 2020

test-me-please

One test failed with:

00:48:24.124  getting vagrant kubeconfig from provisioned vagrant cluster
00:48:26.959  checking whether kubeconfig works for vagrant cluster
00:48:26.959  Unable to connect to the server: tls: first record does not look like a TLS handshake

GKE and -with-kernel tests all passed

@@ -73,10 +73,62 @@ spec:
- name: echo-container
image: docker.io/cilium/json-mock:1.0
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
hostPort: 40000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also added one here for services (ebd7392) but only for the BPF case so far. I wonder if we somewhat could consolidate both, just a thought; not needed for this series in here.

@joestringer
Copy link
Member

@joestringer
Copy link
Member

restart-ginkgo

1 similar comment
@tgraf
Copy link
Member Author

tgraf commented Apr 20, 2020

restart-ginkgo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Request time out from container to other container using hostIP:hostPort on same host with portmap CNI chained
8 participants