Network policies for the host endpoint #11507
Conversation
Coverage increased (+0.03%) to 37.035% when pulling 56f9713661a593579170d81addb5f232c731b878 on pr/pchaigno/host-policies into 13bcf96 on master.
I made a first pass with mostly high-level feedback and discussion points below, although I couldn't pick many nits anyway as it overall looks really clean!
I didn't look closely at the last 3 patches (the IPv6 and the complexity/LB patches). Several of my points below are appropriate for a separate follow-up later.
Only small nits, overall LGTM. I only reviewed the control plane sections.
@@ -70,6 +83,63 @@ func NewRule() *Rule {
	return &Rule{}
}

// MarshalJSON returns the JSON encoding of Rule r. We need to overwrite it to
// enforce omitempty on the EndpointSelector nested structures.
func (r *Rule) MarshalJSON() ([]byte, error) {
Discussed offline
Added a note to rework this in a follow-up because it's less trivial than we thought. Using `reflect`, I might have a solution that doesn't require as many code changes and is easier on maintainability, but it still needs some work.
I just noticed that this PR introduced custom JSON marshaling for `Rule`. This has some implications for #11607. Specifically, one of the reasons I forked controller-tools is to remove a constraint which says: any type implementing custom JSON marshaling will have its validation schema replaced with `type: Any`. In the upstream, kubernetes-sigs/controller-tools#427 is responsible for this. In the fork, I've reverted that support.
My question is: why do we need this? I asked on K8s Slack about opting out of this feature in controller-tools. It's possible that we can keep it with a knob in controller-tools, or we solve the problem another way.
We need this to properly implement the `omitempty` tags of `NodeSelector` and `EndpointSelector`. Because they are not pointers, without the custom marshalling, marshalling then unmarshalling would create a new field where none existed. That is, when marshalling, it would check `r.EndpointSelector` (always != nil) instead of `r.EndpointSelector.LabelSelector` and thus always create the corresponding JSON entry.
You can easily reproduce this by removing the custom marshalling and running the unit tests on the package. One of the tests checks `json.Unmarshal(json.Marshal(rule)) == rule`.
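For context, here is a minimal standalone sketch of the issue and the workaround. The types, field names, and map-building approach are simplified stand-ins for illustration, not the actual Cilium `api` package code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified stand-ins for the real Cilium API types; names are illustrative.
type LabelSelector struct {
	MatchLabels map[string]string `json:"matchLabels,omitempty"`
}

type EndpointSelector struct {
	*LabelSelector
}

type Rule struct {
	EndpointSelector EndpointSelector `json:"endpointSelector,omitempty"`
	NodeSelector     EndpointSelector `json:"nodeSelector,omitempty"`
}

// MarshalJSON emits a selector only when its nested LabelSelector is set.
// With struct tags alone, omitempty never fires: the EndpointSelector value
// is a struct and never nil, so an unset selector would still produce a
// JSON entry and a marshal/unmarshal round trip would not be the identity.
func (r *Rule) MarshalJSON() ([]byte, error) {
	out := make(map[string]interface{})
	if r.EndpointSelector.LabelSelector != nil {
		out["endpointSelector"] = r.EndpointSelector
	}
	if r.NodeSelector.LabelSelector != nil {
		out["nodeSelector"] = r.NodeSelector
	}
	return json.Marshal(out)
}

func main() {
	r := &Rule{EndpointSelector: EndpointSelector{&LabelSelector{MatchLabels: map[string]string{"app": "foo"}}}}
	b, _ := json.Marshal(r)
	fmt.Println(string(b)) // {"endpointSelector":{"matchLabels":{"app":"foo"}}}

	b, _ = json.Marshal(&Rule{})
	fmt.Println(string(b)) // {} rather than {"endpointSelector":{},"nodeSelector":{}}
}
```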
When the host firewall and vxlan are enabled, we need to send traffic from pods to remote nodes through the tunnel to preserve the pods' security IDs. If we don't and masquerading is enabled, those packets will be SNATed and we will lose the source security ID. Traffic from pods is automatically sent through the tunnel when the tunnel_endpoint value in the ipcache is set. Thus, this commit ensures that value is set to the node's IP for all remote nodes.

Before:
$ sudo cilium bpf ipcache get 192.168.33.11
192.168.33.11 maps to identity 6 0 0.0.0.0
$ sudo cilium bpf ipcache get 192.168.33.12
192.168.33.12 maps to identity 1 0 0.0.0.0

After:
$ sudo cilium bpf ipcache get 192.168.33.11
192.168.33.11 maps to identity 6 0 192.168.33.11
$ sudo cilium bpf ipcache get 192.168.33.12
192.168.33.12 maps to identity 1 0 0.0.0.0

I tested this change with the dev VMs, vxlan and the host firewall enabled, and a host-level L4 policy loaded. Traffic from a pod on k8s1 was successfully sent through the tunnel to k8s2 and rejected by host policies at k8s2. Connections allowed by policies took the same path and were successfully established. Since the host firewall is enabled in all Jenkins CIs, passing tests should also ensure this change does not break connectivity in other scenarios.

When kube-proxy is enabled, this change makes the host firewall incompatible with externalTrafficPolicy=Local services and portmap chaining. These incompatibilities will require additional fixes.

Fixes: #11507
Signed-off-by: Paul Chaignon <paul@cilium.io>
When traffic from a pod is destined to the local host, on egress from the container it is passed to the stack and doesn't go through the host device (e.g., cilium_host). This results in a host firewall bypass on ingress. To fix this, we redirect traffic egressing pods to the host device when the host firewall is enabled and the destination ID is that of the host.

Fixes: #11507
Signed-off-by: Paul Chaignon <paul@cilium.io>
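To make the first commit's mechanism concrete, here is a minimal Go sketch of the forwarding decision the tunnel_endpoint value drives. The types and function below are hypothetical stand-ins for illustration only; the real logic lives in the BPF datapath and the ipcache is a BPF map, not a Go map.

```go
package main

import "fmt"

// Hypothetical, simplified view of an ipcache entry.
type ipcacheEntry struct {
	identity       uint32
	tunnelEndpoint string // "0.0.0.0" means no tunnel endpoint is set
}

// forwardFromPod mirrors the decision described in the commit message: if the
// destination's ipcache entry carries a tunnel endpoint, the packet is
// encapsulated and the pod's security ID is preserved; otherwise it goes to
// the stack, where masquerading would SNAT it and the source ID would be lost.
func forwardFromPod(dst string, ipcache map[string]ipcacheEntry) string {
	e, ok := ipcache[dst]
	if !ok || e.tunnelEndpoint == "0.0.0.0" {
		return "pass to stack (source identity lost if masqueraded)"
	}
	return fmt.Sprintf("encapsulate to %s, source identity preserved", e.tunnelEndpoint)
}

func main() {
	ipcache := map[string]ipcacheEntry{
		// After the change, the remote node's entry carries its node IP.
		"192.168.33.11": {identity: 6, tunnelEndpoint: "192.168.33.11"},
		// The other entry kept 0.0.0.0 in the commit's "After" output.
		"192.168.33.12": {identity: 1, tunnelEndpoint: "0.0.0.0"},
	}
	fmt.Println(forwardFromPod("192.168.33.11", ipcache))
	fmt.Println(forwardFromPod("192.168.33.12", ipcache))
}
```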
This pull request adds network policies (CIDR, labels, and ports) for the host through the new host endpoint. It also introduces a new `nodeSelector` in our internal JSON policy (a hypothetical example is sketched below).

As a summary:

- `--enable-host-firewall` option.
- `localID` as argument.
- `relax_verifier()` to improve state pruning.

The last piece, to watch and update labels for the node, will come in a separate pull request. It's also dependent on the host endpoint PR, but is independent of the present PR.
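For illustration, a host policy using the new `nodeSelector` might look like the snippet below. The selector labels and port are made up, and the surrounding fields only follow the existing JSON policy format to the best of my understanding; treat it as a sketch, not the canonical syntax.

```json
[{
  "nodeSelector": {"matchLabels": {"node-access": "ssh"}},
  "ingress": [{
    "toPorts": [{"ports": [{"port": "22", "protocol": "TCP"}]}]
  }]
}]
```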
Fixes #9915