
CiliumCIDRGroup ref overwrites entity-based rules in CiliumClusterwideNetworkPolicy #36393

@abcdegorov

Description


Is there an existing issue for this?

  • I have searched the existing issues

Version

equal or higher than v1.16.4 and lower than v1.17.0

What happened?

Hi! I have stumbled upon an issue where, if a single CiliumClusterwideNetworkPolicy specifies both fromEntities and fromCIDRSet (at least in the ingress spec), the latter overwrites the former: the entities are not included in the final aggregated policy, which in turn breaks various non-critical Kubernetes tools.
Example CIDR group and policy (IPs redacted):

---
apiVersion: cilium.io/v2alpha1
kind: CiliumCIDRGroup
metadata:
  name: test-ccg
spec:
  externalCIDRs:
    - "1.2.3.4/32"
    - "5.6.7.8/32"
---
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: talos
spec:
  description: Allow access to Talos Linux apid and trustd
  nodeSelector:
    matchLabels: {}
  ingress:
    - fromEntities:
        - cluster
      toPorts:
        - ports:
            - port: "50000"
              protocol: TCP
            - port: "50001"
              protocol: TCP
    - fromCIDRSet:
        - cidrGroupRef: test-ccg
      toPorts:
        - ports:
            - port: "50000"
              protocol: TCP
            - port: "50001"
              protocol: TCP

After applying these manifests, our Talos backup CronJob fails with dial tcp 10.99.85.21:50000: i/o timeout, where 10.99.85.21 is the address of the talos service in the default namespace.
Output of cilium monitor:

Policy verdict log: flow 0x0 local EP ID 1524, remote ID 82933, proto 6, ingress, action deny, auth: disabled, match none, 10.244.5.44:52330 -> 10.10.13.201:50000 tcp SYN
Policy verdict log: flow 0x0 local EP ID 1524, remote ID 82933, proto 6, ingress, action deny, auth: disabled, match none, 10.244.5.44:52330 -> 10.10.13.201:50000 tcp SYN
Policy verdict log: flow 0x0 local EP ID 1524, remote ID 82933, proto 6, ingress, action deny, auth: disabled, match none, 10.244.5.44:52330 -> 10.10.13.201:50000 tcp SYN
Policy verdict log: flow 0x0 local EP ID 1524, remote ID 82933, proto 6, ingress, action deny, auth: disabled, match none, 10.244.5.44:52330 -> 10.10.13.201:50000 tcp SYN
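
The verdicts above were captured from the Cilium agent on the affected node, roughly like this (the pod name placeholder stands for the agent pod on that node):

kubectl -n kube-system exec <cilium-agent-pod> -- cilium-dbg monitor -t policy-verdict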

A similar policy is applied for port 6443, and that results in many components, including metrics-server, various operators, Kyverno, and even hubble-generate-certs failing to connect to kube-apiserver.
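
That 6443 policy is not shown above; a minimal sketch of what it looks like, assuming the same shape as the talos policy (the name matches the controlplane policy that appears in the output further below, the CIDR group reference is illustrative):

---
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: controlplane
spec:
  description: Allow access to kube-apiserver
  nodeSelector:
    matchLabels: {}
  ingress:
    - fromEntities:
        - cluster
      toPorts:
        - ports:
            - port: "6443"
              protocol: TCP
    - fromCIDRSet:
        - cidrGroupRef: test-ccg
      toPorts:
        - ports:
            - port: "6443"
              protocol: TCP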

Observing the output of cilium-dbg endpoint get <HOST_EP_ID> -o yaml (default output results in #29247), I can see that the aggregated policy does not include reserved entities from the cluster entity group - here's the status.policy.realized.l4 block:

                    - derivedfromrules:
                        - - k8s:io.cilium.k8s.policy.derived-from=CiliumClusterwideNetworkPolicy
                          - k8s:io.cilium.k8s.policy.name=talos
                          - k8s:io.cilium.k8s.policy.uid=09fd8860-e7f2-48b8-94c1-84ebde4d9fb0
                      rule: '{"port":50000,"protocol":"TCP","l7-rules":[{"\u0026LabelSelector{MatchLabels:map[string]string{cidr.1.2.3.4/32: ,},MatchExpressions:[]LabelSelectorRequirement{},}":null},{"\u0026LabelSelector{MatchLabels:map[string]string{cidr.5.6.7.8/32: ,},MatchExpressions:[]LabelSelectorRequirement{},}":null},{"\u0026LabelSelector{MatchLabels:map[string]string{},MatchExpressions:[]LabelSelectorRequirement{},}":null}]}'
                      rulesbyselector:
                        '&LabelSelector{MatchLabels:map[string]string{},MatchExpressions:[]LabelSelectorRequirement{},}':
                            - - k8s:io.cilium.k8s.policy.derived-from=CiliumClusterwideNetworkPolicy
                              - k8s:io.cilium.k8s.policy.name=talos
                              - k8s:io.cilium.k8s.policy.uid=09fd8860-e7f2-48b8-94c1-84ebde4d9fb0
                        '&LabelSelector{MatchLabels:map[string]string{cidr.1.2.3.4/32: ,},MatchExpressions:[]LabelSelectorRequirement{},}':
                            - - k8s:io.cilium.k8s.policy.derived-from=CiliumClusterwideNetworkPolicy
                              - k8s:io.cilium.k8s.policy.name=talos
                              - k8s:io.cilium.k8s.policy.uid=09fd8860-e7f2-48b8-94c1-84ebde4d9fb0
                        '&LabelSelector{MatchLabels:map[string]string{cidr.5.6.7.8/32: ,},MatchExpressions:[]LabelSelectorRequirement{},}':
                            - - k8s:io.cilium.k8s.policy.derived-from=CiliumClusterwideNetworkPolicy
                              - k8s:io.cilium.k8s.policy.name=talos
                              - k8s:io.cilium.k8s.policy.uid=09fd8860-e7f2-48b8-94c1-84ebde4d9fb0
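
For anyone reproducing this, the host endpoint ID and its realized policy can be pulled from inside the Cilium agent container roughly like this (the host endpoint is the one carrying the reserved:host label in the list output):

cilium-dbg endpoint list
cilium-dbg endpoint get <HOST_EP_ID> -o yaml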

If I remove fromCIDRSet from the policy and replace it with a more traditional fromCIDR listing all the necessary IPs directly in the policy, everything works correctly - connections are let through and the entities appear in the aggregated policy:

                        ? '&LabelSelector{MatchLabels:map[string]string{k8s.io.cilium.k8s.policy.cluster: test,},MatchExpressions:[]LabelSelectorRequirement{},}'
                        :   - - k8s:io.cilium.k8s.policy.derived-from=CiliumClusterwideNetworkPolicy
                              - k8s:io.cilium.k8s.policy.name=controlplane
                              - k8s:io.cilium.k8s.policy.uid=48b548eb-2a12-45a7-95fa-6cfdd452c871
                        '&LabelSelector{MatchLabels:map[string]string{reserved.health: ,},MatchExpressions:[]LabelSelectorRequirement{},}':
                            - - k8s:io.cilium.k8s.policy.derived-from=CiliumClusterwideNetworkPolicy
                              - k8s:io.cilium.k8s.policy.name=controlplane
                              - k8s:io.cilium.k8s.policy.uid=48b548eb-2a12-45a7-95fa-6cfdd452c871
                        '&LabelSelector{MatchLabels:map[string]string{reserved.host: ,},MatchExpressions:[]LabelSelectorRequirement{},}':
                            - - k8s:io.cilium.k8s.policy.derived-from=CiliumClusterwideNetworkPolicy
                              - k8s:io.cilium.k8s.policy.name=controlplane
                              - k8s:io.cilium.k8s.policy.uid=48b548eb-2a12-45a7-95fa-6cfdd452c871
                        '&LabelSelector{MatchLabels:map[string]string{reserved.ingress: ,},MatchExpressions:[]LabelSelectorRequirement{},}':
                            - - k8s:io.cilium.k8s.policy.derived-from=CiliumClusterwideNetworkPolicy
                              - k8s:io.cilium.k8s.policy.name=controlplane
                              - k8s:io.cilium.k8s.policy.uid=48b548eb-2a12-45a7-95fa-6cfdd452c871
                        '&LabelSelector{MatchLabels:map[string]string{reserved.init: ,},MatchExpressions:[]LabelSelectorRequirement{},}':
                            - - k8s:io.cilium.k8s.policy.derived-from=CiliumClusterwideNetworkPolicy
                              - k8s:io.cilium.k8s.policy.name=controlplane
                              - k8s:io.cilium.k8s.policy.uid=48b548eb-2a12-45a7-95fa-6cfdd452c871
                        '&LabelSelector{MatchLabels:map[string]string{reserved.kube-apiserver: ,},MatchExpressions:[]LabelSelectorRequirement{},}':
                            - - k8s:io.cilium.k8s.policy.derived-from=CiliumClusterwideNetworkPolicy
                              - k8s:io.cilium.k8s.policy.name=controlplane
                              - k8s:io.cilium.k8s.policy.uid=48b548eb-2a12-45a7-95fa-6cfdd452c871
                        '&LabelSelector{MatchLabels:map[string]string{reserved.remote-node: ,},MatchExpressions:[]LabelSelectorRequirement{},}':
                            - - k8s:io.cilium.k8s.policy.derived-from=CiliumClusterwideNetworkPolicy
                              - k8s:io.cilium.k8s.policy.name=controlplane
                              - k8s:io.cilium.k8s.policy.uid=48b548eb-2a12-45a7-95fa-6cfdd452c871
                        '&LabelSelector{MatchLabels:map[string]string{reserved.unmanaged: ,},MatchExpressions:[]LabelSelectorRequirement{},}':
                            - - k8s:io.cilium.k8s.policy.derived-from=CiliumClusterwideNetworkPolicy
                              - k8s:io.cilium.k8s.policy.name=controlplane
                              - k8s:io.cilium.k8s.policy.uid=48b548eb-2a12-45a7-95fa-6cfdd452c871
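
For reference, a minimal sketch of the working variant: the second ingress rule of the talos policy with the CIDRs inlined via fromCIDR instead of a cidrGroupRef (IPs redacted as in the original example):

    - fromCIDR:
        - "1.2.3.4/32"
        - "5.6.7.8/32"
      toPorts:
        - ports:
            - port: "50000"
              protocol: TCP
            - port: "50001"
              protocol: TCP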

How can we reproduce the issue?

  1. Install Cilium with Helm with the hostFirewall.enabled=true and policyAuditMode=false values (a minimal invocation is sketched after this list)
  2. Apply a CiliumCIDRGroup and a CiliumClusterwideNetworkPolicy which has both fromEntities and fromCIDRSet in its ingress spec
  3. Observe timeouts within various components and deny verdicts in cilium monitor -t policy-verdict
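
A minimal sketch of the install for step 1, assuming the cilium Helm repo is already added and omitting any other cluster-specific values:

helm upgrade --install cilium cilium/cilium \
  --namespace kube-system \
  --set hostFirewall.enabled=true \
  --set policyAuditMode=false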

Cilium Version

v1.16.4

Kernel Version

Linux 6.6.33-talos x86_64

Kubernetes Version

v1.28.7

Regression

No response

Sysdump

No response

Relevant log output

Anything else?

No response

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct

Labels

  • area/agent: Cilium agent related.
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, eg via Slack.
  • kind/regression: This functionality worked fine before, but was broken in a newer release of Cilium.
  • needs/triage: This issue requires triaging to establish severity and next steps.
  • sig/policy: Impacts whether traffic is allowed or denied based on user-defined policies.
