cilium connectivity test fails when nodelocaldns is running in cluster #20055

@eminaktas

Description

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

Hi,

In our environment, the pod-to-a-allowed-cnp and pod-to-external-fqdn-allow-google-cnp tests fail when Cilium runs alongside nodelocaldns. We used these resources to identify which pods were failing. Some tests in the cilium connectivity test command are also failing.

We noticed that packets are being dropped because of the policies. Here is an example output from Hubble UI:

Flow Details
Timestamp
2022-06-02T12:06:10.175Z
Verdict
dropped
Drop reason
Policy denied
Traffic direction
egress
Source pod
pod-to-a-allowed-cnp-bbc844c6f-2zrdf
Source identity
45495
Source labels
io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=test-alani
io.cilium.k8s.policy.cluster=default
io.cilium.k8s.policy.serviceaccount=default
namespace=test-alani
name=pod-to-a-allowed-cnp
Source IP
some-ip
Destination identity
2
Destination labels
reserved:world
Destination IP
169.254.25.10
Destination port
53
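
The drop above is DNS traffic to the nodelocaldns bind address (169.254.25.10:53), which Hubble classifies as reserved:world because that link-local IP sits outside the pod and service CIDRs. One possible workaround is to explicitly allow egress to that address on port 53. The following is only a sketch of such a policy; the name and namespace are placeholders, not taken from the report:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-dns-to-nodelocaldns   # hypothetical name
  namespace: cilium-test            # namespace used by the connectivity test
spec:
  endpointSelector: {}              # all pods in the namespace
  egress:
    - toCIDR:
        - 169.254.25.10/32          # nodelocaldns bind address from the flow above
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY         # both UDP and TCP
```

Note that the generated connectivity-test policies allow DNS only to kube-dns in kube-system, which would explain why resolution via the node-local cache is denied.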

nodelocaldns settings:

  Containers:
   node-cache:
    Image:       k8s.gcr.io/dns/k8s-dns-node-cache:1.21.1
    Ports:       53/UDP, 53/TCP, 9253/TCP
    Host Ports:  53/UDP, 53/TCP, 9253/TCP
    Args:
      -localip
      169.254.25.10
      -conf
      /etc/coredns/Corefile
      -upstreamsvc
      coredns
    Limits:
      memory:  170Mi
Corefile:
----
poc-cilium-test:53 {
    errors
    cache {
        success 9984 30
        denial 9984 5
    }
    reload
    loop
    bind 169.254.25.10
    forward . coredns-ip {
        force_tcp
    }
    prometheus :9253
    health 169.254.25.10:9254
}
in-addr.arpa:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.25.10
    forward . coredns-ip {
        force_tcp
    }
    prometheus :9253
}
ip6.arpa:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.25.10
    forward . coredns-ip {
        force_tcp
    }
    prometheus :9253
}
.:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.25.10
    forward . /etc/resolv.conf
    prometheus :9253
}
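
Each server block in the Corefile binds 169.254.25.10, so every pod's DNS query egresses to an address Cilium maps to the reserved:world identity. If the exception is needed in every namespace, a cluster-wide policy could be sketched as follows (the name is hypothetical):

```yaml
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: allow-dns-to-nodelocaldns   # hypothetical name
spec:
  endpointSelector: {}              # every Cilium-managed endpoint
  egress:
    - toCIDR:
        - 169.254.25.10/32          # matches the `bind` address in the Corefile above
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
```

Alternatively, the Cilium documentation describes running a node-local DNS cache via a Local Redirect Policy, which may be a better fit than the link-local bind when using Cilium's kube-proxy replacement.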

Any idea what could be happening here, or are we hitting a bug?

cc @necatican

Cilium Version

cilium version
cilium-cli: v0.9.3 compiled with go1.17.3 on darwin/amd64
cilium image (default): v1.10.5
cilium image (stable): v1.11.5
cilium image (running): v1.11.2

Kernel Version

uname -a
Linux poc-cilium-test-1 5.13.0-39-generic #44~20.04.1-Ubuntu SMP Thu Mar 24 16:43:35 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:49:13Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:43:11Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}

Sysdump

No response

Relevant log output

No response

Anything else?

cilium connectivity test output:

cilium connectivity test 
ℹ️  Monitor aggregation detected, will skip some flow validation steps
✨ [poc-cilium-test] Creating namespace for connectivity check...
✨ [poc-cilium-test] Deploying echo-same-node service...
✨ [poc-cilium-test] Deploying same-node deployment...
✨ [poc-cilium-test] Deploying client deployment...
✨ [poc-cilium-test] Deploying client2 deployment...
✨ [poc-cilium-test] Deploying echo-other-node service...
✨ [poc-cilium-test] Deploying other-node deployment...
⌛ [poc-cilium-test] Waiting for deployments [client client2 echo-same-node] to become ready...
⌛ [poc-cilium-test] Waiting for deployments [echo-other-node] to become ready...
⌛ [poc-cilium-test] Waiting for CiliumEndpoint for pod cilium-test/client-7568bc7f86-9hjkp to appear...
⌛ [poc-cilium-test] Waiting for CiliumEndpoint for pod cilium-test/client2-686d5f784b-xvplx to appear...
⌛ [poc-cilium-test] Waiting for CiliumEndpoint for pod cilium-test/echo-other-node-59d779959c-gj5rp to appear...
⌛ [poc-cilium-test] Waiting for CiliumEndpoint for pod cilium-test/echo-same-node-5767b7b99d-qchrk to appear...
⌛ [poc-cilium-test] Waiting for Service cilium-test/echo-other-node to become ready...
⌛ [poc-cilium-test] Waiting for Service cilium-test/echo-same-node to become ready...
⌛ [poc-cilium-test] Waiting for NodePort some-ip:32452 (cilium-test/echo-same-node) to become ready...
⌛ [poc-cilium-test] Waiting for NodePort some-ip:30389 (cilium-test/echo-other-node) to become ready...
⌛ [poc-cilium-test] Waiting for NodePort some-ip:30389 (cilium-test/echo-other-node) to become ready...
⌛ [poc-cilium-test] Waiting for NodePort some-ip:32452 (cilium-test/echo-same-node) to become ready...
⌛ [poc-cilium-test] Waiting for NodePort some-ip:30389 (cilium-test/echo-other-node) to become ready...
⌛ [poc-cilium-test] Waiting for NodePort some-ip:32452 (cilium-test/echo-same-node) to become ready...
⌛ [poc-cilium-test] Waiting for NodePort some-ip:30389 (cilium-test/echo-other-node) to become ready...
⌛ [poc-cilium-test] Waiting for NodePort some-ip:32452 (cilium-test/echo-same-node) to become ready...
ℹ️  Skipping IPCache check
⌛ [poc-cilium-test] Waiting for pod cilium-test/client-7568bc7f86-9hjkp to reach default/kubernetes service...
⌛ [poc-cilium-test] Waiting for pod cilium-test/client2-686d5f784b-xvplx to reach default/kubernetes service...
🔭 Enabling Hubble telescope...
⚠️  Unable to contact Hubble Relay, disabling Hubble telescope and flow validation: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp [::1]:4245: connect: connection refused"
ℹ️  Expose Relay locally with:
   cilium hubble enable
   cilium status --wait
   cilium hubble port-forward&
🏃 Running tests...

[=] Test [no-policies]
........................................
[=] Test [allow-all]
....................................
[=] Test [client-ingress]
..
[=] Test [echo-ingress]
....
[=] Test [client-egress]
....
[=] Test [to-entities-world]
.
  ℹ️  📜 Applying CiliumNetworkPolicy 'client-egress-to-entities-world' to namespace 'cilium-test'..
  [-] Scenario [to-entities-world/pod-to-world]
  [.] Action [to-entities-world/pod-to-world/http-to-one-one-one-one-0: cilium-test/client-7568bc7f86-9hjkp (some-ip) -> one-one-one-one-http (one.one.one.one:80)]
  ❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --connect-timeout 5 --output /dev/null http://one.one.one.one:80" failed: command terminated with exit code 28
  ℹ️  curl output:
  curl: (28) Resolving timed out after 5000 milliseconds
:0 -> :0 = 000
  
  📄 No flows recorded during action http-to-one-one-one-one-0
  📄 No flows recorded during action http-to-one-one-one-one-0
  [.] Action [to-entities-world/pod-to-world/https-to-one-one-one-one-0: cilium-test/client-7568bc7f86-9hjkp (10.233.65.69) -> one-one-one-one-https (one.one.one.one:443)]
  [.] Action [to-entities-world/pod-to-world/https-to-one-one-one-one-index-0: cilium-test/client-7568bc7f86-9hjkp (10.233.65.69) -> one-one-one-one-https-index (one.one.one.one:443)]
  [.] Action [to-entities-world/pod-to-world/http-to-one-one-one-one-1: cilium-test/client2-686d5f784b-xvplx (10.233.65.123) -> one-one-one-one-http (one.one.one.one:80)]
  ❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --connect-timeout 5 --output /dev/null http://one.one.one.one:80" failed: command terminated with exit code 28
  ℹ️  curl output:
  curl: (28) Resolving timed out after 5000 milliseconds
:0 -> :0 = 000
  
  📄 No flows recorded during action http-to-one-one-one-one-1
  📄 No flows recorded during action http-to-one-one-one-one-1
  [.] Action [to-entities-world/pod-to-world/https-to-one-one-one-one-1: cilium-test/client2-686d5f784b-xvplx (10.233.65.123) -> one-one-one-one-https (one.one.one.one:443)]
  [.] Action [to-entities-world/pod-to-world/https-to-one-one-one-one-index-1: cilium-test/client2-686d5f784b-xvplx (10.233.65.123) -> one-one-one-one-https-index (one.one.one.one:443)]
  ℹ️  📜 Deleting CiliumNetworkPolicy 'client-egress-to-entities-world' from namespace 'cilium-test'..

[=] Test [to-cidr-1111]
....
[=] Test [echo-ingress-l7]
....
[=] Test [client-egress-l7]
........
  ℹ️  📜 Applying CiliumNetworkPolicy 'client-egress-only-dns' to namespace 'cilium-test'..
  ℹ️  📜 Applying CiliumNetworkPolicy 'client-egress-l7-http' to namespace 'cilium-test'..
  [-] Scenario [client-egress-l7/pod-to-pod]
  [.] Action [client-egress-l7/pod-to-pod/curl-0: cilium-test/client-7568bc7f86-9hjkp (10.233.65.69) -> cilium-test/echo-other-node-59d779959c-gj5rp (10.233.64.161:8080)]
  [.] Action [client-egress-l7/pod-to-pod/curl-1: cilium-test/client-7568bc7f86-9hjkp (10.233.65.69) -> cilium-test/echo-same-node-5767b7b99d-qchrk (10.233.65.57:8080)]
  [.] Action [client-egress-l7/pod-to-pod/curl-2: cilium-test/client2-686d5f784b-xvplx (10.233.65.123) -> cilium-test/echo-other-node-59d779959c-gj5rp (10.233.64.161:8080)]
  [.] Action [client-egress-l7/pod-to-pod/curl-3: cilium-test/client2-686d5f784b-xvplx (10.233.65.123) -> cilium-test/echo-same-node-5767b7b99d-qchrk (10.233.65.57:8080)]
  [-] Scenario [client-egress-l7/pod-to-world]
  [.] Action [client-egress-l7/pod-to-world/http-to-one-one-one-one-0: cilium-test/client-7568bc7f86-9hjkp (10.233.65.69) -> one-one-one-one-http (one.one.one.one:80)]
  [.] Action [client-egress-l7/pod-to-world/https-to-one-one-one-one-0: cilium-test/client-7568bc7f86-9hjkp (10.233.65.69) -> one-one-one-one-https (one.one.one.one:443)]
  [.] Action [client-egress-l7/pod-to-world/https-to-one-one-one-one-index-0: cilium-test/client-7568bc7f86-9hjkp (10.233.65.69) -> one-one-one-one-https-index (one.one.one.one:443)]
  [.] Action [client-egress-l7/pod-to-world/http-to-one-one-one-one-1: cilium-test/client2-686d5f784b-xvplx (10.233.65.123) -> one-one-one-one-http (one.one.one.one:80)]
  ❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --connect-timeout 5 --output /dev/null http://one.one.one.one:80" failed: command terminated with exit code 28
  ℹ️  curl output:
  curl: (28) Resolving timed out after 5000 milliseconds
:0 -> :0 = 000
  
  📄 No flows recorded during action http-to-one-one-one-one-1
  📄 No flows recorded during action http-to-one-one-one-one-1
  [.] Action [client-egress-l7/pod-to-world/https-to-one-one-one-one-1: cilium-test/client2-686d5f784b-xvplx (10.233.65.123) -> one-one-one-one-https (one.one.one.one:443)]
  [.] Action [client-egress-l7/pod-to-world/https-to-one-one-one-one-index-1: cilium-test/client2-686d5f784b-xvplx (10.233.65.123) -> one-one-one-one-https-index (one.one.one.one:443)]
  ℹ️  📜 Deleting CiliumNetworkPolicy 'client-egress-only-dns' from namespace 'cilium-test'..
  ℹ️  📜 Deleting CiliumNetworkPolicy 'client-egress-l7-http' from namespace 'cilium-test'..

[=] Test [dns-only]
..........
[=] Test [to-fqdns]
.
  ℹ️  📜 Applying CiliumNetworkPolicy 'client-egress-to-fqdns-one-one-one-one' to namespace 'cilium-test'..
  [-] Scenario [to-fqdns/pod-to-world]
  [.] Action [to-fqdns/pod-to-world/http-to-one-one-one-one-0: cilium-test/client2-686d5f784b-xvplx (10.233.65.123) -> one-one-one-one-http (one.one.one.one:80)]
  ❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --connect-timeout 5 --output /dev/null http://one.one.one.one:80" failed: command terminated with exit code 28
  ℹ️  curl output:
  curl: (28) Resolving timed out after 5000 milliseconds
:0 -> :0 = 000
  
  📄 No flows recorded during action http-to-one-one-one-one-0
  📄 No flows recorded during action http-to-one-one-one-one-0
  [.] Action [to-fqdns/pod-to-world/https-to-one-one-one-one-0: cilium-test/client2-686d5f784b-xvplx (some-ip) -> one-one-one-one-https (one.one.one.one:443)]
  [.] Action [to-fqdns/pod-to-world/https-to-one-one-one-one-index-0: cilium-test/client2-686d5f784b-xvplx (10.233.65.123) -> one-one-one-one-https-index (one.one.one.one:443)]
  [.] Action [to-fqdns/pod-to-world/http-to-one-one-one-one-1: cilium-test/client-7568bc7f86-9hjkp (some-ip) -> one-one-one-one-http (one.one.one.one:80)]
  ❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --connect-timeout 5 --output /dev/null http://one.one.one.one:80" failed: command terminated with exit code 28
  ℹ️  curl output:
  curl: (28) Resolving timed out after 5000 milliseconds
:0 -> :0 = 000
  
  📄 No flows recorded during action http-to-one-one-one-one-1
  📄 No flows recorded during action http-to-one-one-one-one-1
  [.] Action [to-fqdns/pod-to-world/https-to-one-one-one-one-1: cilium-test/client-7568bc7f86-9hjkp (some-ip) -> one-one-one-one-https (one.one.one.one:443)]
  [.] Action [to-fqdns/pod-to-world/https-to-one-one-one-one-index-1: cilium-test/client-7568bc7f86-9hjkp (some-ip) -> one-one-one-one-https-index (one.one.one.one:443)]
  ℹ️  📜 Deleting CiliumNetworkPolicy 'client-egress-to-fqdns-one-one-one-one' from namespace 'cilium-test'..

📋 Test Report
❌ 3/11 tests failed (5/126 actions), 0 tests skipped, 0 scenarios skipped:
Test [to-entities-world]:
  ❌ to-entities-world/pod-to-world/http-to-one-one-one-one-0: cilium-test/client-7568bc7f86-9hjkp (some-ip) -> one-one-one-one-http (one.one.one.one:80)
  ❌ to-entities-world/pod-to-world/http-to-one-one-one-one-1: cilium-test/client2-686d5f784b-xvplx (some-ip) -> one-one-one-one-http (one.one.one.one:80)
Test [client-egress-l7]:
  ❌ client-egress-l7/pod-to-world/http-to-one-one-one-one-1: cilium-test/client2-686d5f784b-xvplx (some-ip) -> one-one-one-one-http (one.one.one.one:80)
Test [to-fqdns]:
  ❌ to-fqdns/pod-to-world/http-to-one-one-one-one-0: cilium-test/client2-686d5f784b-xvplx (some-ip) -> one-one-one-one-http (one.one.one.one:80)
  ❌ to-fqdns/pod-to-world/http-to-one-one-one-one-1: cilium-test/client-7568bc7f86-9hjkp (some-ip) -> one-one-one-one-http (one.one.one.one:80)
Connectivity test failed: 3 tests failed

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Labels

  • area/datapath — Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
  • kind/bug — This is a bug in the Cilium logic.
  • needs/triage — This issue requires triaging to establish severity and next steps.
