Description
Is there an existing issue for this?
- I have searched the existing issues
What happened?
Hi,
In our environment, the pod-to-a-allowed-cnp and pod-to-external-fqdn-allow-google-cnp connectivity tests fail when Cilium runs with nodelocaldns. We used these resources to find out which pods are failing. Some tests in the cilium connectivity test command are also failing for a reason we cannot determine.
We have noticed that packets are being dropped because of the policies. Here is one example flow from the Hubble UI:
Flow Details
Timestamp: 2022-06-02T12:06:10.175Z
Verdict: dropped
Drop reason: Policy denied
Traffic direction: egress
Source pod: pod-to-a-allowed-cnp-bbc844c6f-2zrdf
Source identity: 45495
Source labels:
  io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=test-alani
  io.cilium.k8s.policy.cluster=default
  io.cilium.k8s.policy.serviceaccount=default
  namespace=test-alani
  name=pod-to-a-allowed-cnp
Source IP: some-ip
Destination identity: 2
Destination labels: reserved:world
Destination IP: 169.254.25.10
Destination port: 53
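For context, the dropped flow is the client pod's DNS query to the nodelocaldns link-local address 169.254.25.10, which Cilium classifies as reserved:world rather than as cluster DNS. An egress rule that would permit this flow might look like the following sketch (the policy name and the empty endpoint selector are placeholders, not our actual manifests):

```yaml
# Sketch only: explicitly allow DNS egress to the nodelocaldns
# link-local address. Name and selector are hypothetical.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-dns-to-nodelocaldns
spec:
  endpointSelector: {}          # placeholder: select the affected pods
  egress:
    - toCIDR:
        - 169.254.25.10/32      # nodelocaldns bind address
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY     # nodelocaldns listens on UDP and TCP
```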
nodelocaldns settings:
Containers:
node-cache:
Image: k8s.gcr.io/dns/k8s-dns-node-cache:1.21.1
Ports: 53/UDP, 53/TCP, 9253/TCP
Host Ports: 53/UDP, 53/TCP, 9253/TCP
Args:
-localip
169.254.25.10
-conf
/etc/coredns/Corefile
-upstreamsvc
coredns
Limits:
memory: 170Mi
Corefile:
----
poc-cilium-test:53 {
errors
cache {
success 9984 30
denial 9984 5
}
reload
loop
bind 169.254.25.10
forward . coredns-ip {
force_tcp
}
prometheus :9253
health 169.254.25.10:9254
}
in-addr.arpa:53 {
errors
cache 30
reload
loop
bind 169.254.25.10
forward . coredns-ip {
force_tcp
}
prometheus :9253
}
ip6.arpa:53 {
errors
cache 30
reload
loop
bind 169.254.25.10
forward . coredns-ip {
force_tcp
}
prometheus :9253
}
.:53 {
errors
cache 30
reload
loop
bind 169.254.25.10
forward . /etc/resolv.conf
prometheus :9253
}
Any idea what could be happening here, or are we hitting a bug?
cc @necatican
Cilium Version
cilium version
cilium-cli: v0.9.3 compiled with go1.17.3 on darwin/amd64
cilium image (default): v1.10.5
cilium image (stable): v1.11.5
cilium image (running): v1.11.2
Kernel Version
uname -a
Linux poc-cilium-test-1 5.13.0-39-generic #44~20.04.1-Ubuntu SMP Thu Mar 24 16:43:35 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Kubernetes Version
kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:49:13Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:43:11Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}
Sysdump
No response
Relevant log output
No response
Anything else?
cilium connectivity test output:
cilium connectivity test
ℹ️ Monitor aggregation detected, will skip some flow validation steps
✨ [poc-cilium-test] Creating namespace for connectivity check...
✨ [poc-cilium-test] Deploying echo-same-node service...
✨ [poc-cilium-test] Deploying same-node deployment...
✨ [poc-cilium-test] Deploying client deployment...
✨ [poc-cilium-test] Deploying client2 deployment...
✨ [poc-cilium-test] Deploying echo-other-node service...
✨ [poc-cilium-test] Deploying other-node deployment...
⌛ [poc-cilium-test] Waiting for deployments [client client2 echo-same-node] to become ready...
⌛ [poc-cilium-test] Waiting for deployments [echo-other-node] to become ready...
⌛ [poc-cilium-test] Waiting for CiliumEndpoint for pod cilium-test/client-7568bc7f86-9hjkp to appear...
⌛ [poc-cilium-test] Waiting for CiliumEndpoint for pod cilium-test/client2-686d5f784b-xvplx to appear...
⌛ [poc-cilium-test] Waiting for CiliumEndpoint for pod cilium-test/echo-other-node-59d779959c-gj5rp to appear...
⌛ [poc-cilium-test] Waiting for CiliumEndpoint for pod cilium-test/echo-same-node-5767b7b99d-qchrk to appear...
⌛ [poc-cilium-test] Waiting for Service cilium-test/echo-other-node to become ready...
⌛ [poc-cilium-test] Waiting for Service cilium-test/echo-same-node to become ready...
⌛ [poc-cilium-test] Waiting for NodePort some-ip:32452 (cilium-test/echo-same-node) to become ready...
⌛ [poc-cilium-test] Waiting for NodePort some-ip:30389 (cilium-test/echo-other-node) to become ready...
⌛ [poc-cilium-test] Waiting for NodePort some-ip:30389 (cilium-test/echo-other-node) to become ready...
⌛ [poc-cilium-test] Waiting for NodePort some-ip:32452 (cilium-test/echo-same-node) to become ready...
⌛ [poc-cilium-test] Waiting for NodePort some-ip:30389 (cilium-test/echo-other-node) to become ready...
⌛ [poc-cilium-test] Waiting for NodePort some-ip:32452 (cilium-test/echo-same-node) to become ready...
⌛ [poc-cilium-test] Waiting for NodePort some-ip:30389 (cilium-test/echo-other-node) to become ready...
⌛ [poc-cilium-test] Waiting for NodePort some-ip:32452 (cilium-test/echo-same-node) to become ready...
ℹ️ Skipping IPCache check
⌛ [poc-cilium-test] Waiting for pod cilium-test/client-7568bc7f86-9hjkp to reach default/kubernetes service...
⌛ [poc-cilium-test] Waiting for pod cilium-test/client2-686d5f784b-xvplx to reach default/kubernetes service...
🔭 Enabling Hubble telescope...
⚠️ Unable to contact Hubble Relay, disabling Hubble telescope and flow validation: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp [::1]:4245: connect: connection refused"
ℹ️ Expose Relay locally with:
cilium hubble enable
cilium status --wait
cilium hubble port-forward&
🏃 Running tests...
[=] Test [no-policies]
........................................
[=] Test [allow-all]
....................................
[=] Test [client-ingress]
..
[=] Test [echo-ingress]
....
[=] Test [client-egress]
....
[=] Test [to-entities-world]
.
ℹ️ 📜 Applying CiliumNetworkPolicy 'client-egress-to-entities-world' to namespace 'cilium-test'..
[-] Scenario [to-entities-world/pod-to-world]
[.] Action [to-entities-world/pod-to-world/http-to-one-one-one-one-0: cilium-test/client-7568bc7f86-9hjkp (some-ip) -> one-one-one-one-http (one.one.one.one:80)]
❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --connect-timeout 5 --output /dev/null http://one.one.one.one:80" failed: command terminated with exit code 28
ℹ️ curl output:
curl: (28) Resolving timed out after 5000 milliseconds
:0 -> :0 = 000
📄 No flows recorded during action http-to-one-one-one-one-0
📄 No flows recorded during action http-to-one-one-one-one-0
[.] Action [to-entities-world/pod-to-world/https-to-one-one-one-one-0: cilium-test/client-7568bc7f86-9hjkp (10.233.65.69) -> one-one-one-one-https (one.one.one.one:443)]
[.] Action [to-entities-world/pod-to-world/https-to-one-one-one-one-index-0: cilium-test/client-7568bc7f86-9hjkp (10.233.65.69) -> one-one-one-one-https-index (one.one.one.one:443)]
[.] Action [to-entities-world/pod-to-world/http-to-one-one-one-one-1: cilium-test/client2-686d5f784b-xvplx (10.233.65.123) -> one-one-one-one-http (one.one.one.one:80)]
❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --connect-timeout 5 --output /dev/null http://one.one.one.one:80" failed: command terminated with exit code 28
ℹ️ curl output:
curl: (28) Resolving timed out after 5000 milliseconds
:0 -> :0 = 000
📄 No flows recorded during action http-to-one-one-one-one-1
📄 No flows recorded during action http-to-one-one-one-one-1
[.] Action [to-entities-world/pod-to-world/https-to-one-one-one-one-1: cilium-test/client2-686d5f784b-xvplx (10.233.65.123) -> one-one-one-one-https (one.one.one.one:443)]
[.] Action [to-entities-world/pod-to-world/https-to-one-one-one-one-index-1: cilium-test/client2-686d5f784b-xvplx (10.233.65.123) -> one-one-one-one-https-index (one.one.one.one:443)]
ℹ️ 📜 Deleting CiliumNetworkPolicy 'client-egress-to-entities-world' from namespace 'cilium-test'..
[=] Test [to-cidr-1111]
....
[=] Test [echo-ingress-l7]
....
[=] Test [client-egress-l7]
........
ℹ️ 📜 Applying CiliumNetworkPolicy 'client-egress-only-dns' to namespace 'cilium-test'..
ℹ️ 📜 Applying CiliumNetworkPolicy 'client-egress-l7-http' to namespace 'cilium-test'..
[-] Scenario [client-egress-l7/pod-to-pod]
[.] Action [client-egress-l7/pod-to-pod/curl-0: cilium-test/client-7568bc7f86-9hjkp (10.233.65.69) -> cilium-test/echo-other-node-59d779959c-gj5rp (10.233.64.161:8080)]
[.] Action [client-egress-l7/pod-to-pod/curl-1: cilium-test/client-7568bc7f86-9hjkp (10.233.65.69) -> cilium-test/echo-same-node-5767b7b99d-qchrk (10.233.65.57:8080)]
[.] Action [client-egress-l7/pod-to-pod/curl-2: cilium-test/client2-686d5f784b-xvplx (10.233.65.123) -> cilium-test/echo-other-node-59d779959c-gj5rp (10.233.64.161:8080)]
[.] Action [client-egress-l7/pod-to-pod/curl-3: cilium-test/client2-686d5f784b-xvplx (10.233.65.123) -> cilium-test/echo-same-node-5767b7b99d-qchrk (10.233.65.57:8080)]
[-] Scenario [client-egress-l7/pod-to-world]
[.] Action [client-egress-l7/pod-to-world/http-to-one-one-one-one-0: cilium-test/client-7568bc7f86-9hjkp (10.233.65.69) -> one-one-one-one-http (one.one.one.one:80)]
[.] Action [client-egress-l7/pod-to-world/https-to-one-one-one-one-0: cilium-test/client-7568bc7f86-9hjkp (10.233.65.69) -> one-one-one-one-https (one.one.one.one:443)]
[.] Action [client-egress-l7/pod-to-world/https-to-one-one-one-one-index-0: cilium-test/client-7568bc7f86-9hjkp (10.233.65.69) -> one-one-one-one-https-index (one.one.one.one:443)]
[.] Action [client-egress-l7/pod-to-world/http-to-one-one-one-one-1: cilium-test/client2-686d5f784b-xvplx (10.233.65.123) -> one-one-one-one-http (one.one.one.one:80)]
❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --connect-timeout 5 --output /dev/null http://one.one.one.one:80" failed: command terminated with exit code 28
ℹ️ curl output:
curl: (28) Resolving timed out after 5000 milliseconds
:0 -> :0 = 000
📄 No flows recorded during action http-to-one-one-one-one-1
📄 No flows recorded during action http-to-one-one-one-one-1
[.] Action [client-egress-l7/pod-to-world/https-to-one-one-one-one-1: cilium-test/client2-686d5f784b-xvplx (10.233.65.123) -> one-one-one-one-https (one.one.one.one:443)]
[.] Action [client-egress-l7/pod-to-world/https-to-one-one-one-one-index-1: cilium-test/client2-686d5f784b-xvplx (10.233.65.123) -> one-one-one-one-https-index (one.one.one.one:443)]
ℹ️ 📜 Deleting CiliumNetworkPolicy 'client-egress-only-dns' from namespace 'cilium-test'..
ℹ️ 📜 Deleting CiliumNetworkPolicy 'client-egress-l7-http' from namespace 'cilium-test'..
[=] Test [dns-only]
..........
[=] Test [to-fqdns]
.
ℹ️ 📜 Applying CiliumNetworkPolicy 'client-egress-to-fqdns-one-one-one-one' to namespace 'cilium-test'..
[-] Scenario [to-fqdns/pod-to-world]
[.] Action [to-fqdns/pod-to-world/http-to-one-one-one-one-0: cilium-test/client2-686d5f784b-xvplx (10.233.65.123) -> one-one-one-one-http (one.one.one.one:80)]
❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --connect-timeout 5 --output /dev/null http://one.one.one.one:80" failed: command terminated with exit code 28
ℹ️ curl output:
curl: (28) Resolving timed out after 5000 milliseconds
:0 -> :0 = 000
📄 No flows recorded during action http-to-one-one-one-one-0
📄 No flows recorded during action http-to-one-one-one-one-0
[.] Action [to-fqdns/pod-to-world/https-to-one-one-one-one-0: cilium-test/client2-686d5f784b-xvplx (some-ip) -> one-one-one-one-https (one.one.one.one:443)]
[.] Action [to-fqdns/pod-to-world/https-to-one-one-one-one-index-0: cilium-test/client2-686d5f784b-xvplx (10.233.65.123) -> one-one-one-one-https-index (one.one.one.one:443)]
[.] Action [to-fqdns/pod-to-world/http-to-one-one-one-one-1: cilium-test/client-7568bc7f86-9hjkp (some-ip) -> one-one-one-one-http (one.one.one.one:80)]
❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --connect-timeout 5 --output /dev/null http://one.one.one.one:80" failed: command terminated with exit code 28
ℹ️ curl output:
curl: (28) Resolving timed out after 5000 milliseconds
:0 -> :0 = 000
📄 No flows recorded during action http-to-one-one-one-one-1
📄 No flows recorded during action http-to-one-one-one-one-1
[.] Action [to-fqdns/pod-to-world/https-to-one-one-one-one-1: cilium-test/client-7568bc7f86-9hjkp (some-ip) -> one-one-one-one-https (one.one.one.one:443)]
[.] Action [to-fqdns/pod-to-world/https-to-one-one-one-one-index-1: cilium-test/client-7568bc7f86-9hjkp (some-ip) -> one-one-one-one-https-index (one.one.one.one:443)]
ℹ️ 📜 Deleting CiliumNetworkPolicy 'client-egress-to-fqdns-one-one-one-one' from namespace 'cilium-test'..
📋 Test Report
❌ 3/11 tests failed (5/126 actions), 0 tests skipped, 0 scenarios skipped:
Test [to-entities-world]:
❌ to-entities-world/pod-to-world/http-to-one-one-one-one-0: cilium-test/client-7568bc7f86-9hjkp (some-ip) -> one-one-one-one-http (one.one.one.one:80)
❌ to-entities-world/pod-to-world/http-to-one-one-one-one-1: cilium-test/client2-686d5f784b-xvplx (some-ip) -> one-one-one-one-http (one.one.one.one:80)
Test [client-egress-l7]:
❌ client-egress-l7/pod-to-world/http-to-one-one-one-one-1: cilium-test/client2-686d5f784b-xvplx (some-ip) -> one-one-one-one-http (one.one.one.one:80)
Test [to-fqdns]:
❌ to-fqdns/pod-to-world/http-to-one-one-one-one-0: cilium-test/client2-686d5f784b-xvplx (some-ip) -> one-one-one-one-http (one.one.one.one:80)
❌ to-fqdns/pod-to-world/http-to-one-one-one-one-1: cilium-test/client-7568bc7f86-9hjkp (some-ip) -> one-one-one-one-http (one.one.one.one:80)
Connectivity test failed: 3 tests failed
Code of Conduct
- I agree to follow this project's Code of Conduct