Description
Is there an existing issue for this?
- I have searched the existing issues
What happened?
I deployed Cilium in AWS (EKS) with Helm, using the template produced by the command below.
```shell
helm template cilium cilium/cilium --version 1.11.0 \
  --namespace kube-system \
  --set ipam.mode=cluster-pool \
  --set tunnel=vxlan \
  --set localRedirectPolicy=true \
  --set egressMasqueradeInterfaces=eth+ \
  --set nodeinit.enabled=false \
  --set hubble.tls.auto.method="cronJob" \
  --set hubble.listenAddress=":4244" \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set upgradeCompatibility="1.9" \
  --set encryption.enabled=true \
  --set encryption.nodeEncryption=false \
  --set encryption.type=ipsec \
  --set prometheus.enabled=true \
  --set operator.prometheus.enabled=true \
  --set hubble.metrics.enabled="{dns,drop,tcp,flow,icmp,http}" \
  --set labels="k8s:io.kubernetes.pod.namespace k8s:k8s-app k8s:app k8s:name k8s:spark-role"
```
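For readability, the same configuration can be expressed as a values file (a sketch; this is a direct transcription of the `--set` flags above, not a tested alternative):

```yaml
# values.yaml — equivalent of the --set flags passed to helm template
ipam:
  mode: cluster-pool
tunnel: vxlan
localRedirectPolicy: true
egressMasqueradeInterfaces: eth+
nodeinit:
  enabled: false
upgradeCompatibility: "1.9"
encryption:
  enabled: true
  nodeEncryption: false
  type: ipsec
prometheus:
  enabled: true
operator:
  prometheus:
    enabled: true
hubble:
  tls:
    auto:
      method: cronJob
  listenAddress: ":4244"
  relay:
    enabled: true
  ui:
    enabled: true
  metrics:
    enabled:
      - dns
      - drop
      - tcp
      - flow
      - icmp
      - http
labels: "k8s:io.kubernetes.pod.namespace k8s:k8s-app k8s:app k8s:name k8s:spark-role"
```

This would be consumed with `helm template cilium cilium/cilium --version 1.11.0 --namespace kube-system -f values.yaml`.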
The entire cluster is healthy. (We're using a private registry, but the images are just copies of those referenced by the 1.11.0 Helm chart.)
```
❯ cilium status
    /¯¯\
 /¯¯\__/¯¯\    Cilium:         OK
 \__/¯¯\__/    Operator:       OK
 /¯¯\__/¯¯\    Hubble:         OK
 \__/¯¯\__/    ClusterMesh:    disabled
    \__/

DaemonSet         cilium             Desired: 2, Ready: 2/2, Available: 2/2
Deployment        cilium-operator    Desired: 2, Ready: 2/2, Available: 2/2
Deployment        hubble-relay       Desired: 1, Ready: 1/1, Available: 1/1
Deployment        hubble-ui          Desired: 1, Ready: 1/1, Available: 1/1
Containers:       cilium             Running: 2
                  cilium-operator    Running: 2
                  hubble-relay       Running: 1
                  hubble-ui          Running: 1
Cluster Pods:     44/44 managed by Cilium
Image versions    hubble-relay       quay.io/ascendio/hubble-relay:v1.11.0: 1
                  hubble-ui          quay.io/ascendio/hubble-ui:v0.8.3: 1
                  hubble-ui          quay.io/ascendio/hubble-ui-backend:v0.8.3: 1
                  hubble-ui          quay.io/ascendio/envoy:v1.18.4: 1
                  cilium             quay.io/ascendio/cilium:v1.11.0: 2
```
I run `kubectl port-forward -n kube-system hubble-ui-5b7f99fcb6-5qqf2 :8080`, navigate to the forwarded address in my browser, and see this error.
I see some connection retries in hubble-relay (although they seem to stabilize after a single retry). Neither the Hubble backend, frontend, nor proxy is producing any errors.
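For triage, these are roughly the checks I ran to confirm the relay-to-agent and UI paths (a sketch; the pod name is from my cluster, and port 4244 comes from `hubble.listenAddress` above — adjust names to your environment):

```shell
# Confirm each agent's embedded Hubble server reports as enabled/listening
kubectl -n kube-system exec ds/cilium -- cilium status --verbose | grep -i hubble

# Confirm the cronJob-generated Hubble TLS secrets are present
kubectl -n kube-system get secrets | grep -i hubble

# Forward a fixed local port to the UI pod and probe it directly
kubectl -n kube-system port-forward hubble-ui-5b7f99fcb6-5qqf2 12000:8080 &
curl -sI http://localhost:12000
```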
Cilium Version
```
❯ cilium version
cilium-cli: v0.10.0 compiled with go1.17.4 on darwin/amd64
cilium image (default): v1.11.0
cilium image (stable): v1.11.1
cilium image (running): v1.11.0
```
Kernel Version
```
Linux version 5.4.156-83.273.amzn2.x86_64 (mockbuild@ip-10-0-39-220) (gcc version 7.3.1 20180712 (Red Hat 7.3.1-13) (GCC)) #1 SMP Sat Oct 30 12:59:07 UTC 2021
```
Kubernetes Version
```
❯ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.5", GitCommit:"aea7bbadd2fc0cd689de94a54e5b7b758869d691", GitTreeState:"clean", BuildDate:"2021-09-15T21:10:45Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.5-eks-bc4871b", GitCommit:"5236faf39f1b7a7dabea8df12726f25608131aa9", GitTreeState:"clean", BuildDate:"2021-10-29T23:32:16Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
```
Sysdump
cilium-sysdump-20220207-230512.zip
Relevant log output
```
# hubble-relay
level=warning msg="Failed to create gRPC client" address="192.168.9.213:4244" error="connection error: desc = \"transport: error while dialing: dial tcp 192.168.9.213:4244: connect: connection refused\"" hubble-tls=true next-try-in=10s peer=ip-192-168-9-213.ec2.internal subsys=hubble-relay
level=info msg=Connecting address="192.168.9.213:4244" hubble-tls=true peer=ip-192-168-9-213.ec2.internal subsys=hubble-relay
level=info msg=Connected address="192.168.9.213:4244" hubble-tls=true peer=ip-192-168-9-213.ec2.internal subsys=hubble-relay

# hubble-ui backend
level=info msg="initialized with TLS disabled\n" subsys=config
level=info msg="listening at: 0.0.0.0:8090\n" subsys=ui-backend
```
```
# hubble-ui proxy
❯ kubectl logs -n kube-system hubble-ui-5b7f99fcb6-fx9j7 -c proxy | grep -v info
[2022-01-25 18:51:33.139][1][warning][main] [source/server/server.cc:506] No admin address given, so no admin HTTP server started.
- name: base
  static_layer:
    {}
- name: admin
  admin_layer:
    {}
[2022-01-25 18:51:33.151][1][warning][main] [source/server/server.cc:642] there is no configured limit to the number of allowed active connections. Set a limit via the runtime key overload.global_downstream_max_connections
```
Anything else?
I'm seeing this same issue in EKS, AKS, and GKE, so I assume it's either a bug or a problem with my configuration (we use very similar configurations in all three clouds).
Code of Conduct
- I agree to follow this project's Code of Conduct