Closed
Labels
- affects/main: This issue affects main branch
- affects/v1.18: This issue affects v1.18 branch
- area/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
- kind/bug: This is a bug in the Cilium logic.
- kind/community-report: This was reported by a user in the Cilium community, eg via Slack.
Description
Is there an existing issue for this?
- I have searched the existing issues
Version
Equal to or higher than v1.18.0 and lower than v1.19.0
What happened?
We've enabled topology-aware routing on one of our clusters and set trafficDistribution: PreferClose on our Services, which, according to the Kubernetes docs, should prefer endpoints in the same zone and fall back to the default behavior if no such endpoints exist.
However, we've observed that connections to the Service's ClusterIP fail with ECONNREFUSED when no endpoints exist in the client's zone.
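As we read the Kubernetes documentation, the expected PreferClose behavior boils down to "use same-zone endpoints if there are any, otherwise use all of them". The sketch below models that selection step in Python. It illustrates the documented semantics, loosely following how kube-proxy treats zone hints; it is not Cilium's implementation, and the Endpoint type, the select_backends helper and the zone names are hypothetical. The symptom described above is what you would expect if the final fallback were skipped, leaving a client in a zone without endpoints with an empty backend set.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical model of a backend: its IP plus the zone hint it carries."""
    ip: str
    zone_hint: Optional[str]  # None models an endpoint without a zone hint


def select_backends(endpoints: List[Endpoint], client_zone: str) -> List[Endpoint]:
    """Return the backends a client in `client_zone` should balance across.

    Sketch of PreferClose as documented: prefer same-zone endpoints,
    and fall back to the full endpoint set when there are none.
    """
    # If any endpoint lacks a hint, the hints are not usable: use everything.
    if any(ep.zone_hint is None for ep in endpoints):
        return list(endpoints)
    local = [ep for ep in endpoints if ep.zone_hint == client_zone]
    # The fallback step: an empty local set must not produce an empty result.
    return local if local else list(endpoints)


# Usage: server pods only in zones "a" and "b" (hypothetical zone names).
endpoints = [Endpoint("10.0.1.10", "zone-a"), Endpoint("10.0.2.10", "zone-b")]
# A client in zone "c" has no local endpoint, so it falls back to both.
print([ep.ip for ep in select_backends(endpoints, "zone-c")])
# prints ['10.0.1.10', '10.0.2.10']
```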
How can we reproduce the issue?
- Have a Kubernetes 1.33 cluster running Cilium 1.18.0 with nodes in 3 AZs (in our case it's EKS, but the bug seems to be independent of that).
- I've provided some manifests to deploy a test server and a debug shell.
If both are run with 3 replicas (one per AZ), everything works as expected. However, when scaling the server down so that its pods cover only 2 AZs or 1 AZ, only debug shells in those same AZs can connect to a backend.
apiVersion: v1
kind: ConfigMap
metadata:
  name: python-script
data:
  entrypoint.py: |
    import os
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class PodNameHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            pod_name = os.environ.get('POD_NAME', 'unknown-pod')
            self.send_response(200)
            self.send_header('Content-Type', 'text/plain; charset=utf-8')
            self.end_headers()
            self.wfile.write(f"Pod name: {pod_name}\n".encode('utf-8'))

    def run(server_class=HTTPServer, handler_class=PodNameHandler, port=8080):
        server_address = ('', port)
        httpd = server_class(server_address, handler_class)
        print(f"Starting server on port {port}, serving Pod name...")
        httpd.serve_forever()

    if __name__ == '__main__':
        run()
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: python-deployment
  labels:
    app: python
spec:
  replicas: 3
  selector:
    matchLabels:
      app: python
  template:
    metadata:
      labels:
        app: python
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - python
              topologyKey: topology.kubernetes.io/zone
      containers:
        - name: python
          image: python:3.13.5-alpine
          command: ["python", "/app/entrypoint.py"]
          ports:
            - containerPort: 8080
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          volumeMounts:
            - name: python-script
              mountPath: /app
      volumes:
        - name: python-script
          configMap:
            name: python-script
---
apiVersion: v1
kind: Service
metadata:
  name: python-service
spec:
  trafficDistribution: PreferClose
  selector:
    app: python
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
  type: ClusterIP
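To see which backend (if any) answers from a given zone, a small client like the one below could be run inside each debug shell. It is a hypothetical helper, not part of the original report: it assumes the debug shell has Python 3 available and that the Service from the manifests above is reachable as python-service:8080 through cluster DNS. It distinguishes an actively refused connection (the ECONNREFUSED symptom) from other failures such as DNS errors or timeouts.

```python
import urllib.error
import urllib.request

# Hypothetical default target: the ClusterIP Service from the manifests,
# resolved through cluster DNS from inside a debug-shell pod.
SERVICE_URL = "http://python-service:8080/"

# Ignore any HTTP proxy set in the environment so we hit the Service directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str = SERVICE_URL, timeout: float = 2.0) -> str:
    """Send one GET and classify the outcome.

    Returns the response body (e.g. "Pod name: ...") on success,
    "ECONNREFUSED" if the connection was actively refused, or a short
    error description for any other failure (DNS, timeout, HTTP error).
    """
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8").strip()
    except urllib.error.URLError as exc:
        # urllib wraps socket errors; the underlying error is in `reason`.
        if isinstance(exc.reason, ConnectionRefusedError):
            return "ECONNREFUSED"
        return f"error: {exc.reason}"
    except OSError as exc:
        # E.g. a read timeout, which urllib does not wrap in URLError.
        return f"error: {exc}"
```

Calling probe() a few times from a debug shell in each AZ should, per the semantics described above, always return a "Pod name: ..." line, even from an AZ that has no server pod.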
Cilium Version
1.18.0
Kernel Version
6.12.37-61.105.amzn2023.x86_64
Kubernetes Version
v1.33.3-eks-3abbec1
Regression
no regression
Sysdump
Will add later.
Relevant log output
No Cilium logs relevant to the issue.
Anything else?
Possibly related to #40883
Cilium Users Document
- Are you a user of Cilium? Please add yourself to the Users doc
Code of Conduct
- I agree to follow this project's Code of Conduct