Description
Is there an existing issue for this?
- I have searched the existing issues
What happened?
I installed both Cilium and the AWS Load Balancer Controller. Cilium had the Gateway API support feature enabled.
I created a Gateway and an HTTPRoute. My expectation was:
- Gateway controller creates a LoadBalancer for the Gateway resource
- Mutating webhook from AWS LB Controller adds spec.loadBalancerClass: service.k8s.aws/nlb to the LoadBalancer resource
- Load Balancer is created as an NLB
- Gateway successfully created with NLB as backing Service
Instead, I see:
- Gateway controller creates a LoadBalancer for the Gateway resource
- Mutating webhook from AWS LB Controller adds spec.loadBalancerClass: service.k8s.aws/nlb to the LoadBalancer resource
- Load Balancer is created as an NLB
- Gateway never becomes ready
Status:
  Conditions:
    Last Transition Time: 2023-08-14T16:22:53Z
    Message: Unable to create Service resource
    Observed Generation: 1
    Reason: NoResources
    Status: False
    Type: Accepted
    Last Transition Time: 2023-08-14T16:22:53Z
    Message: Address is not ready
    Observed Generation: 1
    Reason: ListenersNotReady
    Status: False
    Type: Programmed
  Listeners:
    Attached Routes: 1
    Conditions:
      Last Transition Time: 2023-08-14T16:22:53Z
      Message: Listener Programmed
      Observed Generation: 1
      Reason: Programmed
      Status: True
      Type: Programmed
      Last Transition Time: 2023-08-14T16:22:53Z
      Message: Listener Accepted
      Observed Generation: 1
      Reason: Accepted
      Status: True
      Type: Accepted
    Name: web-gw
    Supported Kinds:
      Group: gateway.networking.k8s.io
      Kind: HTTPRoute
The logs (see below) imply that the reconciler is attempting to overwrite the spec.loadBalancerClass added by the webhook with null.
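To illustrate why a missing field ends up as an explicit null, here is a minimal sketch of how a JSON merge patch is derived from two versions of an object. It is not Cilium or controller-runtime code: flatMergePatch is a hypothetical helper that only handles top-level keys, and the maps stand in for the Service spec.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// flatMergePatch is a hypothetical, simplified diff in the style of a JSON
// merge patch. It only compares top-level keys (no recursion into nested
// objects), which is enough to show the effect described in this issue.
func flatMergePatch(original, modified map[string]interface{}) map[string]interface{} {
	patch := map[string]interface{}{}
	// Keys that are new or changed carry their new value.
	for k, v := range modified {
		if ov, ok := original[k]; !ok || !reflect.DeepEqual(ov, v) {
			patch[k] = v
		}
	}
	// Keys that disappeared are expressed as an explicit null,
	// which a merge patch interprets as "remove this field".
	for k := range original {
		if _, ok := modified[k]; !ok {
			patch[k] = nil
		}
	}
	return patch
}

func main() {
	// Spec as stored in the cluster after the webhook mutated it.
	existing := map[string]interface{}{
		"type":              "LoadBalancer",
		"loadBalancerClass": "service.k8s.aws/nlb",
	}
	// Spec as rebuilt by the reconciler, which does not know about the class.
	desired := map[string]interface{}{
		"type": "LoadBalancer",
	}

	out, err := json.Marshal(flatMergePatch(existing, desired))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // prints {"loadBalancerClass":null}
}
```

Under these assumptions, the patch contains nothing but the removal of loadBalancerClass, which matches what the logs suggest is being sent to the API server.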
From what I understand, the ensureService function is called twice: once to create the LoadBalancer Service (which the AWS LB Controller's mutating webhook then modifies), and again on a subsequent reconcile loop to add some labels and annotations:
temp := existing.DeepCopy()
temp.Spec = desired.Spec
setMergedLabelsAndAnnotations(temp, desired)
return r.Client.Patch(ctx, temp, client.MergeFrom(existing))
The spec of the existing service is overwritten with this desired.Spec, which only sets the type and ports of the LoadBalancer Service:
return &corev1.Service{
	ObjectMeta: metav1.ObjectMeta{
		Name:      shorten(ciliumGatewayPrefix + resource.Name),
		Namespace: resource.Namespace,
		Labels:    map[string]string{owningGatewayLabel: resource.Name},
		OwnerReferences: []metav1.OwnerReference{
			{
				APIVersion: gatewayv1beta1.GroupVersion.String(),
				Kind:       resource.Kind,
				Name:       resource.Name,
				UID:        types.UID(resource.UID),
				Controller: model.AddressOf(true),
			},
		},
	},
	Spec: corev1.ServiceSpec{
		Type:  corev1.ServiceTypeLoadBalancer,
		Ports: ports,
	},
}
This results in spec.loadBalancerClass being reset to null. That counts as a change to the field, which the API does not permit once it has been set.
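One possible direction for a fix, sketched under heavy assumptions: carry over the loadBalancerClass from the existing Service when the desired spec does not set one. The serviceSpec type below is a hypothetical, trimmed-down stand-in for corev1.ServiceSpec, and naiveOverwrite, preservingOverwrite and clearsClass are illustrative names, not functions from the Cilium code base.

```go
package main

import "fmt"

// serviceSpec is a hypothetical stand-in for corev1.ServiceSpec that keeps
// only the fields relevant to this issue.
type serviceSpec struct {
	Type              string
	Ports             []int32
	LoadBalancerClass *string
}

// naiveOverwrite mirrors `temp.Spec = desired.Spec`: the existing spec is
// discarded entirely, including fields set by a mutating webhook.
func naiveOverwrite(existing, desired serviceSpec) serviceSpec {
	return desired
}

// preservingOverwrite applies the desired spec but keeps the existing
// loadBalancerClass when the desired spec leaves it unset.
func preservingOverwrite(existing, desired serviceSpec) serviceSpec {
	out := desired
	if out.LoadBalancerClass == nil && existing.LoadBalancerClass != nil {
		class := *existing.LoadBalancerClass // copy so the two specs do not share a pointer
		out.LoadBalancerClass = &class
	}
	return out
}

// clearsClass reports whether moving from existing to updated would unset
// a loadBalancerClass that was previously set.
func clearsClass(existing, updated serviceSpec) bool {
	return existing.LoadBalancerClass != nil && updated.LoadBalancerClass == nil
}

func main() {
	nlb := "service.k8s.aws/nlb"
	existing := serviceSpec{Type: "LoadBalancer", Ports: []int32{80}, LoadBalancerClass: &nlb}
	desired := serviceSpec{Type: "LoadBalancer", Ports: []int32{80}}

	fmt.Println(clearsClass(existing, naiveOverwrite(existing, desired)))      // prints true
	fmt.Println(clearsClass(existing, preservingOverwrite(existing, desired))) // prints false
}
```

Whether preserving the field this way is the right behaviour for the real reconciler is an open design question; the sketch only shows that the reset is avoidable without the reconciler having to know which class the webhook injects.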
Cilium Version
1.14.3, 1.15.0-pre.2
Kernel Version
Linux 5.10.192-183.736.amzn2.x86_64
Kubernetes Version
1.28
Sysdump
No response
Relevant log output
Will update with logs shortly.
Anything else?
A workaround for Kyverno users was posted here.
The pattern of the AWS Load Balancer Controller injecting spec.loadBalancerClass via a webhook is fairly common in EKS. While it will be possible to control the LB Controller purely through annotations via Gateway.spec.infrastructure.annotations as of Cilium v1.15.0 (which I'm super excited to use), that doesn't cover the out-of-the-box integration with the LB Controller that most users will be used to.
Code of Conduct
- I agree to follow this project's Code of Conduct