
Gateway API controller overwriting Service.spec.loadBalancerClass if set #28949

@coro

Description


Is there an existing issue for this?

  • I have searched the existing issues

What happened?

I installed both Cilium and the AWS Load Balancer Controller. Cilium had the Gateway API support feature enabled.

I created a Gateway and an HTTPRoute. My expectation was:

  • The Gateway controller creates a Service of type LoadBalancer for the Gateway resource
  • The AWS LB Controller's mutating webhook adds spec.loadBalancerClass: service.k8s.aws/nlb to that LoadBalancer Service
  • The load balancer is provisioned as an NLB
  • The Gateway is created successfully, with the NLB-backed Service behind it

Instead, I see:

  • The Gateway controller creates a Service of type LoadBalancer for the Gateway resource
  • The AWS LB Controller's mutating webhook adds spec.loadBalancerClass: service.k8s.aws/nlb to that LoadBalancer Service
  • The load balancer is provisioned as an NLB
  • The Gateway never becomes ready

Gateway status:

        Status:
          Conditions:
            Last Transition Time:  2023-08-14T16:22:53Z
            Message:               Unable to create Service resource
            Observed Generation:   1
            Reason:                NoResources
            Status:                False
            Type:                  Accepted
            Last Transition Time:  2023-08-14T16:22:53Z
            Message:               Address is not ready
            Observed Generation:   1
            Reason:                ListenersNotReady
            Status:                False
            Type:                  Programmed
          Listeners:
            Attached Routes:  1
            Conditions:
              Last Transition Time:  2023-08-14T16:22:53Z
              Message:               Listener Programmed
              Observed Generation:   1
              Reason:                Programmed
              Status:                True
              Type:                  Programmed
              Last Transition Time:  2023-08-14T16:22:53Z
              Message:               Listener Accepted
              Observed Generation:   1
              Reason:                Accepted
              Status:                True
              Type:                  Accepted
            Name:                    web-gw
            Supported Kinds:
              Group:  gateway.networking.k8s.io
              Kind:   HTTPRoute

The logs (see below) imply that the reconciler is attempting to reset the spec.loadBalancerClass value added by the webhook to null.

From what I understand, the ensureService function is called twice: once to create the LoadBalancer Service (into which the AWS LB Controller's mutating webhook then injects spec.loadBalancerClass), and again on a subsequent reconcile loop to add some labels and annotations:

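        // Start from the live object, which already carries the
        // webhook-injected spec.loadBalancerClass...
        temp := existing.DeepCopy()
        // ...then replace the whole spec with the controller's desired spec,
        // which has no loadBalancerClass.
        temp.Spec = desired.Spec
        setMergedLabelsAndAnnotations(temp, desired)

        // MergeFrom diffs temp against existing, so the dropped field is sent as null.
        return r.Client.Patch(ctx, temp, client.MergeFrom(existing))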

The spec of the existing Service is overwritten with desired.Spec, which only sets the type and ports of the LoadBalancer Service:

        return &corev1.Service{
                ObjectMeta: metav1.ObjectMeta{
                        Name:      shorten(ciliumGatewayPrefix + resource.Name),
                        Namespace: resource.Namespace,
                        Labels:    map[string]string{owningGatewayLabel: resource.Name},
                        OwnerReferences: []metav1.OwnerReference{
                                {
                                        APIVersion: gatewayv1beta1.GroupVersion.String(),
                                        Kind:       resource.Kind,
                                        Name:       resource.Name,
                                        UID:        types.UID(resource.UID),
                                        Controller: model.AddressOf(true),
                                },
                        },
                },
                Spec: corev1.ServiceSpec{
                        // Only the type and ports are set; LoadBalancerClass stays nil.
                        Type:  corev1.ServiceTypeLoadBalancer,
                        Ports: ports,
                },
        }

As a result, the patch resets spec.loadBalancerClass to null. The API server rejects this because loadBalancerClass cannot be changed once it has been set on a LoadBalancer Service.
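To illustrate why a null ends up in the request: controller-runtime's client.MergeFrom sends a JSON merge patch (RFC 7386), in which a key present in the original object but missing from the modified one is encoded as null. The sketch below is a hypothetical, stdlib-only model of that behaviour. ServiceSpec, Service, createMergePatch and patchFor are illustrative stand-ins, not Cilium or Kubernetes APIs, and the preserveClass option is just one possible mitigation (carrying the webhook-set class over), not a confirmed fix.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// Hypothetical, trimmed-down stand-ins for corev1.Service / corev1.ServiceSpec.
type ServiceSpec struct {
	Type              string  `json:"type,omitempty"`
	Ports             []int32 `json:"ports,omitempty"`
	LoadBalancerClass *string `json:"loadBalancerClass,omitempty"`
}

type Service struct {
	Spec ServiceSpec `json:"spec"`
}

// createMergePatch builds an RFC 7386 JSON merge patch turning orig into mod:
// keys missing from mod become null, changed values are copied, and nested
// objects are diffed recursively.
func createMergePatch(orig, mod map[string]any) map[string]any {
	patch := map[string]any{}
	for k := range orig {
		if _, ok := mod[k]; !ok {
			patch[k] = nil
		}
	}
	for k, mv := range mod {
		ov, ok := orig[k]
		om, oIsMap := ov.(map[string]any)
		mm, mIsMap := mv.(map[string]any)
		switch {
		case !ok:
			patch[k] = mv
		case oIsMap && mIsMap:
			if sub := createMergePatch(om, mm); len(sub) > 0 {
				patch[k] = sub
			}
		case !reflect.DeepEqual(ov, mv):
			patch[k] = mv
		}
	}
	return patch
}

// toMap round-trips a value through JSON so it can be diffed generically.
func toMap(v any) map[string]any {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	var m map[string]any
	if err := json.Unmarshal(b, &m); err != nil {
		panic(err)
	}
	return m
}

// patchFor mirrors the reported logic: copy the live object, replace its spec
// with the desired one, then diff. With preserveClass set, a webhook-injected
// loadBalancerClass is carried over instead of being dropped.
func patchFor(existing, desired Service, preserveClass bool) string {
	temp := existing
	temp.Spec = desired.Spec
	if preserveClass && temp.Spec.LoadBalancerClass == nil {
		temp.Spec.LoadBalancerClass = existing.Spec.LoadBalancerClass
	}
	b, err := json.Marshal(createMergePatch(toMap(existing), toMap(temp)))
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	class := "service.k8s.aws/nlb"
	existing := Service{Spec: ServiceSpec{Type: "LoadBalancer", Ports: []int32{80}, LoadBalancerClass: &class}}
	desired := Service{Spec: ServiceSpec{Type: "LoadBalancer", Ports: []int32{80}}}

	fmt.Println(patchFor(existing, desired, false)) // {"spec":{"loadBalancerClass":null}}
	fmt.Println(patchFor(existing, desired, true))  // {}
}
```

With the spec replaced wholesale, the only difference is the missing class, so the patch consists solely of the null that the API server refuses. Carrying the live value over leaves nothing to change in the spec.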

Cilium Version

1.14.3, 1.15.0-pre.2

Kernel Version

Linux 5.10.192-183.736.amzn2.x86_64

Kubernetes Version

1.28

Sysdump

No response

Relevant log output

Will update with logs shortly.

Anything else?

A workaround for Kyverno users was posted here.

The pattern of the AWS Load Balancer Controller injecting spec.loadBalancerClass via a webhook is fairly common on EKS. As of Cilium v1.15.0 it will be possible to control the LB Controller purely through annotations set via Gateway.spec.infrastructure.annotations (which I'm super excited to use), but that doesn't cover the out-of-the-box integration with the LB Controller that most users are accustomed to.

Code of Conduct

  • I agree to follow this project's Code of Conduct

Labels

  • area/agent: Cilium agent related.
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, e.g. via Slack.
  • needs/triage: This issue requires triaging to establish severity and next steps.