Closed
Description
Background
Solo.io is donating its gloo-gateway project to the CNCF. As part of this process, we will need to restructure some of the code for the following reasons:
- Currently the Gloo APIs contain some proprietary fields that should not be present in a CNCF project.
- Currently Gloo supports a variety of APIs: knative, the k8s ingress API, gloo's own gateway API, and the Kubernetes Gateway API. With the donation, we'll want to focus on supporting only the Kubernetes Gateway API.
Goals
- Remove gloo's proprietary APIs. We also want to change the group and kind of the CRDs to KubeGateway
- This will be done with care, to make sure not to break existing production users.
- Support extension points to customize translation behavior using extension krt collections
- Use existing APIs as much as possible (e.g. no Upstream of Kube type where we can use Services), to reduce cognitive load on users.
- Use familiar development patterns (i.e. kube-builder for CRDs vs SoloKit) for easier DevEx for community developers.
- Give existing users a migration path
Non-Goals
- No need to support all of gloo plugins if they are hard to port over; especially the ones that are not in wide use by the community (http tunneling comes to mind).
Plan
Overview:
- Fix CI
- Bisect gloo and KubeGateway
- Create an IR that allows the control plane to be extended opaquely
- Helm chart
Order of operations:
- The CI should be done first, the Helm chart is independent and can be done in parallel with the rest.
- After CI is done, "Bisecting gloo", "abstracting the api snapshot" can be done in parallel.
- "translation logic using IR" can start after the "abstracting the api snapshot" is done.
- "Iterate on the IR" represents multiple tasks that can be done in parallel after "translation logic using IR" is done.
Fix CI (should be done first)
- We probably don't need the Bulldozer bot, as GH has since added the features it gave us. We can replace it with merge queues and the merge-when-ready button.
- Changelog bot: we need to figure something out here, but we can start without it.
- consolidate on GH actions (no gcloud build)
Ideally this should be done first, so we have CI for the next steps.
Bisect gloo and KubeGateway
Goal:
- In KubeGateway: we keep the translation part of gloo, and remove gloo's SetupSync, EnvoySyncer, and most other gloo code that is not related to translation and plugins and that this project does not depend on. If easier, we can initially just disable the gloo syncer before we delete anything.
- In gloo: we remove the k8s gateway support (this is the easy part). Initially, keep the plugins and translator. Later on, re-write the plugins to fit the k8sgateway plugin model, and have k8sgateway be an extension for gloo.
Steps:
- move the xds-server serving to k8s gateway (should be easy; we should consider using upstream go-control-plane, as it now supports delta xDS)
- we can probably drop some of the snapshot sanitizers, in favor of envoy options (i.e. no need to handle non existent cluster in snapshot)
- disable calling gloo's setup function in our main function.
- At this point KubeGateway can still read gloo's upstream and RouteOptions if the CRDs are installed
- KubeGateway will not install them on its own.
- see below if we should write statuses.
- Separating the xds-server:
- Today, KubeGateway uses gloo's xds-server. After the split, KubeGateway will have its own xds-server. See more in the questions section below.
- Admin page:
- KubeGateway currently uses gloo's admin page. It will need its own admin page instead.
Opaque extensibility
Goal:
- Stop using solo-kit's ApiSnapshot object, as it is coupled to gloo's APIs (this step also enables performance benefits).
- Create an extension model, such that things that targetRef a route (like RouteOptions), can mutate without the translation being aware of their underlying type.
Steps:
- TBD, but probably something like a collection of mutation functions.
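To make the "collection of mutation functions" idea concrete, here is a minimal sketch of one possible shape. It is not the proposed design: `RouteIR`, `Policy`, `MutateFunc`, `Registry`, and the `example.io` group are all hypothetical names invented for illustration. The point it shows is that the translator only looks up a function by the policy's group/kind and never inspects the policy's concrete type.

```go
package main

import "fmt"

// GroupKind identifies a policy type. The translator only ever sees this key.
type GroupKind struct {
	Group string
	Kind  string
}

// RouteIR is a hypothetical, heavily simplified intermediate representation of a route.
type RouteIR struct {
	Name    string
	Timeout string
}

// Policy is an opaque object that targets a route by name (a stand-in for targetRef).
type Policy struct {
	GroupKind GroupKind
	TargetRef string
	Spec      any // only the owning extension knows the concrete type
}

// MutateFunc applies one policy spec to a route IR.
type MutateFunc func(route *RouteIR, spec any) error

// Registry maps a policy GroupKind to the mutation function an extension contributed.
type Registry map[GroupKind]MutateFunc

// Apply runs every policy that targets the route through its registered mutator.
// Policies with no registered mutator are skipped.
func (r Registry) Apply(route *RouteIR, policies []Policy) error {
	for _, p := range policies {
		if p.TargetRef != route.Name {
			continue
		}
		mutate, ok := r[p.GroupKind]
		if !ok {
			continue
		}
		if err := mutate(route, p.Spec); err != nil {
			return fmt.Errorf("policy %s/%s on route %q: %w",
				p.GroupKind.Group, p.GroupKind.Kind, route.Name, err)
		}
	}
	return nil
}

// TimeoutSpec is a hypothetical extension-owned policy spec.
type TimeoutSpec struct {
	Timeout string
}

// TimeoutGK is an illustrative group/kind, not a real CRD.
var TimeoutGK = GroupKind{Group: "example.io", Kind: "TimeoutPolicy"}

// ApplyTimeout is the extension's mutator; only it knows about TimeoutSpec.
func ApplyTimeout(route *RouteIR, spec any) error {
	ts, ok := spec.(TimeoutSpec)
	if !ok {
		return fmt.Errorf("unexpected spec type %T", spec)
	}
	route.Timeout = ts.Timeout
	return nil
}

func main() {
	reg := Registry{TimeoutGK: ApplyTimeout}
	route := &RouteIR{Name: "my-route"}
	policies := []Policy{
		{GroupKind: TimeoutGK, TargetRef: "my-route", Spec: TimeoutSpec{Timeout: "5s"}},
	}
	if err := reg.Apply(route, policies); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(route.Timeout)
}
```

In this shape, porting something like RouteOptions would mean registering one more mutator rather than changing the translator. Ordering and conflict resolution between policies are left open here.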
Helm chart (can be done in parallel)
- I think we can start a new helm chart for KubeGateway - it will be simpler as it won't include the proxy deployment parts. I don't believe we need code-gen for the helm chart.
Questions / Discussion
- Do we want KubeGateway to write Status to gloo's RouteOptions? Given that they are "deprecated" in KubeGateway, I think not. Upstream status writing also requires some thought.
- Regarding the control plane and snapshot cache: currently, KubeGateway re-uses Gloo's xds-server. To keep things simple, I suggest we don't support re-using KubeGateway's xds-server in gloo. Instead, gloo will have 2 control planes (on different ports).
- How do we handle extensions that require the old proxy? cc @lgadban (Note that we just need to allow for it here; I think eventually we will need to re-write this logic so it doesn't rely on the proxy object.)
- I suspect these plugins are not worth porting over (lack of use). They can still remain in gloo:
- http tunnel
- gloo's rest/grpc (though we may want to keep the regular json->grpc plugin)
- consul/nomad support
- ec2
- service_spec \ subset_spec on upstream
- Any reason to not use upstream go-control-plane? It seems to support more features now than the solo-kit one.
Status
Done