Skip to content
This repository was archived by the owner on Jul 11, 2023. It is now read-only.

(feat/statefulsets): MeshService API changes for Headless Services #4704

Merged
merged 21 commits into from
May 19, 2022

Conversation

keithmattix
Copy link
Contributor

@keithmattix keithmattix commented Apr 28, 2022

Description:
In order to provide statefulset support, introduce 2 new public
functions to the MeshService type to accomplish the following:

  1. The ProviderKey function describes the mapping from a MeshService back to a provider
    service (e.g. a Kubernetes service). This decouples the MeshService name
    from being a foreign key between the provider's collection of services
    and the MeshCatalog's set of services.
  2. The Subdomain function returns the subdomain for a particular mesh service (e.g. the pod name for a particular pod within a headless service). This allows for proper matching of endpoints when building EDS and CDS responses.

Both of these functions rely on a new convention around MeshService.Name in that the name shall be formatted as ([subdomain.]providerKey). The values returned by these functions are calculated once and then memoized via unexported fields. The MeshService.Equals() function has been updated to account for these unexported fields.

Because of the semantics of statefulsets and headless services in Kubernetes, it is important to highlight the following guidance re: headless services:

Specific statefulset instances (e.g. mysql-0.mysql.namespace.svc.cluser.local) should only be
referenced for traffic routing (e.g. TrafficSplit) and not policy (e.g. TrafficTarget, Retry).
Conversely, the headless service (e.g. mysql.namespace.svc.cluster.local) should only be referenced
for policy and not traffic routing.

Signed-off-by: Keith Mattix II keithmattix2@gmail.com

Testing done:

Affected area:

Functional Area
New Functionality [ ]
CI System [ ]
CLI Tool [ ]
Certificate Management [ ]
Control Plane [ ]
Demo [ ]
Documentation [ ]
Egress [ ]
Ingress [ ]
Install [ ]
Networking [ ]
Observability [ ]
Performance [ ]
SMI Policy [ ]
Security [ ]
Sidecar Injection [ ]
Tests [ ]
Upgrade [ ]
Other [ ]

Please answer the following questions with yes/no.

  1. Does this change contain code from or inspired by another project?

    • Did you notify the maintainers and provide attribution?
  2. Is this a breaking change?

  3. Has documentation corresponding to this change been updated in the osm-docs repo (if applicable)?

@keithmattix keithmattix force-pushed the meshservice-providerKey branch from b13c1a9 to 9be15b3 Compare April 28, 2022 20:39
@keithmattix keithmattix force-pushed the meshservice-providerKey branch from 161cb93 to 0099ad1 Compare May 2, 2022 18:44
@keithmattix keithmattix requested a review from steeling May 2, 2022 18:53
@keithmattix keithmattix requested a review from steeling May 2, 2022 20:06
@keithmattix keithmattix changed the title Introduce service.ProviderMapper to MeshService (feat/statefulsets): MeshService API changes May 3, 2022
@keithmattix keithmattix force-pushed the meshservice-providerKey branch 2 times, most recently from 1ac81e4 to cb6a880 Compare May 9, 2022 17:27
@keithmattix keithmattix changed the title (feat/statefulsets): MeshService API changes (feat/statefulsets): MeshService API changes for Headless Services May 9, 2022
@keithmattix keithmattix force-pushed the meshservice-providerKey branch from e513dc8 to 20e538b Compare May 9, 2022 21:38
@keithmattix keithmattix marked this pull request as ready for review May 9, 2022 21:38
@codecov-commenter
Copy link

codecov-commenter commented May 9, 2022

Codecov Report

Merging #4704 (7bf723a) into main (ec9b9f9) will increase coverage by 0.57%.
The diff coverage is 88.35%.

@@            Coverage Diff             @@
##             main    #4704      +/-   ##
==========================================
+ Coverage   68.55%   69.13%   +0.57%     
==========================================
  Files         228      227       -1     
  Lines       16289    16316      +27     
==========================================
+ Hits        11167    11280     +113     
+ Misses       5070     4984      -86     
  Partials       52       52              
Flag Coverage Δ
unittests 69.13% <88.35%> (+0.57%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/k8s/types.go 100.00% <ø> (ø)
pkg/validator/validators.go 80.51% <ø> (+0.90%) ⬆️
pkg/cli/verifier/envoy_config.go 65.56% <63.63%> (-1.06%) ⬇️
pkg/catalog/inbound_traffic_policies.go 94.17% <71.42%> (+0.73%) ⬆️
pkg/service/types.go 66.66% <71.42%> (+4.16%) ⬆️
pkg/k8s/client.go 93.05% <93.75%> (-0.04%) ⬇️
pkg/k8s/util.go 96.03% <96.29%> (-0.20%) ⬇️
pkg/catalog/ingress.go 96.52% <100.00%> (+0.09%) ⬆️
pkg/catalog/outbound_traffic_policies.go 94.67% <100.00%> (ø)
pkg/catalog/retry.go 92.59% <100.00%> (+0.92%) ⬆️
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ec9b9f9...7bf723a. Read the comment docs.

Copy link
Member

@shashankram shashankram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this change. The overall idea looks good, left some comments to enable the reuse of code and readability.

@shashankram shashankram added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 11, 2022
Copy link
Contributor

@steeling steeling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an E2E test in this PR as well?

@keithmattix keithmattix force-pushed the meshservice-providerKey branch 2 times, most recently from 582b40c to 6ea3490 Compare May 12, 2022 22:08
@keithmattix
Copy link
Contributor Author

@steeling An E2E test for all statefulset functionality will be added in subsequent PRs. This PR is only for the MeshService API changes and their downstream effects

Copy link
Contributor

@steeling steeling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of all the init/partial stuff, if the issue is unexported fields, it's likely a code smell that we're using the wrong data structure, ie: instead of a hashset let's just use a map

@keithmattix keithmattix requested a review from steeling May 16, 2022 21:32
As a stepping stone for statefulset support (openservicemesh#3477), introduce a new
interface describing the ability to map an entity back to a provider
service (e.g. a Kubernetes service). This decouples the MeshService name
from being a foreign key between the provider's collection of services
and the MeshCatalog's set of services

Signed-off-by: Keith Mattix II <keithmattix2@gmail.com>
Signed-off-by: Keith Mattix II <keithmattix2@gmail.com>
Signed-off-by: Keith Mattix II <keithmattix2@gmail.com>
Signed-off-by: Keith Mattix II <keithmattix2@gmail.com>
When retrieving MeshServices in order to create local clusters for a pod,
exclude MeshServices whose subdomains don't match the pod's name

Signed-off-by: Keith Mattix II <keithmattix2@gmail.com>
Tracking the unexported MeshService fields is difficult, and there
are several hidden bugs that can occur depending on what fields are
accessed. So, we create the NewMeshService and NewPartialMeshService
functions to aid in correct usage

Signed-off-by: Keith Mattix II <keithmattix2@gmail.com>
Signed-off-by: Keith Mattix II <keithmattix2@gmail.com>
Signed-off-by: Keith Mattix II <keithmattix2@gmail.com>
Signed-off-by: Keith Mattix II <keithmattix2@gmail.com>
Signed-off-by: Keith Mattix II <keithmattix2@gmail.com>
Signed-off-by: Keith Mattix II <keithmattix2@gmail.com>
Signed-off-by: Keith Mattix II <keithmattix2@gmail.com>
@keithmattix keithmattix force-pushed the meshservice-providerKey branch from 20a60a6 to c7a568e Compare May 16, 2022 22:59
Signed-off-by: Keith Mattix II <keithmattix2@gmail.com>
Signed-off-by: Keith Mattix II <keithmattix2@gmail.com>
Signed-off-by: Keith Mattix II <keithmattix2@gmail.com>
@keithmattix keithmattix requested a review from nojnhuh May 17, 2022 18:47
Signed-off-by: Keith Mattix II <keithmattix2@gmail.com>
@keithmattix keithmattix removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 18, 2022
{
Addresses: []v1.EndpointAddress{
{
IP: "8.8.8.9",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the IPs here aren't relevant to the test, I'd remove them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IPs are also used down the callstack in ServiceToMeshServices

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More info: #4704 (comment)

Namespace: namespace,
Name: service1Name,
},
Subsets: []v1.EndpointSubset{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like above, if these aren't headless services, then it seems like Subsets doesn't matter here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see what you mean now; yes they technically aren't needed. I don't know how I feel about using unrealistic resources in tests though....feels like as soon as that thing needs to be used, things break

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My personal rule of thumb is "if I can comment it out and the test still passes, remove it." I find trying to make things as realistic as possible in unit tests is more work than it's worth and makes the tests harder to reason about later. If Subsets does come into play later for non-headless services, then the tests will almost certainly need to be updated to account for that anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get where you're coming from, and I certainly think there's a middle ground there. Spending a ton of time making tests hyper-realistic is wasteful, but at the same time, stripping test resources too far down can lead to unexpected behavior. Personally, I find unrealistic objects harder to reason about (e.g. "what does it mean for an Endpoints resource to have a nil subset"). I actually ran into this quite a bit while developing this feature: headless services are essentially a special case, but because a lot of our test objects don't have ClusterIPs, I had to go update unrelated tests when I built out headless service support. This is the primary reason why I started to add ClusterIPs to most of my test service objects. Basically, I'm just trying to say there's a balance. For this test in particular, I think need to add a headless service anyway for the test, but I did want to chat about the general point for posterity's sake 😄

Signed-off-by: Keith Mattix II <keithmattix2@gmail.com>
Signed-off-by: Keith Mattix II <keithmattix2@gmail.com>
@keithmattix keithmattix requested a review from nojnhuh May 18, 2022 22:48
Signed-off-by: Keith Mattix II <keithmattix2@gmail.com>
Signed-off-by: Keith Mattix II <keithmattix2@gmail.com>
Signed-off-by: Keith Mattix II <keithmattix2@gmail.com>
@nojnhuh nojnhuh merged commit 0af42df into openservicemesh:main May 19, 2022
@keithmattix keithmattix deleted the meshservice-providerKey branch May 19, 2022 20:50
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants