Remove unused code paths and switch the policy object to a policy builder #4791

steeling · 2022-06-03T22:31:21Z

Remove large swaths of code by removing unused code paths and merging some of the resulting small functions into their callers.

This PR adds the following simplifications (with justification provided) around building RBAC rules from the policy object:

Remove the method AddRule, which is only called in a single instance, where there are no pre-existing rules (permissive mode).
Modify getTrafficSpecName from a method to a function. This provides additional clarity to the reader since the reader knows the method getTrafficSpecName only operates on its parameters, and not on the fields of the method's instance.
Support only OR rules for principals. Note that this changes from nesting inside a single top-level "or rule" to simply listing all of the principals, which uses OR semantics by default.
Both OR and AND are supported for permissions; however, OR rules are similarly moved to top-level permissions, which also use OR by default.
Simplify the wildcard scenario to result in a single principal of ANY, whereas before it could have both ANY and specific principals.
Remove code paths associated with allowPartialHostnamesMatch, since only a single value was ever provided.
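
The principal-related simplifications in the list above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the `Principal` type and the `flattenPrincipals` helper are hypothetical stand-ins for the Envoy RBAC protos, and the sketch only shows how a nested OR list and the wildcard case collapse into a flat top-level list.

```go
package main

import "fmt"

// Principal is a hypothetical, simplified stand-in for an Envoy RBAC principal.
// Exactly one of Any, Name, or OrIDs is expected to be set.
type Principal struct {
	Any   bool        // matches every downstream identity
	Name  string      // matches a single identity
	OrIDs []Principal // nested OR list (the shape the PR moves away from)
}

// flattenPrincipals hoists nested OR lists to the top level and collapses the
// whole result to a single ANY principal as soon as a wildcard is found.
func flattenPrincipals(in []Principal) []Principal {
	var out []Principal
	for _, p := range in {
		switch {
		case p.Any:
			return []Principal{{Any: true}}
		case len(p.OrIDs) > 0:
			nested := flattenPrincipals(p.OrIDs)
			if len(nested) == 1 && nested[0].Any {
				return nested
			}
			out = append(out, nested...)
		default:
			out = append(out, p)
		}
	}
	return out
}

func main() {
	// One top-level "or rule" wrapping two identities becomes two top-level entries.
	nested := []Principal{{OrIDs: []Principal{{Name: "sa-1.ns-1"}, {Name: "sa-2.ns-2"}}}}
	for _, p := range flattenPrincipals(nested) {
		fmt.Println(p.Name)
	}

	// A wildcard next to a specific identity collapses to a single ANY principal.
	wildcard := flattenPrincipals([]Principal{{Name: "sa-1.ns-1"}, {Any: true}})
	fmt.Println(len(wildcard), wildcard[0].Any)
}
```

The flat list relies on the top-level list being evaluated with OR semantics, which is the assumption the description states.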

codecov-commenter · 2022-06-06T16:48:59Z

Codecov Report

Merging #4791 (852c7e8) into main (ecc4e67) will decrease coverage by 0.32%.
The diff coverage is 96.51%.

❗ Current head 852c7e8 differs from pull request most recent head 37506ce. Consider uploading reports for the commit 37506ce to get more accurate results

@@            Coverage Diff             @@
##             main    #4791      +/-   ##
==========================================
- Coverage   68.96%   68.64%   -0.33%     
==========================================
  Files         227      223       -4     
  Lines       16454    16266     -188     
==========================================
- Hits        11348    11166     -182     
+ Misses       5054     5049       -5     
+ Partials       52       51       -1

Flag	Coverage Δ
unittests	`68.64% <96.51%> (-0.33%)`	⬇️

Impacted Files	Coverage Δ
pkg/envoy/rds/response.go	`82.85% <ø> (+3.90%)`	⬆️
pkg/errcode/errcode.go	`100.00% <ø> (ø)`
pkg/service/types.go	`62.50% <0.00%> (-4.17%)`	⬇️
pkg/catalog/inbound_traffic_policies.go	`94.35% <100.00%> (+0.17%)`	⬆️
pkg/catalog/ingress.go	`96.29% <100.00%> (-0.23%)`	⬇️
pkg/catalog/traffictarget.go	`90.00% <100.00%> (+0.34%)`	⬆️
pkg/envoy/lds/rbac.go	`80.35% <100.00%> (-2.36%)`	⬇️
pkg/envoy/rbac/policy.go	`100.00% <100.00%> (+5.62%)`	⬆️
pkg/envoy/rds/route/rbac.go	`92.85% <100.00%> (+0.40%)`	⬆️
pkg/envoy/rds/route/route_config.go	`100.00% <100.00%> (ø)`
... and 54 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ecc4e67...37506ce.

shashankram

I see too many parts of the code being changed without a clear motivation being described in the commit message. It's hard for me to understand why significant parts of the RBAC policy builder + traffic policy helpers are being refactored.

My request to you is:

Clearly describe the motivation of the PR in the commit and PR descriptions
Break this PR into smaller independent units (removal of unused code, traffic policy, RBAC etc.)
Open a GitHub issue for larger refactors so they can be tracked and discussed

pkg/envoy/rbac/policy.go

pkg/trafficpolicy/trafficpolicy.go

steeling · 2022-06-06T18:32:59Z

A net 650 lines of code removed is a big benefit in my opinion. All of the refactored code is due to unused code paths, or inlining functions that are called in one place (particularly after portions of that function that are not used are removed).

The net result is a significant improvement in readability and simplicity, with the added benefit of fewer bytes used in the envoy config itself :)

shashankram · 2022-06-06T18:52:09Z

I am not disputing the removal of unused code, rather pointing out that there's not enough context to review these changes effectively. For example, the RBAC policy code is being refactored, yet there isn't any indication on why this is necessary or if there are existing bugs. It would be a lot easier to review by breaking this change up into smaller units, each potentially addressing a localized problem.

steeling · 2022-06-06T19:39:48Z

It's mostly for readability. I have to make a couple small changes here with respect to the trust domain, and ended up spending a lot more time than I anticipated in simply understanding the code. In my experience this understanding will last for about a month, and then I'll need to refresh myself.

The goal here is purely around simplicity. I can add more context, but I'd expect that this PR is simple enough to remain a single PR rather than being split up.

shashankram · 2022-06-06T20:42:29Z

Remove And/OR rules from permissions and principals in RBAC, since OR is the only flavor we ever use. We now no longer need to nest multiple lists of principals in the principals array, with OR's, since multiple instances in the top-level list are applied with OR semantics anyways.

I suggest not removing what is already supported. There is a reason that AND/OR rules were implemented for permissions and principals: so that we could support more fine-grained RBAC on attributes other than just service account, e.g. source/destination IP/port, L7 headers, etc.

I am not in favor of refactoring out this functionality for the purpose of integrating the trust domain work. Instead, I'd like to see the trust domain transparently plumbed using the existing code.

keithmattix

I love the transition to a builder pattern here, but I'm hesitant to remove the code powering AND semantics. Just because SMI doesn't really support it now doesn't mean customers don't want/need that policy crafting tool.

steeling · 2022-06-06T21:31:16Z

FWIW I wrote this PR in < a day, and I'd estimate re-adding that functionality to take a similar amount of time. However, we read code much more than we write it, so I think removing unused features, when it would be trivial to add them back, makes sense.

Especially since the AND/OR rules, as written, are limited in that they don't support nesting. If we do need AND later, there's no promise that the current implementation would even support the need.

I'd love to get more thoughts here, especially around the idea of maintaining unused code, particularly since I'll be adding on top of this, so essentially we're adding code on top of tech debt.

@trstringer @nojnhuh @jaellio

shashankram · 2022-06-06T22:48:29Z

A lot of effort went into ensuring the given RBAC policy generator supports basic AND/OR semantics to accommodate future use cases beyond just the service principal. We didn't need complex nesting semantics, so it wasn't an immediate priority back then.

I am not in favor of refactoring this code and removing core functionality just to plumb the trust domain work, which I don't believe requires significant refactoring of this code path. If this refactor was done without removing the generic AND/OR support for readability, then it would be more appealing to me. But as it stands, I am not in favor of this change.

steeling · 2022-06-06T23:13:28Z

I'd say YAGNI should be a leading principle here. Since this seems to be relatively subjective, I'd like to open it up to others to get opinions before reverting, if that's OK.

steeling · 2022-06-07T17:45:22Z

@nojnhuh @trstringer @jaellio pinging other maintainers for opinions on above

steeling · 2022-06-08T18:34:38Z

Adding to this argument, I have some proposals to remove most of the logic from the trafficpolicy objects like Rules. Instead, the meshcatalog should present more normalized data, and allow builders (like policy.Builder), to specify the rules.

So instead of

builder.SetRules(rules)

Where rules contains business logic on how the rules were applied (or at least needed a lot of business logic to construct), we would instead do:

builder.SetApplyPermissionsWithAnd(true)
// or 
builder.SetAndRules(rules)

This allows for that flexible nesting that is currently not possible, removes this "double" translation logic that is currently happening, simplifies the code base, and is trivial to implement.
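
A minimal sketch of what such a builder could look like, assuming the caller hands over raw data and the builder alone decides how it is combined. Only `SetApplyPermissionsWithAnd` comes from the proposal above; the `Permission` and `Builder` types and the `AddPort` and `AddHeader` methods are hypothetical names for illustration, not an existing API.

```go
package main

import "fmt"

// Permission is a hypothetical, simplified stand-in for an Envoy RBAC permission.
type Permission struct {
	Port     uint32       // destination port to match, if non-zero
	Header   string       // header name to match, if non-empty
	AndRules []Permission // when set, every nested permission must match
}

// Builder is a hypothetical policy builder that owns the AND/OR decision.
type Builder struct {
	ports      []uint32
	headers    []string
	applyAsAnd bool
}

// AddPort records a raw port supplied by the caller.
func (b *Builder) AddPort(p uint32) *Builder {
	b.ports = append(b.ports, p)
	return b
}

// AddHeader records a raw header name supplied by the caller.
func (b *Builder) AddHeader(name string) *Builder {
	b.headers = append(b.headers, name)
	return b
}

// SetApplyPermissionsWithAnd toggles AND semantics across all permissions.
func (b *Builder) SetApplyPermissionsWithAnd(enabled bool) *Builder {
	b.applyAsAnd = enabled
	return b
}

// Build emits flat top-level permissions (OR by default), or wraps them in a
// single AND permission when the toggle is set.
func (b *Builder) Build() []Permission {
	perms := make([]Permission, 0, len(b.ports)+len(b.headers))
	for _, p := range b.ports {
		perms = append(perms, Permission{Port: p})
	}
	for _, h := range b.headers {
		perms = append(perms, Permission{Header: h})
	}
	if b.applyAsAnd && len(perms) > 1 {
		return []Permission{{AndRules: perms}}
	}
	return perms
}

func main() {
	orPerms := (&Builder{}).AddPort(8000).AddHeader("x-team").Build()
	andPerms := (&Builder{}).AddPort(8000).AddHeader("x-team").
		SetApplyPermissionsWithAnd(true).Build()
	fmt.Println(len(orPerms), len(andPerms), len(andPerms[0].AndRules))
}
```

In this shape the caller never constructs AND/OR rule objects itself, so the translation to Envoy config lives in one place.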

With that system, I double down that we don't need to maintain AND logic that we don't use, if for nothing else than that:

It would be incredibly trivial to implement with the above style.
If I implement it now, I have little to no confidence it would actually support our hypothetical future use cases.
If I implement it now, I'm spending time to implement it... why not spend this time when it is needed?
YAGNI.

Again, pinging the rest of the maintainers for their thoughts here.

As a middle ground, I'd be happy to provide that toggle on the policy builder, but I don't think it belongs in the mesh catalog.

@nojnhuh, @trstringer, @jaellio

steeling · 2022-06-08T18:46:30Z

As a final point of data, why does the mesh catalog know that a port translates into a rule? We should follow the single responsibility principle, where the mesh catalog is aware of the state of the mesh, not how that translates to an envoy config (which is the responsibility of the builders).

Instead, it should be supplying available ports, not a set of AND/OR rules. That places envoy config logic in multiple places.

Again, happy to implement on the policy builder side, but I strongly believe adding this logic within the mesh catalog would take us in the wrong direction.

shashankram · 2022-06-08T19:01:48Z

My primary concern pertains to removing AND/OR rule semantics from the existing RBAC code. I am not against refactoring this code to use a data model. As we are adding more complex features, I can imagine us needing to use AND/OR semantics for RBAC rules, as the API is complex due to nesting of protos. I also believe we are not going to expose nested APIs to users, so top-level AND/OR support should be good enough (e.g. filter on source IP AND destination port, or filter on source IP OR source principal, etc.).

Further, I understand that you are proposing a design change to the MeshCatalog. I am not against this either. I want to point out that the existing design of MeshCatalog is to translate k8s config to objects that can easily be translated to XDS objects, vs needing to call into k8s APIs from pkg/envoy, hence the code is structured this way. I will also note that the MeshCatalog and route building code has been heavily refactored multiple times by different engineers, following a new pattern each time. Obviously, this is not ideal but was done in the interest of time. So maybe a concise doc on how we want MeshCatalog to look could be a starting point. Without having an underlying philosophy documented, these changes are hard to reason about.

keithmattix · 2022-06-08T19:13:47Z

In general, I'm in favor of docs or other formats as a method of detailing technical goals maintainers & other community members have for a project's code base. These kinds of docs serve as a guiding light to folks working on the project, and when a group of people have an explicit, common goal, accomplishing that goal tends to be smoother.

steeling · 2022-06-08T19:14:25Z

I think we're on the same page :) I just added a commit that includes the AND semantics to the policy portion, PTAL.

Agreed, I have a doc I'll be sharing with the team that lays out some founding principles to help make decisions on how this code should be structured going forward.

shashankram

I am still not comfortable with the changes made to the RBAC code. Here's a suggestion, let me know if that works.

Refactor the existing policy code in a way where the test expectations remain the same. Currently, I see a lot of test code being updated, so it's extremely hard to spot whether the rules are applied exactly as before (defaulting to OR). If this is not the case, then this PR title is misleading: it would not be removing unused code paths, but rather changing the behavior of the existing RBAC policies, which I believe should be a separate change in itself.
In a follow-up change, migrate from OR rules to top-level rules.

This way, there is an easier path to making the refactoring without changing the underlying behavior.

shashankram · 2022-06-08T20:29:43Z

pkg/catalog/traffictarget_test.go

@@ -507,7 +507,7 @@ func TestListInboundTrafficTargetsWithRoutes(t *testing.T) {
 					},
 					TCPRouteMatches: []trafficpolicy.TCPRouteMatch{
 						{
-							Ports: []int{8000, 9000},
+							Ports: []uint32{8000, 9000},


Can these be uint16 instead, as port ranges are from 0-65535?
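
To illustrate the suggestion: a `uint16` covers exactly the range 0-65535, so an out-of-range port cannot be represented once it has been parsed. The sketch below is only an illustration under that assumption; `parsePort` is a hypothetical helper and not part of this PR. Widening back to `uint32` (the type used in the diff above) is lossless.

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePort is a hypothetical helper: it parses s as a port and rejects
// anything outside 0-65535 by asking strconv for a 16-bit result.
func parsePort(s string) (uint16, error) {
	n, err := strconv.ParseUint(s, 10, 16)
	if err != nil {
		return 0, fmt.Errorf("invalid port %q: %w", s, err)
	}
	return uint16(n), nil
}

func main() {
	for _, s := range []string{"8000", "65535", "65536"} {
		p, err := parsePort(s)
		if err != nil {
			fmt.Println(err)
			continue
		}
		// Widening a uint16 to uint32 never loses information.
		fmt.Println(uint32(p))
	}
}
```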

pkg/envoy/lds/rbac.go

pkg/envoy/lds/rbac_test.go

shashankram · 2022-06-08T20:35:24Z

pkg/envoy/lds/rbac_test.go

 					},
 				},
 			},

 			expectedPolicy: &xds_rbac.Policy{
 				Permissions: []*xds_rbac.Permission{
-					{


Same comment as above.

pkg/envoy/rbac/policy.go

pkg/trafficpolicy/types.go

shashankram · 2022-06-08T21:35:06Z

Discussed this offline with @steeling. Although this change moves from OR based fields to top-level fields for both principals and permissions, by default Envoy uses OR based rules for top-level fields: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/rbac/v3/rbac.proto#envoy-v3-api-msg-config-rbac-v3-policy
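
The matching behavior described in the linked Envoy documentation can be modeled roughly as below: a policy matches when at least one permission matches and at least one principal matches. This is a simplified sketch with hypothetical `Request` and `Policy` types (ports stand in for permissions, plain strings for principals), not the Envoy or OSM implementation.

```go
package main

import "fmt"

// Request is a hypothetical, simplified view of an inbound connection.
type Request struct {
	Principal string
	Port      uint32
}

// Policy is a simplified model of an RBAC policy with flat top-level lists.
type Policy struct {
	Principals []string // each entry is OR'ed with the others
	Ports      []uint32 // stands in for permissions; each entry is OR'ed
}

// Matches reports whether any principal AND any permission match the request.
func (p Policy) Matches(r Request) bool {
	principalOK := false
	for _, id := range p.Principals {
		if id == r.Principal {
			principalOK = true
			break
		}
	}
	if !principalOK {
		return false
	}
	for _, port := range p.Ports {
		if port == r.Port {
			return true
		}
	}
	return false
}

func main() {
	policy := Policy{
		Principals: []string{"sa-1.ns-1", "sa-2.ns-2"},
		Ports:      []uint32{8000, 9000},
	}
	fmt.Println(policy.Matches(Request{Principal: "sa-2.ns-2", Port: 9000}))
	fmt.Println(policy.Matches(Request{Principal: "sa-3.ns-3", Port: 9000}))
}
```

Under this model, listing entries at the top level and wrapping the same entries in one OR rule accept exactly the same requests.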

steeling requested review from shashankram, snehachhabria, nojnhuh, draychev, jaellio and trstringer as code owners June 3, 2022 22:31

steeling force-pushed the feature/refactor-policy branch 5 times, most recently from cba5b7c to 93e3857 Compare June 6, 2022 16:36

Remove unused code paths and switch the policy object to a policy builder.

6062c30

Signed-off-by: Sean Teeling <seanteeling@microsoft.com>

steeling force-pushed the feature/refactor-policy branch from 93e3857 to 6062c30 Compare June 6, 2022 16:55

shashankram suggested changes Jun 6, 2022

steeling requested a review from shashankram June 6, 2022 19:47

keithmattix reviewed Jun 6, 2022

add AND to the policy builder

31ff5e4

Signed-off-by: Sean Teeling <seanteeling@microsoft.com>

steeling force-pushed the feature/refactor-policy branch from 3455cd8 to 31ff5e4 Compare June 8, 2022 19:17

shashankram suggested changes Jun 8, 2022

shashankram approved these changes Jun 8, 2022

address comments

37506ce

Signed-off-by: Sean Teeling <seanteeling@microsoft.com>

steeling force-pushed the feature/refactor-policy branch from 852c7e8 to 37506ce Compare June 8, 2022 21:43

nojnhuh approved these changes Jun 9, 2022

nojnhuh merged commit eb281e5 into openservicemesh:main Jun 9, 2022
