Skip to content

Conversation

incfly
Copy link

@incfly incfly commented May 4, 2018

This is for #5059 clean up some behavior before we generate multi filter chains using envoy tls multiplexing.

Authn policy also stops inferring the cluster's tls settings, so the outcome is DR is the only truth where client side tls settings come from.

Why in master: plan of the record, if works fine, can be shipped into 0.8 or 0.8.1 by merging master into release-0.8 branch.

Testing: Manual Testing TODO.

@incfly incfly requested review from costinm and diemtvu May 4, 2018 17:57
@istio-testing
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: geeknoid

Assign the PR to them by writing /assign @geeknoid in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@incfly
Copy link
Author

incfly commented May 4, 2018

@rshriram FYI

With closer looking at the code, I realized we have to modify cluster.go instead of authn plugin. Because that's the place we build both default and subset clusters config. whereas plugin is only involved with default cluster settings.

@rshriram
Copy link
Member

rshriram commented May 4, 2018

Plugin is invoked for all clusters

Copy link
Contributor

@diemtvu diemtvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rshriram, we are planning to avoid using authn policy for outbound cluster configuration, and require customers to set destination rule accordingly in order to enable mTLS. That's why the authn plugin code for outbound cluster can be removed.

Copy link
Member

@rshriram rshriram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Master branch is outdated for cluster.go. I also suggest pushing this into release. We need filter chain in 0.8 for proper SNI routing.

Also does this mean we will no longer have a global auth setting and setup client side auth by “default”?

@rshriram
Copy link
Member

rshriram commented May 4, 2018

The reason for requesting changes is because this can be done cleanly in the plugin by making sure it’s called for all clusters and encapsulating the service account stuff in the auth plugin.

@incfly
Copy link
Author

incfly commented May 4, 2018

@rshriram you're right on this is called for all clusters including subset. https://github.com/istio/istio/blob/master/pilot/pkg/networking/core/v1alpha3/cluster.go#L114

So the determining config field comes from DR (TLSSettings), the question is who should consume this settings. I agree with @diemtvu that the consumer of this field should be in clusters.go, so the authn only focused on the server authn requirement, and let networking side deal with client settings. This is the intended seperation of client/server we want to have. Will be nice to make implementation aligned with user facing config.

If the concern is more of "why networking now needs to know where istio provided key cert live, etc?", we can extract those key certs paths logic and put it somewhere else.

@costinm wdyt?

Regarding why in master, we plan to make it work E2E first and only after ensuring it works, then merge with release branch.

Also does this mean we will no longer have a global auth setting and setup client side auth by “default”?

This is correct... We need to provide something global else after these work...

@rshriram
Copy link
Member

rshriram commented May 4, 2018

My point is retain the isDestinationEnabledForMtls. In that function, check global flag or local dest rule that comes as part of the plugin call. Add an extra parameter to plugin calls in cluster.go to supply the destination rule as well.

This way, there is very minimal code change

@@ -334,47 +325,4 @@ func (Plugin) OnInboundRouteConfiguration(in *plugin.InputParams, route *xdsapi.
// OnOutboundCluster implements the Plugin interface method.
func (Plugin) OnOutboundCluster(env model.Environment, node model.Proxy, service *model.Service,
servicePort *model.Port, cluster *xdsapi.Cluster) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change this method signature to take the DestinationRule as well. Same goes for Inboundcluster. Then you can simply do if destRule != nil and destRule.trafficPolicy.TLSmode==istio_mutual then...
don't forget to extract the port specific traffic policy first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems most TLS related settings are in cluster.go - it may keep them grouped ( and maybe later move to a 'tls.go' plugin). The authentication plugin seems to deal mostly with the auth policies - while cluster with the network destination rules.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rshriram could we proceed as it is right now, given other TLSettings mode is also handled in cluster.go? And later if needed extract client tls stuff out into tls.go plugin, suggested by @costinm

I'll send follow-up PR to actually build multi-filter chain + tls sniffing filter, which depends on this one to be checked-in.

Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. But can you push this PR to 0.8 as well ? This is independent of any thing else.

@codecov
Copy link

codecov bot commented May 4, 2018

Codecov Report

Merging #5418 into master will increase coverage by 1%.
The diff coverage is 0%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master   #5418     +/-   ##
========================================
+ Coverage      74%     74%     +1%     
========================================
  Files         323     317      -6     
  Lines       27394   26289   -1105     
========================================
- Hits        20123   19420    -703     
+ Misses       6480    6140    -340     
+ Partials      791     729     -62
Impacted Files Coverage Δ
...ilot/pkg/networking/plugin/authn/authentication.go 84% <ø> (+16%) ⬆️
pilot/pkg/networking/core/v1alpha3/cluster.go 0% <0%> (ø) ⬆️
mixer/adapter/servicecontrol/checkprocessor.go 78% <0%> (-1%) ⬇️
pilot/pkg/serviceregistry/kube/controller.go 67% <0%> (-1%) ⬇️
mixer/adapter/circonus/circonus.go 71% <0%> (ø) ⬇️
mixer/adapter/memquota/rollingWindow.go 100% <0%> (ø) ⬆️
mixer/pkg/protobuf/yaml/dynamic/builder.go 100% <0%> (ø) ⬆️
mixer/adapter/noop/noop.go 100% <0%> (ø) ⬆️
mixer/adapter/dogstatsd/dogstatsd.go 100% <0%> (ø) ⬆️
pilot/pkg/proxy/envoy/v2/discovery.go
... and 12 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 35e2b90...ca83544. Read the comment docs.

@incfly
Copy link
Author

incfly commented May 4, 2018

  1. Global settings for mTLS is now preserved. I change how we buildDefaultTrafficPolicy.
  2. I'm not sure if we need to handle MtlsExcludedServices specially. In the context of v2 api, maybe customers should write DestinationRule for those services, like apiserver? @costinm

@rshriram
Copy link
Member

rshriram commented May 7, 2018

Also this has nothing to do with tls sniffing. This pr is useful by itself in existing 0.8, where auth overrides tls settings in dest rule. So I still believe that 0.8 is the right place for the ISTIO_mutual stuff. The tls snuffing changes in inbound listener is a different PR

In auth plugin
OnOutboundCluster(cluster, destination rule, service, port)
If isIstioMtlsEnabled()
Set istio mTLS

isIstioMtlsEnabled(mesh, destrule)
if mesh.mtls and destrulr.tls == nil
or destrule.tlsmode == istio_mutual
return true
Return false

@diemtvu
Copy link
Contributor

diemtvu commented May 7, 2018

Technically, the approach Shriram suggested could work. However, per discussion with Louis, Costin (meeting note), we want to cleanly separate authn policy to affect server side only, and destination rule to affect client side only. As a result, I think it's make more sense to remove OutboundCluster authn plugin code, to the applyTrafficPolicy module. + @louiscryan @costinm for more comments.

We need to infer a 'default' destination rules from the global mesh config flag for now, for backward compatible. However, in the long run, I think we can remove that flag and have customers explicitly submit global authN policy and destination rule for that.

@@ -359,6 +362,34 @@ func applyUpstreamTLSSettings(cluster *v2.Cluster, tls *networking.TLSSettings)
},
Sni: tls.Sni,
}
case networking.TLSSettings_ISTIO_MUTUAL:
// TODO(incfly): needs to change when SDS is ready.
cluster.TlsContext = &auth.UpstreamTlsContext{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change the authN plugin function to a regular function to set these config and call it here. With that, the code that related to Istio-mTLS still stays closed to each other.

@@ -89,7 +91,8 @@ func (configgen *ConfigGeneratorImpl) buildOutboundClusters(env model.Environmen

// create default cluster
clusterName := model.BuildSubsetKey(model.TrafficDirectionOutbound, "", service.Hostname, port)
defaultCluster := buildDefaultCluster(env, clusterName, convertResolution(service.Resolution), hosts)
serviceAccounts := env.ServiceAccounts.GetIstioServiceAccounts(service.Hostname, []string{port.Name})
defaultCluster := buildDefaultCluster(env, clusterName, convertResolution(service.Resolution), hosts, serviceAccounts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternate simpler implementation suggestion: when a destinationrule is obtained, pass it to a convertIstioMtlsToMtls function. This function will simply set the tls mode to mutual with the cert file names using istio certs and SAN field using the service accounts. Once converted, the rest of the code does not need to be changed at all. And if config is nil, create a new dest rule if mesh.mtls is true and set the mTLS settings in the dest rule. This way you don’t have to wire up tnt service accounts everywhere.
In future, if anyone were to add more cluster building code, they wouldn’t have to worry about plumbing service accounts everywhere. If you want, you could add this code directly into model.DestinationRules() such that you don’t even have to change cluster.go

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This indeed remove the service account plumbing. although IMHO override/conversion is less straightforward to understand compared w/ handling everything in a single switch-case.

I'm fine with either one. I can implemented the conversion and remove service account from params.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest the conversion as it ensures that any other code using cluster.go functions does not end up overriding auth stuff.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this conversion is better done in cluster.go, instead of model code due to certain wildcard issues in destination rules. But please convert so that you don’t have to push the service account thing into all functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good suggestion. In fact, it's similar to authn policy: when there is no authn policy in CRD, we generate one based on config map settings. May be we can do the same here?

@incfly
Copy link
Author

incfly commented May 8, 2018

@rshriram We also think this is a useful change for 0.8, regardless of tls multiplexing. I'll work on testing this against 0.8 branch and if we can get test passing before release cut, will change this into 0.8.

@diemtvu
Copy link
Contributor

diemtvu commented May 8, 2018 via email

@rshriram
Copy link
Member

rshriram commented May 9, 2018

There is enough breaking change in 0.8 already. That said this also means that people have to use v1alpha3 in order to do auth. But v1alpha3 is optional in 0.8 and becomes default only in 0.9. So you need a backward compatible way of enabling fleet wide auth in 0.8 (via global config option)

@diemtvu
Copy link
Contributor

diemtvu commented May 9, 2018

with v1/v1apha1, authN policy will set both client and server side, so it still work without destination rules. Do you think that is enough?

@incfly
Copy link
Author

incfly commented May 9, 2018

I changed to fill key/cert path in advance, instead of plumbing service account everywhere. I still want to keep ISTIO_MUTUAL separate from MUTUAL because in future when SDS is available we can't use key/certs paths to configure envoy.

Also removed the change on buildDefaultCluster, which is used for user app container as cluster as well. And there, we shouldn't add any TLSSettings.

Copy link
Member

@rshriram rshriram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving in good faith. Please fix the bugs I noted above (port traffic policy and mesh external/headless stuff).


// set TLSSettings if configmap global settings specifies MUTUAL_TLS.
if env.Mesh.AuthPolicy == meshconfig.MeshConfig_MUTUAL_TLS {
applyUpstreamTLSSettings(defaultCluster, buildIstioMutualTls(upstreamServiceAccounts))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: this cannot be applied for external services. Add a check for service.MeshExternal and skip setting up this istio mutual for default

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don’t know how this will work for headless services (ie ones using original dst cluster). You have to enable the auth in headless service test and see. If it fails add a check for original dst cluster here as well, just like the plugin code

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked external service now. Sorry for the lack of familiarity, what's the headless services/original dst cluster? is there some field to check, similar as service.MeshExternal?

if subset.TrafficPolicy.Tls.Mode == networking.TLSSettings_ISTIO_MUTUAL {
subset.TrafficPolicy.Tls = buildIstioMutualTls(upstreamServiceAccount)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not enough. We also have port traffic policy - which is port specific traffic policy. See code in this file for example b

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

However, this concerns me that we potentially use TrafficPolicy.TLSSettings in lots of places of DestinationRule, and it's cumbersome to keep the conversion in sync while we're adding other level (label, say), traffic policy override. Unless we do proto reflection to traverse over the DestinationRule and find all fields typed as TLSSettings...

defaultCluster := buildDefaultCluster(env, clusterName, convertResolution(service.Resolution), hosts)

// set TLSSettings if configmap global settings specifies MUTUAL_TLS.
if env.Mesh.AuthPolicy == meshconfig.MeshConfig_MUTUAL_TLS && !service.External() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think External() works for v1 only. MeshExternal maybe the check you need. + @rshriram to confirm

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I found that and changed to MeshExternal in the latest commit.

@googlebot
Copy link
Collaborator

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of the commit author(s) and merge this pull request when appropriate.

@googlebot googlebot added cla: no Set by the Google CLA bot to indicate the author of a PR has not signed the Google CLA. and removed cla: yes labels May 9, 2018
@googlebot
Copy link
Collaborator

CLAs look good, thanks!

@googlebot googlebot added cla: yes and removed cla: no Set by the Google CLA bot to indicate the author of a PR has not signed the Google CLA. labels May 9, 2018
@istio-testing
Copy link
Collaborator

istio-testing commented May 9, 2018

@incfly: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
prow/istio-unit-tests.sh ca83544 link /test istio-unit-tests
prow/e2e-bookInfoTests-v1alpha3.sh ca83544 link /test e2e-bookInfo-envoyv2-v1alpha3
prow/istio-pilot-e2e-envoyv2-v1alpha3.sh ca83544 link /test istio-pilot-e2e-envoyv2-v1alpha3
prow/istio-pilot-e2e.sh ca83544 link /test istio-pilot-e2e

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@istio-testing
Copy link
Collaborator

@incfly: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label May 18, 2018
@PiotrSikora
Copy link
Contributor

Is (ab)using TLSMode for this is really a good idea? Shouldn't this a part of the TrafficPolicy? Perhaps bool TrafficPolicy.InMesh?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-rebase Indicates a PR needs to be rebased before being merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants