-
Notifications
You must be signed in to change notification settings - Fork 274
fix(certs): update checkAndRotate to use current durations #4800
Conversation
1e833a9
to
76ad808
Compare
Codecov Report
@@ Coverage Diff @@
## main #4800 +/- ##
==========================================
- Coverage 69.47% 69.41% -0.07%
==========================================
Files 218 218
Lines 15766 15819 +53
==========================================
+ Hits 10954 10980 +26
- Misses 4760 4787 +27
Partials 52 52
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
39990d4
to
378ce46
Compare
pkg/certificate/manager.go
Outdated
continue | ||
} | ||
if newCert != cert { | ||
// Certificate was rotated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there's a race condition here where IssueCertificate could have triggered a rotation between the call to list, and the check here.
One thought is we can add the logic to issuecertificate for messaging on a rotation if we need to rotate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved publishing to the cert pub-sub to IssueCertificate
to ensure a message is published whenever a certificate is rotated (including outside of checkAndRotate
).
pkg/certificate/manager.go
Outdated
cert = cert.newMergedWithRoot(validatingIssuer.CertificateAuthority) | ||
} | ||
validityDuration := m.getValidityDurationForCertType(ct) | ||
cert, err = signingIssuer.IssueCertificate(cn, validityDuration) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with my above comment, there's still a race condition if issue gets called twice in a row.
singleflight is a package meant to fix exactly that issue, so we may want to leverage that here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated IssueCertificate so all calls to issue certificates are included in group.Do
.
0730a64
to
637fd7a
Compare
@@ -162,6 +165,22 @@ func (c *Client) GetServiceCertValidityPeriod() time.Duration { | |||
return validityDuration | |||
} | |||
|
|||
// GetIngressGatewayCertValidityPeriod returns the validity duration for ingress gateway certificates, and a default in case of unspecified or invalid duration | |||
func (c *Client) GetIngressGatewayCertValidityPeriod() time.Duration { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the use case to use a default if unspecified? I'd rather make this field required in the CRD spec vs assuming user intent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If GetIngressGatewaCertValidityPeriod() is somehow called when the ingressGateway field isn't specified should it return an error? Currently, If the ingressGateway ValidityDuration field is updated and the duration is invalid the ingressGatewayCert won't be updated. On rotation of the ingressGateway cert we'll attempt to get the duration from the meshConfig, but it will be invalid. Do you think it makes sense to return a default duration in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If an ingressGateway is specified in the MeshConfig then the duration is required. We could specify a default instead of making it required? I guess ideally we would validate the MeshConfig and prevent invalid durations before they are applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's a required property, why do we need a default? Is there a default to simplify the code path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the ingressGateway field is removed/unset, the behavior as implemented in pkg/ingress/gateway.go is to call ReleaseCertificate(). So rotation should not happen for such a scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Additionally,
On rotation of the ingressGateway cert we'll attempt to get the duration from the meshConfig, but it will be invalid.
I am wondering if we ever want to use a default during rotation for the ingress cert. I don't see a need to rotate the ingress gateway cert when the state of the gateway cert spec is deemed invalid. Would it be easier to prevent a misconfig?
This is not a blocking comment. I can understand if there are a few corner cases being guarded against through defensive programming here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I would imagine, if the updated value is invalid, we should continue to use the older value encoded in the previous cert and not a default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the ingress gateway cert's validity duration is set in the meshConfig to an invalid duration the cert will not be reissued but the invalid duration will remain set in the meshConfig. When the ingress gateway cert expires should the previous validity duration be used? Prior to my change, the service cert validity duration would have been used for the ingress gateway cert on expiration. I could add a validity duration field to the Certificate
struct or I could add validation for the MeshConfig, but I think that is beyond the scope of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shashankram Could this be addressed in a follow-up PR? I think this could be addressed when we add validation for the MeshConfig. I am also happy to make the changes to include the validity duration as a field in Certificate
or make those changes in a follow-up PR as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sure. This comment is not blocking in nature.
e84a5fe
to
85e1c60
Compare
} | ||
|
||
// WithValidityPeriod modifies the expiration on a newly issued certificate. | ||
func WithValidityPeriod(validityPeriod time.Duration) IssueOption { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@steeling I removed this because I didn't want to make the cert type/ validity duration optional and assume a default. With my current design for cert rotation and issuing it is required that I have a cert type or something similar to determine whether to call getServiceCertValidityDuration, getIngressGatewayCertValidityDuration, or use the OSM internal cert duration. Do you have any suggestions for how to incorporate the validity period options with my changes or take a different approach? Thanks!!
pkg/certificate/manager.go
Outdated
@@ -15,10 +17,13 @@ import ( | |||
"github.com/openservicemesh/osm/pkg/messaging" | |||
) | |||
|
|||
var group singleflight.Group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd consider scoping this to the certmanager, in case there are ever multiple
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I think I might leave this as-is for this PR unless this is a blocking comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say it’s a bug not to, since if 2 Cert manager objects are created the cache would not be populated correctly.
It’s a 1-2 line fix, so I would push for this in this pr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
func (m *Manager) issueCertificate(prefix string, ct CertType, opts ...IssueOption) (*Certificate, error) { | ||
var rotate bool | ||
cert := m.getFromCache(prefix) // Don't call this while holding the lock | ||
if cert != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: put the nil check in should rotate to reduce nesting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I prefer the nesting the call to shouldRotate
. It makes sense to only call shouldRotate
when the cert is not nil. Happy to make the change if you think moving the nil check to shouldRotate
improves readability.
Updates the checkAndRotate method to update certs with the current validity duration specified in the MeshConfig for their respective cert type. Adds a certType to the certificate struct that is used to lookup validity duration when calling IssuingCertificates. Updates shouldRotate to check if the cert needs to be rotated due to an in progress root certificate rotation. GetFromCache will return nil if the cert is not found in the cache, expiring, or needs to be rotated because of a root cert rotation. Signed-off-by: jaellio <jaellio@microsoft.com>
Signed-off-by: jaellio <jaellio@microsoft.com>
Signed-off-by: jaellio <jaellio@microsoft.com>
Description:
Updates the
checkAndRotate
method to update certs with the currentvalidity duration specified in the MeshConfig for their respective
cert type. Adds a
certType
to the certificate struct that is usedto lookup validity duration when calling
IssuingCertificate
.Updates
shouldRotate
to check if the cert needs to be rotated dueto an in-progress root certificate rotation.
GetFromCache
willreturn nil if the cert is not found in the cache, expiring, or
needs to be rotated because of a root cert rotation.
Resolves #4810
Testing done:
Affected area:
Please answer the following questions with yes/no.
Does this change contain code from or inspired by another project? no
Is this a breaking change? no
Has documentation corresponding to this change been updated in the osm-docs repo (if applicable)?