-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Fix AWS Route53 error detection for not-found errors during deletion #7548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fixes cert-manager#7547 Signed-off-by: alex-berger <alex-berger@gmx.ch>
Hi @alex-berger. Thanks for your PR. I'm waiting for a cert-manager member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/cc |
@alex-berger: GitHub didn't allow me to request PR reviews from the following users: alex-berger. Note that only cert-manager members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
It would be nice if we could also add a unit test for that error path as well, but even without it this change makes sense to me!
/ok-to-test |
WIP: I've been testing this change by following the AWS tutorial, with cert-manager deployed from the master branch and then again when deployed from this branch. I observed the bug before and then, after switching to this branch and increasing the log verbosity, I could see the new improved behaviour. # values.yaml
crds:
enabled: true
global:
logLevel: 4 export KO_REGISTRY=ttl.sh/$(uuidgen)
export KO_HELM_VALUES_FILES=$PWD/values.yaml
make -j4 ko-deploy-certmanager Before:
After:
|
// If we try to delete something and get a 'InvalidChangeBatch' that | ||
// means it's already deleted, no need to consider it an error. | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alex-berger
I tested this change yesterday and yes, this does work and apologies for causing this bug with my careless code review remarks. I have learned that errors.Is
should only be used with sentinel errors.
The AWS-SDK-GO V2 docs say this about error handling:
Service operations can return modeled error types to indicate specific errors. These modeled types can be used with errors.As to unwrap and determine if the operation failure was due to a specific error.
In this case, there is a modelled error, route53types.InvalidChangeBatch
, which we can use with errors.As
,
so I suggest we use that instead of unwrapping the underlying smithy error and string matching "InvalidChangeBatch".
This seems to be the code where the modelled error is created:
- https://github.com/aws/aws-sdk-go-v2/blob/main/service/route53/internal/customizations/doc.go#L8-L37
- https://github.com/aws/aws-sdk-go-v2/blob/ef40d0c7f18167e851e939d28357a541102a6a94/service/route53/internal/customizations/custom_error_deser.go#L76
It would be nice to have a unit-test for this. It looks like it should be possible to inject a fake route53 client with a ChangeResourceRecordSets
and a canned error response.
If you have time, please add a test, else let me know and I'll try and write one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I first tried something like this:
var invalidChangeBatch route53types.InvalidChangeBatch
if errors.As(err, &invalidChangeBatch) {
...
}
But, it did not work (errors.As(err, &invalidChangeBatch)
always evaluated to false
). In the end I ended up with the current solution, which actually seems to work:
var apiErr smithy.APIError
if errors.As(err, &apiErr) {
if apiErr.ErrorCode() == "InvalidChangeBatch" && action == route53types.ChangeActionDelete {
...
}
}
However, I am neither an expert in Go nor in AWS SDK for Go, so there might well be a better approach to this :-).
It would be nice to have a unit-test for this. It looks like it should be possible to inject a fake route53 client with a ChangeResourceRecordSets and a canned error response.
If you have time, please add a test, else let me know and I'll try and write one.
I agree, having a unit test would be great. However, mocking route53 client is not that easy and we would also have to make sure we return the exact same error (type). And still, if they change error handling again in the AWS SDK, the test will be pretty useless, as we test against our mock (which will not reflect the change in the SDK). So, the test would still succeed, while at runtime it will fail again.
Those days, I have not much spare capacity. So, unless we have a better idea for unit testing (one that really tests against the AWS SDK behavior and not against a mock), I will not invest more time for now. Of course, if you have some spare time, be my guest :-).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I first tried something like this:
var invalidChangeBatch route53types.InvalidChangeBatch if errors.As(err, &invalidChangeBatch) { ... }
I think it might be because the returned error in this case is a pointer, so you need to do:
var invalidChangeBatch *route53types.InvalidChangeBatch
I'll play around with it and push to your branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wallrj Any news? If not can we merge this PR and consider addressing potential improvements to error handling with a future PR. I would really love to see this fix to land in the next cert-manager release, to make sure automatic certificate renewal works seamless (without human intervention).
@wallrj any feedback/update on #7548 (comment)? |
And use the modelled error type, as recommended by the AWS SDK documentation Signed-off-by: Richard Wall <richard.wall@venafi.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: euank The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
I've started adding some unit-tests for this, but not quite finished and had to make some changes to the test server and to the retry code, to get timely failures....
|
Signed-off-by: Richard Wall <richard.wall@venafi.com>
@alex-berger I pushed too many unrelated changes to your branch and then wasn't able to force-push to remove them....so I moved everything to another PR which has now been merged: Sorry for the slow turn around. |
Pull Request Motivation
Fixes #7547
Kind
/kind bug
Release Note