wallrj
Member

@wallrj wallrj commented Aug 15, 2024

In #7102 (comment), @ealogar reports the following error when using the Route53 DNS-01 solver with the AssumeRoleWithWebIdentity authentication method:

E0808 21:29:29.746771 1 controller.go:162] "re-queuing item due to error processing" err="failed to determine Route 53 hosted zone ID: operation error Route 53: ListHostedZonesByName, get identity: get credentials: failed to refresh cached credentials, failed to retrieve credentials, operation error STS: AssumeRoleWithWebIdentity, failed to resolve service endpoint, endpoint rule error, Invalid Configuration: Missing Region" logger="cert-manager.controller" key="xxxxxxxxxxxxxxx"

The message "failed to refresh cached credentials" makes me wonder whether the STS token expires during a delay somewhere in the following sequence:

  1. re-queuing item due to error processing
  2. route53:NewDNSProvider
  3. route53:provider.GetSession
  4. stsSvc.AssumeRoleWithWebIdentity
  5. route53:Present
  6. route53:changeRecord: failed to determine Route 53 hosted zone ID
  7. route53:getHostedZoneID: util.FindZoneByFqdn (takes longer than X seconds during which STS token has expired)
  8. route53:getHostedZoneID: operation error Route 53: ListHostedZonesByName
  9. internal/auth/smithy/credentials_adapter.go#L42: get credentials
  10. aws/credential_cache.go#L128: failed to refresh cached credentials

Fixes: #7108

NONE

Signed-off-by: Richard Wall <richard.wall@venafi.com>
@cert-manager-prow
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@cert-manager-prow bot added labels on Aug 15, 2024:
- do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
- release-note-none: Denotes a PR that doesn't merit a release note.
- dco-signoff: yes: Indicates that all commits in the pull request have the valid DCO sign-off message.
- area/acme: Indicates a PR directly modifies the ACME Issuer code
@cert-manager-prow
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from wallrj. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cert-manager-prow bot added labels on Aug 15, 2024:
- area/acme/dns01: Indicates a PR modifies ACME DNS01 provider code
- needs-kind: Indicates a PR lacks a `kind/foo` label and requires one.
- area/api: Indicates a PR directly modifies the 'pkg/apis' directory
- size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
@wallrj wallrj closed this Aug 16, 2024
@wallrj wallrj deleted the 7102-route53-sts-credential-provider branch August 16, 2024 13:29