inteon
Member

@inteon inteon commented Jan 31, 2024

#6676 did not cover all error cases.
This PR fixes an additional error type reported by @weisdd: the earlier fix did not solve the problem for azidentity.AuthenticationFailedError errors (#5452 (comment)).
Additionally, I tried to simplify the stabilizeError function and added new tests for this new error type.

Kind

/kind bug

Release Note

NONE

Signed-off-by: Tim Ramlot <42113979+inteon@users.noreply.github.com>
@jetstack-bot added the kind/bug, release-note-none, dco-signoff: yes, area/acme, area/acme/dns01, and size/L labels Jan 31, 2024
@inteon inteon requested a review from wallrj January 31, 2024 09:11
@inteon inteon mentioned this pull request Jan 31, 2024
@inteon
Member Author

inteon commented Jan 31, 2024

/cherry-pick release-1.14

@jetstack-bot
Contributor

@inteon: once the present PR merges, I will cherry-pick it on top of release-1.14 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.14

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor

@weisdd weisdd left a comment

I've tested 3 scenarios:

  • No federated identity credential;
  • Non-existent client id;
  • No "DNS Zone Contributor" role assigned.

In all 3 cases, the behaviour of the "Challenge" resource was stable, meaning there were no early reconciliations. So I think we should be good to go now :)

@wallrj
Member

wallrj commented Jan 31, 2024

Thanks @weisdd

I also tested this branch by partially following the AKS tutorial: https://cert-manager.io/docs/tutorials/getting-started-aks-letsencrypt/#create-a-clusterissuer-for-lets-encrypt-staging

Observed a Certificate being issued successfully.

Deleted the federated identity:

az identity federated-credential delete --name "cert-manager" --identity-name "${USER_ASSIGNED_IDENTITY_NAME}"

Then I renewed the certificate and observed that it failed, and that the error message on the Challenge was redacted:

$ cmctl status certificate www
...
Order:
  Name: www-3-3709051279
  State: pending, Reason:
  Authorizations:
    URL: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/10919711684, Identifier: www.azure-tutorial.richard-gcp.jetstacker.net, Initial State: pending, Wildcard: false
Challenges:
- Name: www-3-3709051279-258319704, Type: DNS-01, Token: 5Ds--K8YlZdaV45LksFsOIax3zEV9PyAQqyrvfnGiu0, Key: ifqfcVbgsNIP0wIjaieVy_yYOT0cLMkoYrDWpdnM8ZM, State: pending, Reason: WorkloadIdentityCredential authentication failed
POST https://login.microsoftonline.com/f99bd6a4-665c-41cf-aff1-87a89d5c62d4/oauth2/v2.0/token
--------------------------------------------------------------------------------
RESPONSE 401 Unauthorized
--------------------------------------------------------------------------------
<REDACTED>
--------------------------------------------------------------------------------
To troubleshoot, visit https://aka.ms/azsdk/go/identity/troubleshoot#workload, Processing: true, Presented: false

Unfortunately, the redacted error on the Challenge is not sufficient for the user to work out what the problem is.
It is missing the crucial error_description field, which is only visible in the logged errors:

"error_description": "AADSTS700211: No matching federated identity record found for presented assertion issuer 'https://eastus2.oic.prod-aks.azure.com/f99bd6a4-665c-41cf-aff1-87a89d5c62d4/cddd5247-a9c7-4ee4-be19-f64d5a994dd7/'. Please check your federated identity credential Subject, Audience and Issuer against the presented assertion. https://docs.microsoft.com/en-us/azure/active-directory/develop/workload-identity-federation Trace ID: be215318-cd3a-467f-b0a1-30569c3bb000 Correlation ID: 6d2634ea-fd27-4b6f-aedd-8adef7a84f57 Timestamp: 2024-01-31 11:19:14Z",
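
One way a more informative redaction could look is sketched below. It is not part of this PR: `stableDescription` and `volatileSuffix` are hypothetical names, and the sketch assumes that the only unstable part of the response is the trailing "Trace ID: ... Correlation ID: ... Timestamp: ..." text in error_description, which is inferred solely from the single sample quoted above. Whether the rest of the description is always stable, and safe to surface, would need to be verified.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// volatileSuffix matches everything from "Trace ID:" to the end of the
// description. Assumption: the per-request values always come last.
var volatileSuffix = regexp.MustCompile(`(?s)\s*Trace ID:.*$`)

// stableDescription extracts "error" and "error_description" from a token
// endpoint response body and strips the per-request suffix. It falls back
// to "<REDACTED>" when the body cannot be parsed or has no description.
func stableDescription(body []byte) string {
	var payload struct {
		Error       string `json:"error"`
		Description string `json:"error_description"`
	}
	if err := json.Unmarshal(body, &payload); err != nil || payload.Description == "" {
		return "<REDACTED>"
	}
	desc := volatileSuffix.ReplaceAllString(payload.Description, "")
	if payload.Error == "" {
		return desc
	}
	return payload.Error + ": " + desc
}

func main() {
	// Shortened, made-up body in the same shape as the sample above.
	body := []byte(`{
	  "error": "invalid_client",
	  "error_description": "AADSTS700211: No matching federated identity record found. Trace ID: aaa Correlation ID: bbb Timestamp: 2024-01-31 11:19:14Z"
	}`)
	fmt.Println(stableDescription(body))
}
```

Falling back to `<REDACTED>` keeps the message stable even when the body has an unexpected shape, at the cost of losing detail in those cases.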

The cert-manager logs contain both the redacted and the unredacted errors, which is annoying, but I suppose that was always the case:

$ kubectl -n cert-manager  logs deployments/cert-manager --follow
...

 > logger="cert-manager.challenges" key="default/www-3-3709051279-258319704"
E0131 11:19:14.993095       1 azuredns.go:189] "Error creating TXT" err=<
        WorkloadIdentityCredential authentication failed
        POST https://login.microsoftonline.com/f99bd6a4-665c-41cf-aff1-87a89d5c62d4/oauth2/v2.0/token
        --------------------------------------------------------------------------------
        RESPONSE 401 Unauthorized
        --------------------------------------------------------------------------------
        {
          "error": "invalid_client",
          "error_description": "AADSTS700211: No matching federated identity record found for presented assertion issuer 'https://eastus2.oic.prod-aks.azure.com/f99bd6a4-665c-41cf-aff1-87a89d5c62d4/cddd5247-a9c7-4ee4-be19-f64d5a994dd7/'. Please check your federated identity credential Subject, Audience and Issuer against the presented assertion. https://docs.microsoft.com/en-us/azure/active-directory/develop/workload-identity-federation Trace ID: be215318-cd3a-467f-b0a1-30569c3bb000 Correlation ID: 6d2634ea-fd27-4b6f-aedd-8adef7a84f57 Timestamp: 2024-01-31 11:19:14Z",
          "error_codes": [
            700211
          ],
          "timestamp": "2024-01-31 11:19:14Z",
          "trace_id": "be215318-cd3a-467f-b0a1-30569c3bb000",
          "correlation_id": "6d2634ea-fd27-4b6f-aedd-8adef7a84f57"
        }
        --------------------------------------------------------------------------------
        To troubleshoot, visit https://aka.ms/azsdk/go/identity/troubleshoot#workload
 > logger="cert-manager.azure-dns" zone="azure-tutorial.richard-gcp.jetstacker.net" domain="_acme-challenge.www.azure-tutorial.richard-gcp.jetstacker.net." resource group="jetstack-dev"
E0131 11:19:14.993300       1 controller.go:167] "re-queuing item due to error processing" err=<
        WorkloadIdentityCredential authentication failed
        POST https://login.microsoftonline.com/f99bd6a4-665c-41cf-aff1-87a89d5c62d4/oauth2/v2.0/token
        --------------------------------------------------------------------------------
        RESPONSE 401 Unauthorized
        --------------------------------------------------------------------------------
        <REDACTED>
        --------------------------------------------------------------------------------
        To troubleshoot, visit https://aka.ms/azsdk/go/identity/troubleshoot#workload
 > logger="cert-manager.challenges" key="default/www-3-3709051279-258319704"

For the sake of releasing 1.14 on time, I will approve this now.
But I suggest that someone try to improve the redaction function in a follow-up PR.

/lgtm
/approve

@jetstack-bot added the lgtm and approved labels Jan 31, 2024
@jetstack-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wallrj, weisdd

@inteon
Member Author

inteon commented Jan 31, 2024

/retest

@jetstack-bot
Contributor

@inteon: new pull request created: #6688

