@itaiatu itaiatu (Contributor) commented Jul 30, 2024

To determine whether an AWSManagedControlPlane object is healthy, I wrote this custom health check in Lua, following the documentation on custom health checks.

This custom health check is needed because, while experimenting with this resource, I saw that the .status.ready field becomes true long before the resource is fully created and all of its dependencies are provisioned.

Example of a condition that stayed False for several minutes even though .status.ready was true:

...
status:
  conditions:
    ...
    - lastTransitionTime: "2024-07-25T09:22:46Z"
      message: "failed reconciling OIDC provider for cluster: failed to create OIDC provider: error creating provider: LimitExceeded: Cannot exceed quota for OpenIdConnectProvidersPerAccount: 100\n\tstatus code: 409, request id: e6f1bfdd-7a41-4685-8c2d-a081c511f3d8"
      reason: EKSControlPlaneReconciliationFailed
      severity: Error
      status: "False"
    ...
  ready: true
  ...
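A minimal Lua sketch makes the gap concrete. It evaluates the condition above once with a naive `.status.ready` check and once with a condition-based check. The `obj` table is a trimmed copy of the YAML with the message shortened. The helpers `naive_ready` and `has_error_condition` are hypothetical names used only for this illustration. They are not part of the merged script.

```lua
-- Hypothetical helper: the naive check trusts only .status.ready.
local function naive_ready(obj)
  return obj.status ~= nil and obj.status.ready == true
end

-- Hypothetical helper: true if any condition is False with Error severity.
local function has_error_condition(obj)
  if obj.status == nil or obj.status.conditions == nil then
    return false
  end
  for _, condition in ipairs(obj.status.conditions) do
    if condition.status == "False" and condition.severity == "Error" then
      return true
    end
  end
  return false
end

-- Trimmed copy of the status shown above (message shortened for readability).
local obj = {
  status = {
    ready = true,
    conditions = {
      {
        reason = "EKSControlPlaneReconciliationFailed",
        severity = "Error",
        status = "False",
        message = "failed reconciling OIDC provider for cluster: ... LimitExceeded ...",
      },
    },
  },
}

print(naive_ready(obj))          -- true: the naive check would call this healthy
print(has_error_condition(obj))  -- true: a condition still reports an error
```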

Example of the status field of a healthy AWSManagedControlPlane:

...
status:
  conditions:
    - lastTransitionTime: "2024-07-25T09:40:21Z"
      status: "True"
      type: Ready
    - lastTransitionTime: "2024-07-25T09:14:12Z"
      status: "True"
      type: ClusterSecurityGroupsReady
    - lastTransitionTime: "2024-07-25T09:40:17Z"
      status: "True"
      type: EKSAddonsConfigured
    - lastTransitionTime: "2024-07-25T09:22:19Z"
      reason: created
      severity: Info
      status: "False"
      type: EKSControlPlaneCreating
    - lastTransitionTime: "2024-07-25T09:40:19Z"
      status: "True"
      type: EKSControlPlaneReady
    - lastTransitionTime: "2024-07-25T09:40:19Z"
      status: "True"
      type: EKSIdentityProviderConfigured
    - lastTransitionTime: "2024-07-25T09:40:21Z"
      status: "True"
      type: IAMAuthenticatorConfigured
    - lastTransitionTime: "2024-07-25T09:14:13Z"
      status: "True"
      type: IAMControlPlaneRolesReady
    - lastTransitionTime: "2024-07-25T09:14:11Z"
      status: "True"
      type: SubnetsReady
    - lastTransitionTime: "2024-07-25T09:14:10Z"
      status: "True"
      type: VpcReady
  ...
  ready: true
  ...

The health check therefore works as follows:

  • No status or no conditions -> still in progress (Progressing)
  • Iterate over all conditions:
    • If the condition is Ready and its status is True, the control plane is ready.
    • If a condition has status False and severity Error, append its message to an error-message accumulator; these messages are returned at the end and the Degraded status is reported.
  • .status.ready == false -> not ready
  • If no errors were encountered and the Ready condition is not True, creation is still in progress, so the Progressing status is reported to Argo CD.
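The flow above can be sketched in Lua roughly as follows. This is an illustration under stated assumptions, not the script merged in this PR. The wrapper function `health_check` is hypothetical. In Argo CD, the body runs as the health script itself: `obj` is provided, and the script returns a table with `status` and `message`.

The sketch also makes two ordering assumptions:
  • Error conditions take precedence over a True Ready condition.
  • Healthy requires both the Ready condition and `.status.ready`.

```lua
-- Hypothetical wrapper: in Argo CD the body would be the health script itself,
-- with `obj` provided by Argo CD and `hs` returned at the top level.
local function health_check(obj)
  local hs = {}

  -- No status or no conditions yet: provisioning is still in progress.
  if obj.status == nil or obj.status.conditions == nil then
    hs.status = "Progressing"
    hs.message = "Waiting for AWSManagedControlPlane conditions"
    return hs
  end

  local ready_condition = false
  local err_msgs = {}

  for _, condition in ipairs(obj.status.conditions) do
    if condition.type == "Ready" and condition.status == "True" then
      ready_condition = true
    end
    -- Accumulate every False condition that carries Error severity.
    if condition.status == "False" and condition.severity == "Error" then
      table.insert(err_msgs, condition.message or condition.reason or "unknown error")
    end
  end

  -- Assumption: reported errors take precedence over a True Ready condition.
  if #err_msgs > 0 then
    hs.status = "Degraded"
    hs.message = table.concat(err_msgs, "\n")
    return hs
  end

  -- Assumption: Healthy only when the Ready condition and .status.ready agree.
  if ready_condition and obj.status.ready == true then
    hs.status = "Healthy"
    hs.message = "AWSManagedControlPlane is ready"
    return hs
  end

  -- No errors, but not ready yet: creation is still in progress.
  hs.status = "Progressing"
  hs.message = "AWSManagedControlPlane is still being provisioned"
  return hs
end

-- Usage with a trimmed status resembling the OIDC quota failure shown earlier.
local degraded = health_check({
  status = {
    ready = true,
    conditions = {
      { type = "Ready", status = "True" },
      { reason = "EKSControlPlaneReconciliationFailed", status = "False",
        severity = "Error", message = "failed reconciling OIDC provider for cluster" },
    },
  },
})
print(degraded.status)  -- Degraded
```

Checking for errors before readiness means a failure like the OIDC quota error surfaces as Degraded, with its message in the Health Details. It is not masked by `ready: true`. Conditions with Info severity, such as EKSControlPlaneCreating in the healthy example, are ignored.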

To test the health check in a real environment, we performed a full CAPA (Cluster API Provider AWS) cluster build. Below are some screenshots from the testing process.

  1. At first, .status.ready was false.
[Screenshot: step1]
  2. After some time, .status.ready became true, but the account had hit its quota of 100 OIDC providers (the error messages appear in the Health Details).
[Screenshot: step2]
  3. Then I deleted an old sbx cluster, which freed up an OIDC provider. Immediately after the OIDC provider was assigned, the chart synced.
[Screenshot: step3]

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the Toolchain Guide
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.
  • Optional. My organization is added to USERS.md.
  • Optional. For bug fixes, I've indicated what older releases this fix should be cherry-picked into (this may or may not happen depending on risk/complexity).

bunnyshell bot commented Jul 30, 2024

❌ Preview Environment deleted from Bunnyshell

Available commands (reply to this comment):

  • 🚀 /bns:deploy to deploy the environment

itaiatu and others added 2 commits July 30, 2024 17:51
Signed-off-by: Iulian Taiatu <itaiatu@adobe.com>
…goproj#19295)

Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.55.3 to 1.55.4.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Commits](aws/aws-sdk-go@v1.55.3...v1.55.4)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Iulian Taiatu <itaiatu@adobe.com>
@itaiatu itaiatu force-pushed the awsmanagedcontrolplane-custom-health-check branch from e11af00 to 76bd765 July 30, 2024 14:51
@itaiatu itaiatu changed the title "Add custom health check for AWSManagedControlPlane" to "feat: Add custom health check for AWSManagedControlPlane" Jul 30, 2024
Signed-off-by: Iulian Taiatu <itaiatu@adobe.com>
@itaiatu itaiatu changed the title "feat: Add custom health check for AWSManagedControlPlane" to "feat: Add custom health check for cluster-api AWSManagedControlPlane" Jul 30, 2024
codecov bot commented Jul 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 52.79%. Comparing base (0894ddc) to head (7110122).
Report is 514 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #19304      +/-   ##
==========================================
- Coverage   52.79%   52.79%   -0.01%     
==========================================
  Files         316      316              
  Lines       43582    43582              
==========================================
- Hits        23010    23009       -1     
- Misses      18018    18024       +6     
+ Partials     2554     2549       -5     


@itaiatu itaiatu marked this pull request as ready for review July 31, 2024 11:04
@itaiatu itaiatu requested review from a team as code owners July 31, 2024 11:04
@JonnieDoe JonnieDoe left a comment

/lgtm

👍

@34fathombelow 34fathombelow (Member) left a comment

LGTM, thank you for your contribution. @crenshaw-dev any thoughts on cherry-picking this into v2.12?

@34fathombelow (Member)

/cherry-pick release-2.12

@alexmt alexmt merged commit f110570 into argoproj:master Aug 2, 2024
28 of 29 checks passed
gcp-cherry-pick-bot bot pushed a commit that referenced this pull request Aug 2, 2024
…19304)

* add custom health check for awsmanagedcontrolplane

Signed-off-by: Iulian Taiatu <itaiatu@adobe.com>

* chore(deps): bump github.com/aws/aws-sdk-go from 1.55.3 to 1.55.4 (#19295)

Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.55.3 to 1.55.4.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Commits](aws/aws-sdk-go@v1.55.3...v1.55.4)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Iulian Taiatu <itaiatu@adobe.com>

* add new line at the end of health_test.yaml file

Signed-off-by: Iulian Taiatu <itaiatu@adobe.com>

---------

Signed-off-by: Iulian Taiatu <itaiatu@adobe.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
alexmt pushed a commit that referenced this pull request Aug 2, 2024
…19304) (#19360)

* add custom health check for awsmanagedcontrolplane



* chore(deps): bump github.com/aws/aws-sdk-go from 1.55.3 to 1.55.4 (#19295)

Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.55.3 to 1.55.4.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Commits](aws/aws-sdk-go@v1.55.3...v1.55.4)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
...





* add new line at the end of health_test.yaml file



---------

Signed-off-by: Iulian Taiatu <itaiatu@adobe.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: itaiatu <140485521+itaiatu@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
rhyswilliamsza pushed a commit to rhyswilliamsza/argo-cd that referenced this pull request Aug 12, 2024
…rgoproj#19304)

* add custom health check for awsmanagedcontrolplane

Signed-off-by: Iulian Taiatu <itaiatu@adobe.com>

* chore(deps): bump github.com/aws/aws-sdk-go from 1.55.3 to 1.55.4 (argoproj#19295)

Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.55.3 to 1.55.4.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Commits](aws/aws-sdk-go@v1.55.3...v1.55.4)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Iulian Taiatu <itaiatu@adobe.com>

* add new line at the end of health_test.yaml file

Signed-off-by: Iulian Taiatu <itaiatu@adobe.com>

---------

Signed-off-by: Iulian Taiatu <itaiatu@adobe.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Rhys Williams <rhys.williams@electrum.co.za>
5 participants