@svghadi svghadi commented Dec 10, 2024

Related #16972 (comment)

This is a follow-up PR to #18660 that adds an observedAt field to the application health status to indicate status freshness. The change was initially part of #18660 but was split out to unblock the other feature, because it raises some performance concerns (#18660 (comment)).

@blakepettersson will provide some performance data. For more context, refer to #18660 (comment).

Changes:

  • Adds a new .status.health.observedAt field that is periodically updated (every 30s) to reflect the freshness of the observed state.
  • A new event handler is set up to update the observedAt field every 30s. It enqueues events into the appRefreshQueue with a comparison option that ensures only the status is refreshed without performing a resource-level comparison.
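
The periodic, status-only enqueue described above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual code: refreshQueue, enqueueStatusRefreshes, and runPeriodic are hypothetical names, the ticker loop is an assumed stand-in for the event handler, and the CompareWith constants only mimic the controller's comparison levels.

```go
package main

import (
	"sync"
	"time"
)

// CompareWith is a hypothetical stand-in for the controller's comparison levels.
type CompareWith int

const (
	// ComparisonWithNothing refreshes status only, with no resource-level comparison.
	ComparisonWithNothing CompareWith = iota
	// CompareWithLatest stands for a full comparison against the latest revision.
	CompareWithLatest
)

// refreshQueue is a toy, de-duplicating stand-in for appRefreshQueue.
type refreshQueue struct {
	mu      sync.Mutex
	pending map[string]CompareWith
}

func newRefreshQueue() *refreshQueue {
	return &refreshQueue{pending: make(map[string]CompareWith)}
}

// request records a refresh for app and never downgrades a stronger pending level.
func (q *refreshQueue) request(app string, level CompareWith) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if cur, ok := q.pending[app]; !ok || level > cur {
		q.pending[app] = level
	}
}

// drain returns all pending requests and empties the queue.
func (q *refreshQueue) drain() map[string]CompareWith {
	q.mu.Lock()
	defer q.mu.Unlock()
	out := q.pending
	q.pending = make(map[string]CompareWith)
	return out
}

// enqueueStatusRefreshes requests a status-only refresh for every app.
func enqueueStatusRefreshes(q *refreshQueue, apps []string) {
	for _, app := range apps {
		q.request(app, ComparisonWithNothing)
	}
}

// runPeriodic enqueues status-only refreshes on every tick until stop is closed.
func runPeriodic(q *refreshQueue, apps []string, interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			enqueueStatusRefreshes(q, apps)
		case <-stop:
			return
		}
	}
}
```

In this sketch, starting `go runPeriodic(q, apps, 30*time.Second, stop)` would reproduce the 30s cadence. Note that every tick enqueues one request per application, so the work grows linearly with the number of apps.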

The screen grab below shows that observedAt is updated every 30s without affecting the lastTransitionTime field.

Screen.Recording.2024-09-12.at.6.53.06.PM.mov

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the Toolchain Guide
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.
  • Optional. My organization is added to USERS.md.
  • Optional. For bug fixes, I've indicated what older releases this fix should be cherry-picked into (this may or may not happen depending on risk/complexity).

Signed-off-by: Siddhesh Ghadi <sghadi1203@gmail.com>
@svghadi svghadi requested a review from a team as a code owner December 10, 2024 15:45

bunnyshell bot commented Dec 10, 2024

❗ Preview Environment undeploy from Bunnyshell failed

See: Environment Details | Pipeline Logs

Available commands (reply to this comment):

  • 🚀 /bns:deploy to redeploy the environment
  • /bns:delete to try again to remove the environment

codecov bot commented Dec 10, 2024

Codecov Report

Attention: Patch coverage is 69.23077% with 8 lines in your changes missing coverage. Please review.

Project coverage is 55.22%. Comparing base (2f51067) to head (05e16f0).

Files with missing lines      Patch %   Lines
controller/appcontroller.go   66.66%    5 Missing and 2 partials ⚠️
controller/state.go           50.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #21120      +/-   ##
==========================================
+ Coverage   55.20%   55.22%   +0.02%     
==========================================
  Files         324      324              
  Lines       55596    55619      +23     
==========================================
+ Hits        30689    30713      +24     
  Misses      22285    22285              
+ Partials     2622     2621       -1     

mubarak-j commented Dec 13, 2024

@svghadi, will this feature help with the request in argoproj/notifications-engine#99?
The idea is to avoid false alerts and spam-like notifications when an application's health status flaps between healthy and degraded/unknown.

I'm wondering whether observedAt would help with sending notifications only if, e.g., the app has been in a degraded state for the last 5 minutes:

trigger.on-health-degraded: |
  - description: Application has degraded
    oncePer: app.status.operationState.syncResult.revision
    send:
    - app-health-degraded
    when: app.status.health.status == 'Degraded' and time.Now().Sub(time.Parse(app.status.health.observedAt)).Minutes() >= 5

Edit: after reading the previous PR, #18660, it looks like lastTransitionTime is what I need, because observedAt is constantly refreshed.
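
To make the distinction concrete: a timestamp that is rewritten every 30s never grows older than about 30s, so a "5 minutes or more" condition against it can never hold, whereas lastTransitionTime only moves when the status changes. The check can be sketched in Go as below. shouldNotify is a hypothetical helper, not part of Argo CD or the notifications engine, and it assumes the timestamps are RFC 3339 strings.

```go
package main

import "time"

// shouldNotify reports whether status is "Degraded" and has been so for at
// least minMinutes, measured from the last status transition to now.
// Both timestamps are assumed to be RFC 3339 strings (hypothetical helper).
func shouldNotify(status, lastTransition, now string, minMinutes float64) (bool, error) {
	since, err := time.Parse(time.RFC3339, lastTransition)
	if err != nil {
		return false, err
	}
	at, err := time.Parse(time.RFC3339, now)
	if err != nil {
		return false, err
	}
	return status == "Degraded" && at.Sub(since).Minutes() >= minMinutes, nil
}
```

For example, a transition at 10:00:00Z evaluated at 10:06:00Z with a 5-minute threshold yields true, while the same check at 10:02:00Z yields false.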

}

// We are interested only in refreshing the status, not performing a resource-level comparison.
ctrl.requestAppRefresh(newApp.QualifiedName(), ComparisonWithNothing.Pointer(), nil)

Hm, I have concerns about triggering a refresh every 30 seconds for each app; this may add up.

@andrii-korotkov-verkada andrii-korotkov-verkada left a comment

I think this has to be reworked so that the field is updated on each regular refresh and/or on refreshes triggered by a webhook or something else, but it shouldn't trigger new refreshes, for performance reasons. We may need another field, refreshedAt, if one is not already present.
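
One way to read this suggestion is sketched below: the timestamp is stamped as a side effect of a refresh that is already happening, so no extra refreshes are scheduled. This is only an illustration under assumptions. The Health struct and recordObservation are hypothetical and simplified, and this is not the controller's actual code.

```go
package main

import "time"

// Health is a simplified, hypothetical view of an application's health status.
type Health struct {
	Status             string
	LastTransitionTime time.Time
	ObservedAt         time.Time
}

// recordObservation is meant to be called from an existing refresh path.
// It always stamps ObservedAt with now, and moves LastTransitionTime only
// when the observed status differs from the previous one.
func recordObservation(prev Health, observedStatus string, now time.Time) Health {
	next := prev
	next.ObservedAt = now
	if prev.Status != observedStatus {
		next.Status = observedStatus
		next.LastTransitionTime = now
	}
	return next
}
```

With this shape, the freshness of ObservedAt depends on how often refreshes already occur (regular reconciliation, webhooks, and so on) rather than on a dedicated 30s timer.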
