Skip to content

Conversation

jshr-w
Copy link
Contributor

@jshr-w jshr-w commented Oct 2, 2024

This PR adds preliminary changes necessary to fully address #32820 & the corresponding CFP design-cfps/#57.

Specifically, this PR:

  • Changes the package being used for the ICMP pinger.
  • Reconfigures the HTTP prober and ICMP pinger timeouts to 10s.
  • Rate-limits the ICMP pinger and HTTP prober within the currently fixed probing interval.

Other changes to implement the CFP will be made in separate future PRs.

connectivity health checking: improve the reliability of health checking at large scales by rate-limiting probes

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 2, 2024
@jshr-w jshr-w force-pushed the jshr/healthproberate branch 2 times, most recently from 62cdc77 to 0c6278d Compare October 3, 2024 02:45
@jshr-w
Copy link
Contributor Author

jshr-w commented Oct 3, 2024

/test

@joestringer joestringer added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Oct 3, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 3, 2024
@jshr-w jshr-w force-pushed the jshr/healthproberate branch 3 times, most recently from 23238dd to 97b901b Compare October 7, 2024 01:41
@jshr-w jshr-w marked this pull request as ready for review October 7, 2024 01:41
@jshr-w jshr-w requested review from a team as code owners October 7, 2024 01:41
@jshr-w jshr-w requested a review from marseel October 7, 2024 01:41
@jshr-w
Copy link
Contributor Author

jshr-w commented Oct 7, 2024

/test

@jshr-w jshr-w marked this pull request as draft October 7, 2024 03:36
@jshr-w jshr-w force-pushed the jshr/healthproberate branch 3 times, most recently from cc0c3a5 to 0de7611 Compare October 8, 2024 02:31
@jshr-w
Copy link
Contributor Author

jshr-w commented Oct 8, 2024

/test

@jshr-w jshr-w marked this pull request as ready for review October 8, 2024 03:43
@jshr-w jshr-w force-pushed the jshr/healthproberate branch 2 times, most recently from 4114845 to 103a873 Compare October 8, 2024 23:51
@jshr-w
Copy link
Contributor Author

jshr-w commented Oct 9, 2024

/test

@marseel
Copy link
Contributor

marseel commented Oct 9, 2024

Thanks for the PR @jshr-w !

I will review it early next week, knowing how health-checking is subtle I will need to pay extra attention 😅

@jshr-w
Copy link
Contributor Author

jshr-w commented Oct 9, 2024

Sounds good to me, thanks @marseel! For sure this one has a lot of cases to think about 😅

Copy link
Contributor

@marseel marseel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jshr-w this looks great.
It's so much easier to understand the whole logic now.
I left a few small nits to further simplify this part of the code.
Once you address them, I will do a final pass of review and experiment manually with a new implementation to check it at a larger scale and also from correctness point of view.
Thanks!

@jshr-w jshr-w force-pushed the jshr/healthproberate branch 2 times, most recently from 3423214 to ca94aa3 Compare October 16, 2024 02:48
@jshr-w
Copy link
Contributor Author

jshr-w commented Oct 16, 2024

/test

Copy link
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

I didn't look at the code changes in details. Doc changes are trivial and look good; I have a few nits below, although not blocking.

@jshr-w jshr-w force-pushed the jshr/healthproberate branch 3 times, most recently from bf09eea to 7330cc8 Compare November 7, 2024 23:13
@jshr-w
Copy link
Contributor Author

jshr-w commented Nov 8, 2024

/test

1 similar comment
@jshr-w
Copy link
Contributor Author

jshr-w commented Nov 8, 2024

/test

@pchaigno pchaigno enabled auto-merge November 8, 2024 08:23
@pchaigno pchaigno removed request for a team and brlbil November 8, 2024 08:25
@jshr-w
Copy link
Contributor Author

jshr-w commented Nov 8, 2024

/ci-eks

auto-merge was automatically disabled November 11, 2024 00:19

Head branch was pushed to by a user without write access

@jshr-w jshr-w force-pushed the jshr/healthproberate branch 2 times, most recently from 7ab3bb7 to 8cce006 Compare November 11, 2024 00:19
@jshr-w
Copy link
Contributor Author

jshr-w commented Nov 11, 2024

/test

@pchaigno pchaigno requested review from michi-covalent and removed request for derailed November 12, 2024 10:26
Copy link
Contributor

@michi-covalent michi-covalent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎏

@michi-covalent michi-covalent added the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Nov 12, 2024
@jshr-w jshr-w force-pushed the jshr/healthproberate branch from 8cce006 to bf2e181 Compare November 13, 2024 00:00
This commit changes the package for the health package ICMP pinger to
the prometheus/pro-bing library, anticipating changes for CFP-32820.
Probe timeouts are set to 10s for both HTTP and ICMP probers to prevent
timing out too quickly or waiting too long.

Signed-off-by: jshr-w <shjayaraman@microsoft.com>
This commit spreads the health probes across the probing interval. The
rate limit is set to (# of IPs to ping / probing interval) probes per
second for both the ICMP and HTTP probes.

Signed-off-by: jshr-w <shjayaraman@microsoft.com>
@jshr-w jshr-w force-pushed the jshr/healthproberate branch from bf2e181 to d3fe129 Compare November 13, 2024 00:03
@jshr-w
Copy link
Contributor Author

jshr-w commented Nov 13, 2024

/test

@michi-covalent michi-covalent removed the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Nov 13, 2024
@michi-covalent michi-covalent added this pull request to the merge queue Nov 13, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 13, 2024
Merged via the queue into cilium:main with commit 282dcb7 Nov 13, 2024
64 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/health Relates to the cilium-health component ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants