-
Notifications
You must be signed in to change notification settings - Fork 214
fix(monitor): race condition between timer and context cancelation #2146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
sthaha
merged 1 commit into
sustainable-computing-io:reboot
from
sthaha:fix-race-condition
Jun 12, 2025
Merged
fix(monitor): race condition between timer and context cancelation #2146
sthaha
merged 1 commit into
sustainable-computing-io:reboot
from
sthaha:fix-race-condition
Jun 12, 2025
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## reboot #2146 +/- ##
==========================================
+ Coverage 91.50% 91.52% +0.01%
==========================================
Files 36 36
Lines 3438 3444 +6
==========================================
+ Hits 3146 3152 +6
Misses 225 225
Partials 67 67 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
vprashar2929
previously approved these changes
Jun 12, 2025
b28e5e2
to
b5988d7
Compare
vimalk78
previously approved these changes
Jun 12, 2025
suggestion:
|
b5988d7
to
f9c7d57
Compare
This commit fixes a race condition when timer fires ⚡️ and context gets canceled 💥 at the same time. This race condition was detected the the TestCollectionCancellation and happens as follows ... ```` monitor_collection_test.go:201: Error Trace: .../monitor_collection_test.go:201 Error: Not equal: expected: time.Date(2025, time.June, 11, 8, 14, 45, 720156604, time.Local) actual : time.Date(2025, time.June, 11, 8, 14, 45, 732389967, time.Local) Test: TestCollectionCancellation Messages: No collections should happen after cancellation ```` Race Sequence -------------- 1. **Test Context**: `TestCollectionCancellation` runs with a 100ms timeout context 2. **Timer Setup**: Collection goroutines are scheduled with 10ms intervals 3. **Cancellation**: After ~50ms, test calls `cancel()` which triggers `pm.collectionCancel()` 4. **Run() Exit**: The `Run()` method exits and test captures `snapshotAfterRun.Timestamp` 5. **Race Window**: 💥 There's a brief window where: - Context is canceled (`pm.collectionCtx.Done()` becomes ready) - Timer also fires (`<-timer` becomes ready) - Go's `select` **randomly** chooses between the two ready channels 6. **Failure Case**: When `select` chooses `<-timer`: - `synchronizedPowerRefresh()` executes and updates snapshot with new timestamp - Test assertion fails because timestamps don't match Signed-off-by: Sunil Thaha <sthaha@redhat.com>
f9c7d57
to
99a271e
Compare
vimalk78
approved these changes
Jun 12, 2025
4d24510
into
sustainable-computing-io:reboot
16 of 17 checks passed
vimalk78
pushed a commit
to vimalk78/kepler
that referenced
this pull request
Jun 12, 2025
…-condition fix(monitor): race condition between timer and context cancelation
vimalk78
pushed a commit
to vimalk78/kepler
that referenced
this pull request
Jun 12, 2025
…-condition fix(monitor): race condition between timer and context cancelation
vimalk78
pushed a commit
to vimalk78/kepler
that referenced
this pull request
Jun 12, 2025
…-condition fix(monitor): race condition between timer and context cancelation
vimalk78
pushed a commit
to vimalk78/kepler
that referenced
this pull request
Jun 12, 2025
…-condition fix(monitor): race condition between timer and context cancelation
vimalk78
pushed a commit
to vimalk78/kepler
that referenced
this pull request
Jun 12, 2025
…-condition fix(monitor): race condition between timer and context cancelation
vimalk78
pushed a commit
to vimalk78/kepler
that referenced
this pull request
Jun 12, 2025
…-condition fix(monitor): race condition between timer and context cancelation
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This commit fixes a race condition when timer fires ⚡️ and context gets canceled 💥 at the same time. This race condition was detected the the TestCollectionCancellation and happens as follows ...
Race Sequence
Test Context:
TestCollectionCancellation
runs with a 100ms timeout contextTimer Setup: Collection goroutines are scheduled with 10ms intervals
Cancellation: After ~50ms, test calls
cancel()
which triggerspm.collectionCancel()
Run() Exit: The
Run()
method exits and test capturessnapshotAfterRun.Timestamp
Race Window: 💥 There's a brief window where:
pm.collectionCtx.Done()
becomes ready)<-timer
becomes ready)select
randomly chooses between the two ready channelsFailure Case: When
select
chooses<-timer
:synchronizedPowerRefresh()
executes and updates snapshot with new timestampFixes #2144